Information Extraction from Electronic Health Records Written in Spanish for Epidemic Intelligence

Authors: Javier Petri, Pilar Barcena Barbeira and Viviana Cotik.

Abstract:
Automatic symptom detection from electronic health records is a valuable input for event-based surveillance systems. In this study, we develop tools to automatically detect symptoms associated with febrile illnesses in electronic health records written in Spanish. To this end, we use a custom corpus comprising 6228 expertly labeled and approximately 1 million unlabeled health reports. Our approach involves fine-tuning state-of-the-art named entity recognition models, including BiLSTM-CRF and transformer-based models such as RoBERTa. We focus on domain-adaptive and task-adaptive models to enhance performance: the former were pretrained on biomedical corpora, while the latter were further pretrained on our unlabeled health reports. Despite computational constraints, our models show promising results: RoBERTa-Clinico, a task-adaptive transformer model pretrained on our unlabeled corpus, achieves the best micro recall (79.30) and a micro F1 score of 70.83, both comparable to results in similar studies. In this way, we contribute to the limited body of work on BioNLP in Spanish.
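
For readers unfamiliar with the micro-averaged scores quoted in the abstract, the following is a minimal sketch of how micro precision, recall and F1 can be computed for entity spans. It assumes exact span-and-label matching, which may differ from the matching criterion used in the paper; the function name `micro_scores`, the span format and the label names are hypothetical and chosen only for illustration.

```python
def micro_scores(gold, pred):
    """Micro-averaged precision, recall and F1 over entity spans.

    gold, pred: lists (one item per report) of sets of
    (start, end, label) tuples. A predicted span counts as correct
    only if it matches a gold span exactly (an assumption here).
    """
    tp = fp = fn = 0
    # Pool the counts over all reports before dividing:
    # this pooling is what makes the average "micro".
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans found in both gold and prediction
        fp += len(p - g)   # predicted spans with no gold match
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data with hypothetical labels; not taken from the paper's corpus.
gold = [{(0, 5, "FEVER"), (10, 18, "HEADACHE")}, {(3, 9, "COUGH")}]
pred = [{(0, 5, "FEVER")}, {(3, 9, "COUGH"), (12, 20, "FEVER")}]
precision, recall, f1 = micro_scores(gold, pred)
```

In this toy case two spans are matched, one is spurious and one is missed, so all three scores equal 2/3. Because counts are pooled across reports, frequent entity types weigh more heavily in the micro average than rare ones.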

More information:
http://dx.doi.org/10.1007/978-3-031-80366-6_35

Andres Juarez | 7 April 2025 | Papers
