Information Extraction from Electronic Health Records Written in Spanish for Epidemic Intelligence

Authors: Javier Petri, Pilar Barcena Barbeira and Viviana Cotik.

Abstract:
Automatic symptom detection from electronic health records is a valuable source of information for event-based surveillance systems. In this study, we develop tools to automatically detect symptoms associated with febrile illnesses in electronic health records written in Spanish. To this end, we use a custom corpus comprising 6228 expertly labeled and approximately 1 million unlabeled health reports. Our approach involved fine-tuning state-of-the-art named entity recognition models, including BiLSTM-CRF and transformer-based models such as RoBERTa. We focused on domain-adaptive and task-adaptive models to enhance performance: the former were pretrained on biomedical corpora, while the latter were further pretrained on our unlabeled health reports. Despite computational constraints, our models demonstrated promising results. RoBERTa-Clinico, a task-adaptive transformer model pretrained on our unlabeled corpus, achieved the best micro recall (79.30) and a micro F1 score of 70.83, results comparable to those of similar studies. In this way, we contribute to the limited body of work on BioNLP in Spanish.

More information:
http://dx.doi.org/10.1007/978-3-031-80366-6_35

7 April 2025 | Papers