DOI: 10.1055/s-0043-1768726
Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey
Summary
Objectives: This survey aims to provide an overview of the current state of biomedical and clinical Natural Language Processing (NLP) research and practice in Languages other than English (LoE). We pay special attention to data resources, language models, and popular NLP downstream tasks.
Methods: We explore the literature on clinical and biomedical NLP published between 2020 and 2022, focusing on the challenges of multilinguality and LoE. We query online databases and manually select relevant publications. We also use recent NLP review papers to identify possible gaps in the information gathered.
Results: Our work confirms the recent trend towards the use of transformer-based language models for a variety of NLP tasks in medical domains. In addition, there has been an increase in the availability of annotated datasets for clinical NLP in LoE, particularly in European languages such as Spanish, German and French. Common NLP tasks addressed in medical NLP research in LoE include information extraction, named entity recognition, normalization, linking, and negation detection. However, there is still a need for the development of annotated datasets and models specifically tailored to the unique characteristics and challenges of medical text in some of these languages, especially low-resource ones. Lastly, this survey highlights the progress of medical NLP in LoE and helps identify opportunities for future research and development in this field.
Keywords
Multilingualism - natural language processing - datasets as topic - language models - shared tasks
* These authors contributed equally to this work
6 We use the standardized nomenclature ISO 639-3 for the language codes (https://iso639-3.sil.org/code_tables/639/data).
Publication History
Article published online: 26 December 2023
© 2023. IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
References
- 1 Zhou B, Yang G, Shi Z, Ma S. Natural Language Processing for Smart Healthcare. IEEE Rev Biomed Eng 2022;1–17. doi:10.1109/RBME.2022.3210270.
- 2 Aramaki E, Wakamiya S, Yada S, Nakamura Y. Natural Language Processing: from Bedside to Everywhere. Yearb Med Inform 2022 Jun 2; doi:10.1055/s-0042-1742510.
- 3 Li I, Pan J, Goldwasser J, Verma N, Wong WP, Nuzumlalı MY, et al. Neural Natural Language Processing for unstructured data in electronic health records: A review. Comput Sci Rev 2022 Nov 1;46:100511. doi:10.1016/j.cosrev.2022.100511.
- 4 Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semant 2018 Mar 30;9(1):12. doi:10.1186/s13326-018-0179-8.
- 5 Walpole SC. Including papers in languages other than English in systematic reviews: important, feasible, yet often omitted. J Clin Epidemiol 2019 Jul 1;111:127–34. doi:10.1016/j.jclinepi.2019.03.004.
- 6 Dalianis H. Characteristics of Patient Records and Clinical Corpora. Dalianis H, editor. Clinical Text Mining: Secondary Use of Electronic Patient Records. Cham: Springer International Publishing; 2018. p. 21–34. doi:10.1007/978-3-319-78503-5_4.
- 7 Soares F, Yamashita GH. On the crucial role of multilingual biomedical databases in epidemic events (SARS-CoV-2 analysis). Int J Infect Dis 2020 Jul;96:352–4. doi:10.1016/j.ijid.2020.05.023.
- 8 Grabar N, Grouin C. Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing. Yearb Med Inform 2021 Aug;30(1):257–63. doi:10.1055/s-0041-1726528.
- 9 Laparra E, Mascio A, Velupillai S, Miller T. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records. Yearb Med Inform 2021 Aug;30(1):239–44. doi:10.1055/s-0041-1726522.
- 10 Yang F, Wang X, Ma H, Li J. Transformers-sklearn: a toolkit for medical language understanding with transformer-based models. BMC Med Inform Decis Mak 2021 Jul 30;21(Suppl 2):90. doi:10.1186/s12911-021-01459-0.
- 11 Jati BS, Widyawan S, Muhammad Nur Rizal ST. Multilingual Named Entity Recognition Model for Indonesian Health Insurance Question Answering System. Proceedings of the 3rd International Conference on Information and Communications Technology (ICOIACT). 2020. p. 180–4. doi:10.1109/ICOIACT50329.2020.9332027.
- 12 Gérardin C, Wajsbürt P, Vaillant P, Bellamine A, Carrat F, Tannier X. Multilabel classification of medical concepts for patient clinical profile identification. Artif Intell Med 2022 Jun;128:102311. doi:10.1016/j.artmed.2022.102311.
- 13 Wang B, Xie Q, Pei J, Tiwari P, Li Z, Fu J. Pre-trained Language Models in Biomedical Domain: A Systematic Survey. arXiv; 2021. Available at: http://arxiv.org/abs/2110.05006.
- 14 AlShuweihi M, Salloum SA, Shaalan K. Biomedical Corpora and Natural Language Processing on Clinical Text in Languages Other Than English: A Systematic Review. In: Al-Emran M, Shaalan K, Hassanien AE, editors. Recent Advances in Intelligent Systems and Smart Applications. Cham: Springer International Publishing; 2021. p. 491–509. (Studies in Systems, Decision and Control). doi:10.1007/978-3-030-47411-9_27.
- 15 Ge Y, Guo Y, Yang YC, Al-Garadi MA, Sarker A. Few-shot learning for medical text: A systematic review. arXiv; 2022. Available at: https://arxiv.org/abs/2204.14081.
- 16 Magnini B, Altuna B, Lavelli A, Speranza M, Zanoli R. The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases. Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 : Bologna, Italy, March 1-3, 2021. Torino: Accademia University Press; 2021. p. 258–64. (Collana dell'Associazione Italiana di Linguistica Computazionale). doi:10.4000/books.aaccademia.8663.
- 17 Miranda-Escalada A, Farre-Maduell E, Lima-Lopez S, Estrada D, Gasco L, Krallinger M. Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources. Procesamiento del Lenguaje Natural 2022;69:241–53. doi:10.26342/2022-69-21.
- 18 Miranda-Escalada, Antonio, Eulàlia Farré, Gasco L, Lima S, Krallinger M. DisTEMIST corpus: detection and normalization of disease mentions in Spanish clinical cases. Zenodo; 2022. doi:10.5281/ZENODO.6408476.
- 19 Blinov P, Nesterov A, Zubkova G, Reshetnikova A, Kokh V, Shivade C. RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain. PhysioNet; 2022. doi:10.13026/gxzd-cf80.
- 20 Shivade C. MedNLI - A Natural Language Inference Dataset For The Clinical Domain. PhysioNet; 2017. doi:10.13026/C2RS98
- 21 Frei J, Kramer F. GERNERMED -- An Open German Medical NER Model. arXiv; 2021. Available at: http://arxiv.org/abs/2109.12104.
- 22 Frei J, Kramer F. Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP. arXiv; 2022. Available at: http://arxiv.org/abs/2208.14493.
- 23 Modersohn L, Schulz S, Lohr C, Hahn U. GraSCCo-The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. Stud Health Technol Inform 2022;296:66–72. doi:10.3233/SHTI220805.
- 24 Borchert F, Lohr C, Modersohn L. GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines. arXiv; 2020. Available at: https://arxiv.org/abs/2007.06400.
- 25 Borchert F, Lohr C, Modersohn L, Witt J, Langer T, Follmann M, et al. GGPONC 2.0-the German clinical guideline corpus for oncology: Curation workflow, annotation policy, baseline NER taggers. Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022. p. 3650–60. Available at: https://aclanthology.org/2022.lrec-1.389/.
- 26 Kittner M, Lamping M, Rieke DT, Götze J, Bajwa B, Jelas I, et al. Annotation and initial evaluation of a large annotated German oncological corpus. JAMIA Open 2021 Apr 1;4(2):ooab025. doi:10.1093/jamiaopen/ooab025.
- 27 Grabar N, Dalloux C, Claveau V. CAS: corpus of clinical cases in French. J Biomed Semant 2020 Aug 6;11(1):7. doi:10.1186/s13326-020-00225-x.
- 28 Hiebel N, Ferret O, Fort K, Névéol A. CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives. Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2022. p. 4306–15. Available at: https://aclanthology.org/2022.lrec-1.459.
- 29 Yada S, Nakamura Y, Wakamiya S, Aramaki E. Real-MedNLP: Overview of REAL document-based MEDical Natural Language processing Task. Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies NII. 2022.
- 30 Kim YM, Lee TH. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak 2020 Sep 30;20(7):242. doi:10.1186/s12911-020-01241-8.
- 31 Kim YM, Lee TH, Na SO. Constructing novel datasets for intent detection and ner in a korean healthcare advice system: guidelines and empirical results. Appl Intell 2022;53(1):1–21. doi:10.1007/s10489-022-03400-y.
- 32 Sazzed S. BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali). Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 323–9. doi:10.18653/v1/2022.bionlp-1.31.
- 33 Van Nguyen K, Van Huynh T, Nguyen DV, Nguyen AGT, Nguyen NLT. New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles. ACM Trans Asian Low-Resour Lang Inf Process 2022 Sep 23;21(5):105:1-105:28. doi:10.1145/3527631.
- 34 Boudjellal N, Zhang H, Khan A, Ahmad A, Naseem R, Dai L. A Silver Standard Biomedical Corpus for Arabic Language. Complexity 2020 Oct 9;2020:e8896659. doi:10.1155/2020/8896659.
- 35 Cherednichenko O, Kanishcheva O, Yakovleva O, Arkatov D. Collection and Processing of a Medical Corpus in Ukrainian. Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Systems. Lviv: CEUR; 2020. Available at: https://ceur-ws.org/Vol-2604/paper21.pdf.
- 36 Hammoud J, Vatian A, Dobrenko N, Vedernikov N, Shalyto A, Gusarova N. New Arabic Medical Dataset for Diseases Classification. arXiv; 2021. Available at: http://arxiv.org/abs/2106.15236.
- 37 Zhuoma C, Cairang J, Sangjie D, Yangmao Z, Zhuoma Z. Tibetan Medical Named Entity Recognition Study for Tibetan Clinical Electronic Medical Records. SSRN; 2022 Feb 22; doi: 10.2139/ssrn.4040676.
- 38 Zaghir J, Goldman JP, Bjelogrlic M, Keszthelyi D, Gaudet-Blavignac C, Turbé H, et al. Performance of Machine Learning Methods to Classify French Medical Publications. Stud Health Technol Inform 2022;294:874–5. doi:10.3233/SHTI220613.
- 39 Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.
- 40 Lewis P, Ott M, Du J, Stoyanov V. Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art. Proceedings of the 3rd Clinical Natural Language Processing Workshop. Online: Association for Computational Linguistics; 2020. p. 146–57. doi:10.18653/v1/2020.clinicalnlp-1.17.
- 41 Boudjellal N, Zhang H, Khan A, Ahmad A, Naseem R, Shang J, et al. ABioNER: a BERT-based model for Arabic biomedical named-entity recognition. Complexity 2021;2021.
- 42 Kim Y, Kim JH, Lee JM, Jang MJ, Yum YJ, Kim S, et al. A pre-trained BERT for Korean medical natural language processing. Sci Rep 2022;12(1):1–10.
- 43 Bressem KK, Adams LC, Gaudin RA, Tröltzsch D, Hamm B, Makowski MR, et al. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics 2020 Nov 1;36(21):5255–61. doi:10.1093/bioinformatics/btaa668.
- 44 Turkmen H, Dikenelli O, Eraslan C, Callı MC. Bioberturk: Exploring Turkish Biomedical Language Model Development Strategies in Low Resource Setting. Research Square; 2022. doi:10.21203/rs.3.rs-2165226/v1.
- 45 Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans Comput Healthc 2022 Jan 31;3(1):1–23. doi:10.1145/3458754.
- 46 Carrino CP, Llop J, Pàmies M, Gutiérrez-Fandiño A, Armengol-Estapé J, Silveira-Ocampo J, et al. Pretrained Biomedical Language Models for Clinical NLP in Spanish. Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 193–9. doi:10.18653/v1/2022.bionlp-1.19.
- 47 Li X, Zhang H, Zhou XH. Chinese clinical named entity recognition with variant neural structures based on BERT methods. J Biomed Inform 2020;107:103422. doi:10.1016/j.jbi.2020.103422.
- 48 Carrino CP, Armengol-Estapé J, Gutiérrez-Fandiño A, Llop-Palao J, Pàmies M, Gonzalez-Agirre A, et al. Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario. arXiv; 2021. Available at: http://arxiv.org/abs/2109.03570.
- 49 de Vries W, van Cranenburgh A, Bisazza A, Caselli T, van Noord G, Nissim M. BERTje: A Dutch BERT Model. arXiv; 2019. Available at: https://arxiv.org/abs/1912.09582.
- 50 Tanvir H, Kittask C, Eiche S, Sirts K. EstBERT: A Pretrained Language-Specific BERT for Estonian. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Reykjavik, Iceland (Online): Linköping University Electronic Press, Sweden; 2021. p. 11–9. Available at: https://aclanthology.org/2021.nodalida-main.2.
- 51 Olthof AW, van Ooijen PM, Cornelissen LJ. The natural language processing of radiology requests and reports of chest imaging: Comparing five transformer models' multilabel classification and a proof-of-concept study. Health Informatics J 2022 Dec;28(4):14604582221131198. doi:10.1177/14604582221131198.
- 52 Grancharova M, Dalianis H. Applying and sharing pre-trained BERT-models for named entity recognition and classification in Swedish electronic patient records. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). 2021. p. 231–9. Available at: https://aclanthology.org/2021.nodalida-main.23/.
- 53 Bailly A, Blanc C, Guillotin T. Classification multi-label de cas cliniques avec CamemBERT (Multi-label classification of clinical cases with CamemBERT). Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles Atelier DÉfi Fouille de Textes (DEFT). 2021. p. 14–20. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.2/.
- 54 Schneider ETR, de Souza JVA, Knafou J, Silva e Oliveira LE, Copara J, Gumiel YB, et al. BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition. Proceedings of the 3rd Clinical Natural Language Processing Workshop. Online: Association for Computational Linguistics; 2020. p. 65–72. doi:10.18653/v1/2020.clinicalnlp-1.7.
- 55 Bitton Y, Cohen R, Schifter T, Bachmat E, Elhadad M, Elhadad N. Cross-lingual Unified Medical Language System entity linking in online health communities. J Am Med Inform Assoc 2020 Sep 10;27(10):1585–92. doi:10.1093/jamia/ocaa150.
- 56 Johnson A, Karanasou P, Gaspers J, Klakow D. Cross-lingual Transfer Learning for Japanese Named Entity Recognition. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 2019. p. 182–9. doi:10.18653/v1/N19-2023.
- 57 Wang C, Wang H, Zhuang H, Li W, Han S, Zhang H, et al. Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree. J Biomed Inform 2020 Nov;111:103583. doi:10.1016/j.jbi.2020.103583.
- 58 Shi B, Zhang L, Huang J, Zheng H, Wan J, Zhang L. MDA: An Intelligent Medical Data Augmentation Scheme Based on Medical Knowledge Graph for Chinese Medical Tasks. Appl Sci 2022;12(20):10655. doi: 10.3390/app122010655.
- 59 Kim YM, Lee TH. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak 2020 Sep 30;20(7):242. doi:10.1186/s12911-020-01241-8.
- 60 Kawazoe Y, Shibata D, Shinohara E, Aramaki E, Ohe K. A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS One 2021 Nov 9;16(11):e0259763. doi:10.1371/journal.pone.0259763.
- 61 de Souza JVA, Schneider ETR, Cezar JO, Silva LE, Gumiel YB, Paraiso EC, et al. A multilabel approach to Portuguese clinical named entity recognition. J Health Inform 2020;12.
- 62 Mitrofan M, Pais V. Improving Romanian BioNER Using a Biologically Inspired System. Proceedings of the 21st Workshop on Biomedical Language Processing. Association for Computational Linguistics; 2022. p. 316–22. doi:10.18653/v1/2022.bionlp-1.30.
- 63 Kaplar A, Stošović M, Kaplar A, Brković V, Naumović R, Kovačević A. Evaluation of clinical named entity recognition methods for Serbian electronic health records. Int J Med Inf 2022 Aug;164:104805. doi:10.1016/j.ijmedinf.2022.104805.
- 64 Frei J, Frei-Stuber L, Kramer F. GERNERMED++: Transfer Learning in German Medical NLP. ArXiv; 2022. Available at: https://arxiv.org/abs/2206.14504.
- 65 Wajsbürt P, Sarfati A, Tannier X. Medical concept normalization in French using multilingual terminologies and contextual embeddings. J Biomed Inform 2021 Feb 1;114:103684. doi:10.1016/j.jbi.2021.103684.
- 66 Budiarti RPN, Sukaridhoto S, Al-Hafidz IA, Satrio NA. Symptoms identification of ICD-11 based on clinical NLP mobile apps for diagnosing the disease (ICD-11). Bali Med J 2022;11(3):1162–7.
- 67 French E, McInnes BT. An overview of biomedical entity linking throughout the years. J Biomed Inform 2022 Dec 2;104252. doi:10.1016/j.jbi.2022.104252.
- 68 Névéol A, Grouin C, Leixa J, Rosset S, Zweigenbaum P. The Quaero French Medical Corpus: A Resource for Medical Entity Recognition and Normalization. Proceedings of the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing. 2014. p. 24–30.
- 69 Gonzalez-Agirre A, Marimon M, Intxaurrondo A, Rabal O, Villegas M, Krallinger M. PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track. Proceedings of the 5th Workshop on BioNLP Open Shared Tasks. Hong Kong, China: Association for Computational Linguistics; 2019. p. 1–10. doi:10.18653/v1/D19-5701.
- 70 Miranda-Escalada A, Farré E, Krallinger M. Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results. Proceedings of the Iberian Languages Evaluation Forum. 2020;303–23.
- 71 Liu F, Vulić I, Korhonen A, Collier N. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021. p. 565–74. doi:10.18653/v1/2021.acl-short.72.
- 72 Shaitarova A, Rinaldi F. Negation typology and general representation models for cross-lingual zero-shot negation scope resolution in Russian, French, and Spanish. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. 2021. p. 15–23. doi:10.18653/v1/2021.naacl-srw.3.
- 73 Jiménez-Zafra SM, Morante R, Martín-Valdivia MT, Ureña-López LA. Corpora Annotated with Negation: An Overview. Comput Linguist 2020;46(1):1–52. doi:10.1162/coli_a_00371.
- 74 Mahany A, Khaled H, Elmitwally NS, Aljohani N, Ghoniemy S. Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications. Appl Sci 2022 Jan;12(10):5209. doi:10.3390/app12105209.
- 75 Marimon M, Vivaldi J, Bel N. Annotation of negation in the IULA Spanish Clinical Record Corpus. Proceedings of the Workshop Computational Semantics Beyond Events and Roles. Association for Computational Linguistics; 2017. p. 43–52. doi:10.18653/v1/W17-1807.
- 76 Dalloux C, Claveau V, Grabar N. Speculation and Negation detection in French biomedical corpora. Proceedings of the International Conference on Recent Advances in Natural Language Processing. INCOMA Ltd.; 2019. p. 223–32. doi:10.26615/978-954-452-056-4_026.
- 77 Pabón OS, Montenegro O, Torrente M, González AR, Provencio M, Menasalvas E. Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach. PeerJ Comput Sci 2022 Mar 7;8:e913. doi:10.7717/peerj-cs.913.
- 78 Lima Lopez S, Perez N, Cuadros M, Rigau G. NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts. Proceedings of the Twelfth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2020. p. 5772–81. Available at: https://aclanthology.org/2020.lrec-1.708.
- 79 Oliveira LESE, Peters AC, da Silva AMP, Gebeluca CP, Gumiel YB, Cintho LMM, et al. SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks. J Biomed Semant 2022 May 8;13(1):13. doi:10.1186/s13326-022-00269-1.
- 80 Shaitarova A, Furrer L, Rinaldi F. Cross-lingual transfer-learning approach to negation scope resolution. Proceedings of the Swiss Text Analytics Conference & Conference on Natural Language Processing. 2020 Jun 25; doi:10.5167/UZH-197355.
- 81 Mirzapour M, Abdaoui A, Tchechmedjiev A, Digan W, Bringay S, Jonquet C. French FastContext: A publicly accessible system for detecting negation, temporality and experiencer in French clinical notes. J Biomed Inform 2021 May 1;117:103733. doi:10.1016/j.jbi.2021.103733.
- 82 Santiso S, Pérez A, Casillas A, Oronoz M. Neural negated entity recognition in Spanish electronic health records. J Biomed Inform 2020 May;105:103419. doi:10.1016/j.jbi.2020.103419.
- 83 Dalloux C, Claveau V, Grabar N, Oliveira LES, Moro CMC, Gumiel YB, et al. Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora. Nat Lang Eng 2020;1–21. doi:10.1017/S1351324920000352.
- 84 Funkner A, Balabaeva K, Kovalchuk S. Negation Detection for Clinical Text Mining in Russian. Digit Pers Health Med 2020;342–6. doi:10.3233/SHTI200179.
- 85 Hartmann M, Søgaard A. Multilingual Negation Scope Resolution for Clinical Text. Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis. Association for Computational Linguistics; 2021. p. 7–18. Available at: https://aclanthology.org/2021.louhi-1.2.
- 86 Rivera Zavala R, Martinez P. The Impact of Pretrained Language Models on Negation and Speculation Detection in Cross-Lingual Medical Text: Comparative Study. JMIR Med Inform 2020 Dec 3;8(12):e18953. doi:10.2196/18953.
- 87 Gasco L, Estrada-Zavala D, Farré-Maduell E, Lima-López S, Miranda-Escalada A, Krallinger M. Overview of the SocialDisNER shared task on detection of diseases mentions from healthcare related and patient generated social media content: methods, evaluation and corpora. Proceedings of the Seventh Social Media Mining for Health (# SMM4H) Workshop and Shared Task. 2022. Available at: https://aclanthology.org/2022.smm4h-1.48/.
- 88 Lima-López S, Farré-Maduell E, Miranda-Escalada A, Brivá-Iglesias V, Krallinger M. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. Procesamiento del Lenguaje Natural 2021;67:243–56. doi:10.26342/2021-67-21.
- 89 Chapman WW, Nadkarni PM, Hirschman L, D'avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 2011 Sep-Oct;18(5):540-3. doi:10.1136/amiajnl-2011-000465.
- 90 Deleger L, Li Q, Lingren T, Kaiser M, Molnar K, Stoutenborough L, et al. Building gold standard corpora for medical natural language processing tasks. AMIA Annu Symp Proc 2012:144-53.
- 91 Wissler L, Almashraee M, Díaz DM, Paschke A. The Gold Standard in Corpus Annotation. Proceedings of the 5th IEEE Germany Student Conference. 2014;21. doi:10.13140/2.1.4316.3523.
- 92 Miranda-Escalada A, Gonzalez-Agirre A, Armengol-Estapé J, Krallinger M. Overview of Automatic Clinical Coding: Annotations, Guidelines, and Solutions for non-English Clinical Cases at CodiEsp Track of CLEF eHealth 2020. CLEF (Working Notes). 2020.
- 93 Costa J, Lopes I, Carreiro AV, Ribeiro D, Soares C. Fraunhofer AICOS at CLEF eHealth 2020 Task 1: Clinical Code Extraction From Textual Data Using Fine-Tuned BERT Models. CLEF (Working Notes). 2020.
- 94 Perea-Ortega JM, López-Úbeda P, Díaz-Galiano MC, Valdivia MTM, López LAU. SINAI at CLEF eHealth 2020: Testing Different pre-trained Word Embeddings for Clinical Coding in Spanish. CLEF (Working Notes). 2020.
- 95 Cossin S, Jouhet V. IAM at CLEF eHealth 2020: Concept Annotation in Spanish Electronic Health Records. CLEF (Working Notes). 2020.
- 96 Mayya V, Kamath SS, Sugumaran V. LAT A − Label Attention Transformer Architectures for ICD-10 Coding of Unstructured Clinical Notes. Proceedings of the IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology. 2021. p. 1–7.
- 97 García-Santa N, Cetina K, Cappellato L, Eickhoff C, Ferro N, Nevéol A. FLE at CLEF eHealth 2020: Text Mining and Semantic Knowledge for Automated Clinical Encoding. CLEF (Working Notes). 2020.
- 98 de la Iglesia I, Martínez-Puente M, Platas A, San Miguel I, Atutxa A, Gojenola K. Media team: Clef-2020 ehealth task 1: Multilingual information extraction-codiesp. CLEF (Working Notes). 2020.
- 99 Cotik V, Alemany LA, Filippo D, Luque F, Roller R, Vivaldi J, et al. Overview of CLEF eHealth Task 1-SpRadIE: A challenge on information extraction from Spanish Radiology Reports. CLEF (Working Notes). 2021. p. 732–50.
- 100 Solarte-Pabón O, Montenegro O, Blazquez-Herranz A, Saputro H, Rodriguez-González A, Menasalvas E. Information extraction from Spanish radiology reports using multilingual BERT. CLEF Ehealth. 2021. p. 834–45.
- 101 Fabregat H, Duque A, Araujo L, Martínez-Romo J. LSI_UNED at CLEF eHealth2021: Exploring the effects of transfer learning in negation detection and entity recognition in clinical texts. CLEF (Working Notes). 2021. p. 780–93.
- 102 Ruas P, Neves A, Andrade VD, Couto FM, Aragón ME. LasigeBioTM at CANTEMIST: Named Entity Recognition and Normalization of Tumour Morphology Entities and Clinical Coding of Spanish Health-related Documents. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 422–37.
- 103 Osborne JD, O'Leary T, Del Monte J, Sasse K. Identification of Cancer Entities in Clinical Text Combining Transformers with Dictionary Features. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 458–67.
- 104 Han JC, Tsai RTH. NCU-IISR: Pre-trained Language Model for CANTEMIST Named Entity Recognition. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 347–51.
- 105 López-Úbeda P, Díaz-Galiano MC, Martín-Valdivia MT, López LAU. Extracting Neoplasms Morphology Mentions in Spanish Clinical Cases through Word Embeddings. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 324–34.
- 106 de Vargas Romero G, Segura-Bedmar I. Exploring Deep Learning for Named Entity Recognition of Tumor Morphology Mentions. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 396–411.
- 107 Vunikili R. Clinical NER using Spanish BERT Embeddings. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 505–11
- 108 García-Pablos A, Perez N, Cuadros M, Zotova E. Vicomtech at eHealth-KD Challenge 2020: Deep End-to-End Model for Entity and Relation Extraction in Medical Text. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 102–11.
- 109 Pavanelli L, Schneider ETR, Gumiel YB, Ferreira TC, de Oliveira LFA, de Souza JVA, et al. PUCRJ-PUCPR-UFMG at eHealth-KD Challenge 2021: A Multilingual BERT-based System for Joint Entity Recognition and Relation Extraction. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 683–91.
- 110 Balouchzahi F, Sidorov G, Shashirekha HL. ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 747–57.
- 111 Harkawat J, Vaidhya T. Spanish Pre-Trained Language Models for HealthCare Industry. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 796–802.
- 112 Schwarz M, Chapman K, Häussler B. Multilingual Medical Entity Recognition and Cross-lingual Zero-Shot Linking with Facebook AI Similarity Search. Proceedings of the Iberian Languages Evaluation Forum. 2022.
- 113 Avram AM, Mitrofan M, P is V. Species Entity Recognition Using a Neural Inhibitory Mechanism. Proceedings of the Iberian Languages Evaluation Forum. 2022.
- 114 Tamayo A, Burgos D, Gelbukh A. ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT+ Rules. Proceedings of the Iberian Languages Evaluation Forum. 2022.
- 115 Piad-Morffis A, Gutiérrez Y, Canizares-Diaz H, Estevez-Velarde S, Muñoz R, Montoyo A, et al. Overview of the ehealth knowledge discovery challenge at IberLEF 2020. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 71–84.
- 116 Miranda-Escalada A, Gascó L, Lima-Lopez S, Farré-Maduell E, Estrada D, Nentidis A, et al. Overview of DisTEMIST at BioASQ: Automatic detection and normalization of diseases from clinical texts: results, methods, evaluation and multilingual resources. CLEF (Working Notes). 2022; p. 179–203.
- 117 Chizhikova M, Collado-Montañez J, López-Úbeda P, Díaz-Galiano MC, Ureña-López LA, Martín-Valdivia MT. SINAI at CLEF 2022: Leveraging biomedical transformers to detect and normalize disease mentions. CLEF (Working Notes). 2022. p. 265–73.
- 118 Neves A. Unicage at DISTE.MIST-Named Entity Recognition system using only Bash and Unicage tools. CLEF (Working Notes); 2022. p. 325–34
- 119 Borchert F, Schapranow MP. HPI-DHC@ BioASQ DisTEMIST: Spanish Biomedical Entity Linking with Pre-trained Transformers and Cross-lingual Candidate Retrieval. CLEF (Working Notes). 2022. p. 244–58.
- 120 Cardon R, Grabar N, Grouin C, Hamon T. Présentation de la campagne d'évaluation DEFT 2020: similarité textuelle en domaine ouvert et extraction d'information précise dans des cas cliniques. DEFT; 2020. Available at: https://aclanthology.org/2020.jeptalnrecital-deft.1/.
- 121 Copara Zea JL, Knafou JDM, Naderi N, Moro C, Ruch P, Teodoro D. Contextualized French Language Models for Biomedical Named Entity Recognition. Actes de la 6e conférence conjointe Journées d'Études sur la Parole, Traitement Automatique des Langues Naturelles, Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Atelier DÉfi Fouille de Textes (DEFT). 2020. p. 36–48. Available at: https://aclanthology.org/2020.jeptalnrecital-deft.4/.
- 122 Naderi N, Knafou J, Copara J, Ruch P, Teodoro D. Ensemble of deep masked language models for effective named entity recognition in health and life science corpora. Front Res Metr Anal 2021;6. doi:10.3389/frma.2021.689803.
- 123 Grouin C, Grabar N, Illouz G. Classification de cas cliniques et évaluation automatique de réponses d'étudiants : présentation de la campagne DEFT 2021. Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles, Atelier DÉfi Fouille de Textes (DEFT). 2021. p. 1–13. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.1.
- 124 Mannion A, Chevalier T, Schwab D, Goeuriot L. Identification de profil clinique du patient : Une approche de classification de séquences utilisant des modèles de langage français contextualisés. Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. 2021. p. 54–62. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.6/.
- 125 Dou Z, Yamamoto T. Overview of NTCIR-16. Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies. 2022. p. 3–7.
- 126 Tamayo A, Gelbukh A, Burgos DA. NLP-CIC-WFU at SocialDisNER: Disease mention extraction in Spanish tweets using transfer learning and search by propagation. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 19–22. Available at: https://aclanthology.org/2022.smm4h-1.6.
- 127 Cetina K, García-Santa N. FRE at SocialDisNER: Joint learning of language models for named entity recognition. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 68–70. Available at: https://aclanthology.org/2022.smm4h-1.20.
- 128 Verma H, Bagherzadeh P, Bergler S. CLaCLab at SocialDisNER: Using medical gazetteers for named-entity recognition of disease mentions in Spanish tweets. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 68–70. Available at: https://aclanthology.org/2022.smm4h-1.16.
- 129 Montañés-Salas R, López-Bosque I, García-Garcés L, del-Hoyo-Alonso R. ITAINNOVA at SocialDisNER: A Transformers cocktail for disease identification in social media in Spanish. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 71–4. Available at: https://aclanthology.org/2022.smm4h-1.21.
- 130 Fu J, Li S, Yuan HM, Li Z, Gan Z, Chen Y, et al. CASIA@SMM4H'22: A uniform health information mining system for multilingual social media texts. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 143–7. Available at: https://aclanthology.org/2022.smm4h-1.39.
- 131 Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, et al. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2020. p. 8440–51. doi:10.18653/v1/2020.acl-main.747.
- 132 Cañete J, Chaperon G, Fuentes R, Ho JH, Kang H, Pérez J. Spanish pre-trained BERT model and evaluation data. Practical ML for Developing Countries Workshop. 2020. p. 1–10.
- 133 López-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Detection of Tumor Morphology Mentions in Clinical Reports in Spanish Using Transformers. Proceedings of the International Work-Conference on Artificial Neural Networks. Springer; 2021. p. 24–35.
- 134 Martin L, Muller B, Suárez PJO, Dupont Y, Romary L, de la Clergerie ÉV, et al. CamemBERT: a Tasty French Language Model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2020. p. 7203–19. doi:10.18653/v1/2020.acl-main.645.
- 135 Le H, Vial L, Frej J, Segonne V, Coavoux M, Lecouteux B, et al. FlauBERT: Unsupervised Language Model Pre-training for French. arXiv; 2020. Available at: http://arxiv.org/abs/1912.05372.
- 136 Otegi A, Agirre A, Campos JA, Soroa A, Agirre E. Conversational question answering in low resource scenarios: A dataset and case study for Basque. Proceedings of the 12th Language Resources and Evaluation Conference. 2020. p. 436–42. Available at: https://aclanthology.org/2020.lrec-1.55/.
- 137 Gutiérrez Fandiño A, Armengol Estapé J, Pàmies M, Llop Palao J, Silveira Ocampo J, Pio Carrino C, et al. MarIA: Spanish language models. Procesamiento del Lenguaje Natural 2022;68:39–60. doi:10.26342/2022-68-3.
- 138 López-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Transformers for Clinical Coding in Spanish. IEEE Access 2021;9:72387–97. doi:10.1109/ACCESS.2021.3080085.
- 139 Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2020. p. 1877–901. Available at: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
- 140 Biswas S. ChatGPT and the Future of Medical Writing. Radiology 2023 Apr;307(2):e223312. doi:10.1148/radiol.223312.
- 141 Jin D, Pan E, Oufattole N, Weng WH, Fang H, Szolovits P. What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Appl Sci 2021 Jul;11(14):6421. doi:10.3390/app11146421.
- 142 Andrade VDT, Ruas P, Couto FM. Named Entity Recognition and Linking: a Portuguese and Spanish Oncological Parallel Corpus. bioRxiv; 2021. p. 2021.09.16.460605. doi:10.1101/2021.09.16.460605.
- 143 Ma H, Yang F, Ren J, Li N, Dai M, Wang X, et al. ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application. BMC Med Inform Decis Mak 2020 Jul 9;20(3):122. doi:10.1186/s12911-020-1116-1.
- 144 Mititelu VB, Mitrofan M. The Romanian medical treebank-SiMoNERo. Proceedings of the 15th Edition of the International Conference on Linguistic Resources and Tools for Natural Language Processing–ConsILR. 2020. p. 7–16.
- 145 Blinov P, Reshetnikova A, Nesterov A, Zubkova G, Kokh V. RuMedBench: A Russian Medical Language Understanding Benchmark. Proceedings of the International Conference on Artificial Intelligence in Medicine. 2022. p. 383–92.
- 146 Campillos-Llanos L, Valverde-Mateos A, Capllonch-Carrión A. A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine. BMC Med Inform Decis Mak 2021;21:69. doi:10.1186/s12911-021-01395-z.