Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning

Ismat Mohd Sulaiman; Awang Bulgiba; Sameem Abdul Kareem; Abdul Aziz Latip

doi:10.1055/a-2521-4372


CC BY 4.0 · Methods Inf Med
DOI: 10.1055/a-2521-4372

Original Article

Ismat Mohd Sulaiman¹, Awang Bulgiba², Sameem Abdul Kareem³, Abdul Aziz Latip⁴

¹Health Informatics Centre, Planning Division, Ministry of Health Malaysia, Putrajaya, Malaysia
²Academy of Sciences, Kuala Lumpur, Malaysia
³Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Wilayah Persekutuan, Malaysia
⁴MIMOS Berhad, Kuala Lumpur, Malaysia


Abstract

Objective This study presents the first Malaysian machine learning model to detect and disambiguate abbreviations in clinical notes. The model is designed to be incorporated into MyHarmony, a natural language processing system that extracts clinical information for health care management. It uses word embeddings so that it remains feasible to run, for secondary analysis rather than in real time, within the constraints of low-resource settings.

Methods A Malaysian clinical word embedding, based on the Word2Vec model, was developed using 29,895 electronic discharge summaries. The embedding was compared against a conventional rule-based approach and a FastText embedding on two tasks: abbreviation detection and abbreviation disambiguation. Machine learning classifiers were applied to assess performance.
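To make the embedding-based disambiguation task concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a toy two-dimensional embedding with invented values, a hypothetical ambiguous abbreviation "MS", and a nearest-centroid rule with cosine similarity as a simple stand-in for the machine learning classifiers mentioned above. All names (`EMBEDDING`, `context_vector`, `train_centroids`, `disambiguate`) are illustrative.

```python
from math import sqrt

# Toy embedding with invented values (assumption); a real one would come
# from a model such as Word2Vec trained on clinical text.
EMBEDDING = {
    "echo": [0.90, 0.10], "murmur": [0.80, 0.20], "valve": [0.85, 0.15],
    "mri": [0.10, 0.90], "weakness": [0.20, 0.80], "numbness": [0.15, 0.85],
}
DIM = 2


def context_vector(tokens, index, window=2):
    """Average the embeddings of known words around tokens[index]."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    neighbours = [tokens[i].lower() for i in range(lo, hi) if i != index]
    vectors = [EMBEDDING[w] for w in neighbours if w in EMBEDDING]
    if not vectors:
        return [0.0] * DIM  # no known context words
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def train_centroids(examples):
    """Map each sense to the mean context vector of its labelled examples."""
    grouped = {}
    for tokens, index, sense in examples:
        grouped.setdefault(sense, []).append(context_vector(tokens, index))
    return {
        sense: [sum(col) / len(vecs) for col in zip(*vecs)]
        for sense, vecs in grouped.items()
    }


def disambiguate(tokens, index, centroids):
    """Pick the sense whose centroid is most similar to the context."""
    vec = context_vector(tokens, index)
    return max(centroids, key=lambda sense: cosine(vec, centroids[sense]))


# Hypothetical labelled occurrences of "MS" (token position 2 in each).
TRAIN = [
    ("echo showed MS with murmur".split(), 2, "mitral stenosis"),
    ("valve murmur MS noted".split(), 2, "mitral stenosis"),
    ("mri brain MS weakness".split(), 2, "multiple sclerosis"),
    ("numbness weakness MS relapse".split(), 2, "multiple sclerosis"),
]
CENTROIDS = train_centroids(TRAIN)

print(disambiguate("murmur on echo MS valve".split(), 3, CENTROIDS))
print(disambiguate("mri MS numbness".split(), 1, CENTROIDS))
```

In this sketch the averaged context vector plays the role of the feature vector; swapping the nearest-centroid rule for a trained classifier would leave the feature extraction step unchanged.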

Results The Malaysian clinical word embedding contained 7 million word tokens, a vocabulary of 24,352 unique words, and 100 dimensions. For abbreviation detection, the Decision Tree classifier augmented with the Malaysian clinical embedding showed the best performance (F-score of 0.9519). For abbreviation disambiguation, the classifier with the Malaysian clinical embedding had the best performance for most of the abbreviations (F-score of 0.9903).
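For readers less familiar with the metric, the sketch below shows how an F-score is computed from prediction counts. It assumes the balanced F1 variant (the abstract does not state which variant was used), and the counts in the usage line are hypothetical, not taken from the study.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, false negatives.

    beta=1.0 gives the balanced F1 score (harmonic mean of precision
    and recall). Returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 90 abbreviations correctly detected,
# 10 false alarms, 5 missed. F1 = 180 / 195, about 0.923.
print(round(f_score(tp=90, fp=10, fn=5), 3))
```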

Conclusion Despite having a smaller vocabulary and fewer dimensions, our local clinical word embedding performed better than the larger nonclinical FastText embedding. Word embeddings combined with simple machine learning algorithms can decipher abbreviations well. This approach also requires fewer computational resources and is suitable for implementation in low-resource settings such as Malaysia. Integrating this model into MyHarmony will improve recognition of clinical terms, thus improving the information generated for monitoring Malaysian health care services and policymaking.

Keywords

electronic health record - discharge summaries - word embedding - machine learning - natural language processing - health system management

Publication History

Received: 30 August 2024

Accepted: 15 January 2025

Accepted Manuscript online: 22 January 2025

Article published online: 11 February 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

