DeepSuggest: Using Neural Networks to Suggest Related Keywords for a Comprehensive Search of Clinical Notes

Soheil Moosavinasab; Emre Sezgin; Huan Sun; Jeffrey Hoffman; Yungui Huang; Simon Lin

doi:10.1055/s-0041-1729982

CC BY 4.0 · ACI open 2021; 05(01): e1-e12
DOI: 10.1055/s-0041-1729982

Original Article

Authors

Soheil Moosavinasab
¹Research Information Solutions and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States

Emre Sezgin
¹Research Information Solutions and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States

Huan Sun
²Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, United States

Jeffrey Hoffman
³Department of Pediatrics, Nationwide Children's Hospital, Columbus, Ohio, United States
⁴Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio, United States

Yungui Huang
¹Research Information Solutions and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States

Simon Lin
¹Research Information Solutions and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States
⁵Department of Biomedical Informatics and Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio, United States

Funding This study was supported by the Patient-Centered Outcomes Research Institute (grant number: ME-2017C1-6413).

Abstract

Objective A large amount of clinical data is stored in clinical notes, which frequently contain spelling variations, typos, local practice-generated acronyms, synonyms, and informal words. Instead of relying on established but infrequently updated ontologies whose keywords are limited to formal language, we developed an artificial intelligence (AI) assistant (named “DeepSuggest”) that interactively suggests ways to expand or pivot queries to help overcome these challenges.

Methods We applied an unsupervised neural network (Word2Vec) to the clinical notes to build a keyword contextual similarity matrix. Given a user's input query, DeepSuggest generates a list of relevant keywords, including word variations (e.g., formal or informal forms, synonyms, abbreviations, and misspellings) and other relevant words (e.g., related diagnoses, medications, and procedures). Users then apply their own judgment to refine or pivot the query.
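The suggestion step described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embedding values are invented by hand to stand in for vectors that a Word2Vec model would learn from clinical notes, and the names `EMBEDDINGS`, `cosine`, and `suggest` are hypothetical. It ranks every other vocabulary term by cosine similarity to the query term and returns the top results.

```python
import math

# Hand-made toy vectors standing in for trained Word2Vec embeddings.
# The numbers are invented for illustration only.
EMBEDDINGS = {
    "tonsillectomy": [0.90, 0.10, 0.00],
    "tonsilectomy":  [0.88, 0.12, 0.02],  # misspelling, placed near the correct form
    "adenoidectomy": [0.70, 0.30, 0.10],  # related procedure
    "asthma":        [0.00, 0.20, 0.90],
    "albuterol":     [0.10, 0.30, 0.85],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def suggest(query, embeddings, top_k=60):
    """Return up to top_k (term, similarity) pairs, most similar first."""
    if query not in embeddings:
        return []  # out-of-vocabulary query: nothing to suggest
    q = embeddings[query]
    scored = [
        (term, cosine(q, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

for term, score in suggest("tonsillectomy", EMBEDDINGS, top_k=3):
    print(f"{term}\t{score:.3f}")
```

In this toy vocabulary the misspelling and the related procedure rank above the unrelated respiratory terms, which mirrors the kind of list a user would then refine by hand. A brute-force scan like this is only practical for small vocabularies; a corpus-scale system would likely need an approximate nearest-neighbor index.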

Results DeepSuggest learns the semantic and linguistic relationships between words from a large collection of local notes. Although DeepSuggest recalls on average only 0.54 of Systematized Nomenclature of Medicine (SNOMED) synonyms among the top 60 suggested terms, it covers the semantic relationships in our corpus for a larger number of raw concepts (6.3 million) than the SNOMED ontology (24,921) and can retrieve terms that are not stored in existing ontologies. The precision for the top 60 suggested words averages 0.72. A usability test showed that DeepSuggest achieved about twice the recall on clinical notes compared with Epic (an average of 5.6 notes retrieved by DeepSuggest compared with 2.6 by Epic).
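To make the two top-60 metrics concrete, the sketch below computes precision and recall at a cutoff k for a single query. The definitions are assumptions, since the abstract does not spell them out: precision is taken as the fraction of the top-k suggestions judged relevant, and recall as the fraction of reference synonyms (for example, SNOMED synonyms) that appear among the top-k suggestions. The function name and the sample terms are hypothetical.

```python
def precision_recall_at_k(suggested, relevant, reference_synonyms, k=60):
    """Precision and recall for the first k suggestions of one query.

    suggested:          ranked list of suggested terms
    relevant:           set of terms judged relevant (assumed rater labels)
    reference_synonyms: set of synonyms from a reference ontology
    """
    top = suggested[:k]
    if not top:
        return 0.0, 0.0
    precision = sum(1 for term in top if term in relevant) / len(top)
    if reference_synonyms:
        found = sum(1 for syn in reference_synonyms if syn in top)
        recall = found / len(reference_synonyms)
    else:
        recall = 0.0  # no reference synonyms to recall
    return precision, recall

# Invented example: four suggestions, three judged relevant,
# and one of two reference synonyms recovered.
suggested = ["tonsilectomy", "t&a", "adenoidectomy", "asthma"]
relevant = {"tonsilectomy", "t&a", "adenoidectomy"}
reference = {"t&a", "excision of tonsil"}
print(precision_recall_at_k(suggested, relevant, reference, k=60))
```

Averaging these per-query values over a set of test queries would give corpus-level figures comparable in form to those reported above.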

Conclusion When implemented on a local corpus, DeepSuggest improved retrieval of relevant clinical notes by suggesting spelling variations, acronyms, and semantically related words. It is a promising tool for helping users achieve a higher recall rate in clinical note searches, and thus for boosting productivity in clinical practice and research. DeepSuggest can supplement established ontologies for query expansion.

Keywords

electronic health records - search engine - neural networks - computer - information storage and retrieval - user-computer interface - pediatrics - electronic data processing

Publication History

Received: 04 February 2020

Accepted: 24 March 2021

Article published online: 06 June 2021

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
