CC BY 4.0 · ACI open 2021; 05(01): e1-e12
DOI: 10.1055/s-0041-1729982
Original Article

DeepSuggest: Using Neural Networks to Suggest Related Keywords for a Comprehensive Search of Clinical Notes

Soheil Moosavinasab (1), Emre Sezgin (1), Huan Sun (2), Jeffrey Hoffman (3, 4), Yungui Huang (1), Simon Lin (1, 5)

1   Research Information Solutions and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States
2   Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, United States
3   Department of Pediatrics, Nationwide Children's Hospital, Columbus, Ohio, United States
4   Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio, United States
5   Department of Biomedical Informatics and Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio, United States

Funding This study received financial support from the Patient-Centered Outcomes Research Institute (grant number: ME-2017C1-6413).

Abstract

Objective A large amount of clinical data is stored in clinical notes that frequently contain spelling variations, typos, local practice-generated acronyms, synonyms, and informal words. Instead of relying on established but infrequently updated ontologies with keywords limited to formal language, we developed an artificial intelligence (AI) assistant (named “DeepSuggest”) that interactively offers suggestions to expand or pivot queries to help overcome these challenges.

Methods We applied an unsupervised neural network (Word2Vec) to the clinical notes to build a keyword contextual similarity matrix. Given a user's input query, DeepSuggest generates a list of relevant keywords, including word variations (e.g., formal or informal forms, synonyms, abbreviations, and misspellings) and other relevant words (e.g., related diagnoses, medications, and procedures). Users then apply their own judgment to refine or pivot the query.
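
As a rough illustration of the Methods step, the sketch below ranks vocabulary terms by cosine similarity to a query term's vector, which is one common way to read suggestions off Word2Vec-style embeddings. The three-dimensional vectors, the `EMBEDDINGS` table, and the `suggest` function are hypothetical stand-ins and not the authors' implementation; a real system would learn much higher-dimensional vectors from the local notes.

```python
import math

# Hypothetical toy embeddings (assumption): real vectors would be learned
# from clinical notes and would have far more dimensions.
EMBEDDINGS = {
    "tonsillectomy": [0.9, 0.1, 0.0],
    "tonsilectomy": [0.88, 0.12, 0.02],  # misspelling seen in notes
    "adenoidectomy": [0.7, 0.3, 0.1],    # related procedure
    "t&a": [0.8, 0.2, 0.05],             # local abbreviation
    "asthma": [0.0, 0.2, 0.9],           # unrelated term
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(query, embeddings, top_k=60):
    """Return up to top_k (term, similarity) pairs, most similar first."""
    if query not in embeddings:
        return []  # out-of-vocabulary query: nothing to suggest
    q = embeddings[query]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


suggestions = suggest("tonsillectomy", EMBEDDINGS, top_k=3)
for term, score in suggestions:
    print(f"{term}\t{score:.3f}")
```

In this toy vocabulary, the misspelling and the abbreviation rank above the related procedure, and the unrelated term falls outside the top three. The user would then choose which suggestions to add to the query.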

Results DeepSuggest learns the semantic and linguistic relationships between words from a large collection of local notes. Although DeepSuggest recalls only 0.54 of Systematized Nomenclature of Medicine (SNOMED) synonyms on average among the top 60 suggested terms, it covers the semantic relationships in our corpus for a larger number of raw concepts (6.3 million) than the SNOMED ontology (24,921) and can retrieve terms that are not stored in existing ontologies. The precision for the top 60 suggested words averaged 0.72. A usability test showed that DeepSuggest achieved roughly twice the recall on clinical notes compared with Epic (an average of 5.6 notes retrieved by DeepSuggest compared with 2.6 by Epic).
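
The recall and precision figures above are computed over the top 60 suggestions. The abstract does not give the exact evaluation protocol, so the following is only a minimal sketch of the standard recall-at-k and precision-at-k definitions. The function names and the example term lists are hypothetical.

```python
def recall_at_k(suggested, reference, k=60):
    """Fraction of reference terms (e.g., ontology synonyms) found in the top k."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(set(suggested[:k]) & ref) / len(ref)


def precision_at_k(suggested, relevant, k=60):
    """Fraction of the top k suggestions that were judged relevant."""
    top = suggested[:k]
    if not top:
        return 0.0
    rel = set(relevant)
    return sum(1 for term in top if term in rel) / len(top)


# Hypothetical example data (assumption), not taken from the study.
suggested_terms = ["tonsilectomy", "t&a", "adenoidectomy", "asthma"]
ontology_synonyms = ["t&a", "tonsil removal"]
judged_relevant = ["tonsilectomy", "t&a", "adenoidectomy"]

recall = recall_at_k(suggested_terms, ontology_synonyms, k=60)
precision = precision_at_k(suggested_terms, judged_relevant, k=60)
print(recall, precision)
```

Averaging these per-query values over a set of test queries gives summary figures of the kind reported here. Recall against an ontology can stay modest even when precision is high, because suggestions such as misspellings or local abbreviations count as relevant yet do not appear among the ontology's synonyms.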

Conclusion DeepSuggest improved retrieval of relevant clinical notes when implemented on a local corpus by suggesting spelling variations, acronyms, and semantically related words. It is a promising tool for helping users achieve a higher recall rate in clinical note searches, thereby boosting productivity in clinical practice and research. DeepSuggest can supplement established ontologies for query expansion.



Publication History

Received: 04 February 2020

Accepted: 24 March 2021

Article published online: 06 June 2021

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 