Keywords
Awards and prizes; decision making, computer-assisted; medical informatics/trends; natural language processing; semantics
Introduction
Clinical natural language processing (NLP) is defined as NLP applied to clinical texts or aimed at a clinical outcome. This encompasses NLP applied to texts in Electronic Health Records (EHRs), which accounts for the bulk of information extraction for decision support or clinical research. We also consider as clinically relevant the applications and research addressing the analysis of patient-authored text or speech for public health or diagnostic purposes. This year’s survey reports on the increasing variety of texts used for clinical NLP, including social media [[1]]. The best papers selected this year address increasingly complex problems such as lexical semantics [[2]–[3]], coreference resolution [[4]], and discourse analysis [[5]]. In addition, they show how research results can effectively translate into freely available tools [[4]–[6]]. More specifically, the best papers offer a framework for abbreviation disambiguation [[6]] and coreference resolution [[4]], classification of clinically useful sentences [[2]], analysis of counseling conversations to improve support for patients with mental disorders [[5]], and grounding of gradable adjectives [[3]]. Their contributions range from emerging, original foundational methods [[3]] to the transition of solid, established research results into a practical clinical setting [[6]].
Method
Papers were retrieved using the same search strategies as in 2016, relying on PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) and the Association for Computational Linguistics (ACL) Anthology Searchbench (http://aclasb.dfki.de/). The PubMed query used minimal metadata and free-text keywords: (English[LA] AND journal article[PT] AND 2016[dp] AND hasabstract[text]) AND ((medical OR clinical OR natural) AND “language processing”). The ACL Anthology query restricted our selection to the most selective journals (TACL, Computational Linguistics), conferences (ACL, EMNLP, NAACL, COLING), and workshops (ACL BioNLP), using the free-text keywords medical, clinical, and health. These queries returned 335 titles and abstracts from MEDLINE and 33 from the ACL Anthology, for a total of 368 papers.
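For reproducibility, the PubMed retrieval step can be scripted. The sketch below is a minimal illustration assuming Biopython’s Entrez module; the e-mail address and the retmax cap are placeholders, and the query string is the one given above.

```python
# Minimal sketch of the PubMed retrieval step, assuming Biopython's Entrez module.
from Bio import Entrez

Entrez.email = "editor@example.org"  # placeholder address, required by NCBI

QUERY = (
    '(English[LA] AND journal article[PT] AND 2016[dp] AND hasabstract[text]) '
    'AND ((medical OR clinical OR natural) AND "language processing")'
)

# Retrieve matching PubMed identifiers (PMIDs).
handle = Entrez.esearch(db="pubmed", term=QUERY, retmax=500)  # arbitrary cap
record = Entrez.read(handle)
handle.close()

pmids = record["IdList"]
print(f"{record['Count']} matching records; fetched {len(pmids)} PMIDs")

# Fetch titles and abstracts in MEDLINE format for downstream screening.
handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                       rettype="medline", retmode="text")
medline_text = handle.read()
handle.close()
```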
Then, we used the results of the 2016 selection to train a logistic regression classifier to automatically rank the retrieved papers from most to least relevant [[7]]. One section editor (AN) then used the BibReview tool (https://pypi.python.org/pypi/BibReview) to classify the papers, based on titles and abstracts, into four categories: (1) Off Topic (OT), for papers focusing on topics outside the scope of clinical NLP, such as biological natural language processing, knowledge representation, psycholinguistics, or image processing; (2) No (N), for papers that did not provide a contribution to either NLP methodology or clinical outcome. Review papers and correspondence were included in this category in order to keep only original research contributions; (3) Maybe (M), for papers that offered a contribution to NLP aimed at a clinical outcome. As last year, papers reporting on participation in NLP challenges were included in this category because, even though challenges provide valuable contributions to the field, challenge papers are usually polished working notes reporting on work that has not reached the level of maturity expected from a “best paper”; (4) Yes (Y), for papers that did so outstandingly or with high novelty.
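The ranking step can be illustrated with a minimal sketch assuming scikit-learn and a simple bag-of-words representation of titles and abstracts; the toy data below are fabricated and the feature choices are not necessarily those of the actual classifier.

```python
# Minimal sketch of relevance ranking with logistic regression (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: last year's screened papers (title + abstract), 1 = kept.
prior_texts = [
    "Clinical NLP for extracting problem lists from discharge summaries.",
    "Gene regulatory network inference from expression microarray data.",
    "Coreference resolution in clinical narratives improves extraction.",
    "Protein folding simulation on distributed hardware.",
]
prior_labels = [1, 0, 1, 0]

# This year's retrieved titles and abstracts, to be ranked for screening.
candidate_texts = [
    "Abbreviation disambiguation in electronic health record notes.",
    "A new sequence aligner for short genomic reads.",
]

# Bag-of-words logistic regression ranker over titles and abstracts.
ranker = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
ranker.fit(prior_texts, prior_labels)

# Rank candidates from most to least likely to be relevant.
scores = ranker.predict_proba(candidate_texts)[:, 1]
for score, text in sorted(zip(scores, candidate_texts), reverse=True):
    print(f"{score:.2f}  {text}")
```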
The 31 papers initially assigned to the Y category were grouped by broad topic and ranked by both section editors. The full text of the top 25 papers was then reviewed to refine this selection and ensure that the final set of candidates covered a variety of topics, authors, and venues. In the list of references provided at the end of the synopsis, a star (*) indicates papers that were in the final selection of the 15 candidate best papers.
Results
We present an overview of clinical NLP publications that cover the topics addressed by the research community in 2016. Table 1 lists the five papers selected as the best papers; a summary of each paper appears in the appendix of this synopsis.
Table 1
Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2017 in the section ‘Clinical Natural Language Processing’. The articles are listed in alphabetical order of the first author’s surname.
Section: Clinical Natural Language Processing
- Althoff T, Clark K, Leskovec J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist 2016;4:463-76.
- Kilicoglu H, Demner-Fushman D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One 2016 Mar 2;11(3):e0148538.
- Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of clinically useful sentences in clinical evidence resources. J Biomed Inform 2016 Apr;60:14-22.
- Shivade C, de Marneffe MC, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. Proceedings of the 15th Workshop on Biomedical Natural Language Processing; 2016:17-26.
- Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017 Apr 1;24(e1):e79-e86.
Applications of Clinical NLP are Increasing in Number and Diversity
Applications of NLP far outnumber research on foundational methods: 71% of the clinical NLP articles reviewed reported on an application of NLP and 29% on foundational methods.
NLP tools ranging from basic keyword/dictionary-based extraction to advanced concept extraction (e.g., using cTAKES, pyConText, or MedLEE) are used for the direct analysis of text of clinical interest or as part of a pipeline extracting features for state-of-the-art classification tools. These solutions are overwhelmingly applied to radiology reports, which are an easier target for efficient NLP than, for instance, discharge summaries, which are much more diverse in vocabulary, language structure, and types of information [[8]]. Two reviews of NLP applied to radiology reports were published in 2016: Cai et al. [[9]] included a tutorial intended for the Radiographics journal audience, while Pons et al. [[10]] performed a systematic review of “publications describing NLP methods that support practical applications in radiology.”
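At the simple end of this spectrum, keyword/dictionary-based extraction amounts to matching a term list against report text. The sketch below uses a fabricated radiology sentence and an illustrative term-to-code lexicon; note that it also matches the negated finding, which is precisely the kind of context handling that tools such as pyConText add.

```python
# Minimal sketch of dictionary-based concept spotting in a radiology report.
import re

# Illustrative term -> code lexicon (codes shown for illustration only).
lexicon = {
    "pulmonary embolism": "C0034065",
    "pleural effusion": "C0032227",
    "pneumothorax": "C0032326",
}
report = "No evidence of pulmonary embolism. Small left pleural effusion."

matches = [
    (term, code)
    for term, code in lexicon.items()
    if re.search(r"\b" + re.escape(term) + r"\b", report, re.IGNORECASE)
]
print(matches)
# [('pulmonary embolism', 'C0034065'), ('pleural effusion', 'C0032227')]
# The negated "pulmonary embolism" is still matched; handling negation and
# uncertainty is what context-aware tools add on top of dictionary lookup.
```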
We observe a continued diversification of the types of text used for clinical NLP, which this year extends to resident pages [[11]], text message conversations [[5]], and consumer product reviews [[12]]. Social media, and Twitter in particular, continue to emerge as a strong source for public health monitoring of a number of issues, including drug abuse [[13]], alcohol use [[14]], adverse drug effects [[15]], and drug repurposing [[16]], and also motivate work aiming at identifying tweets that report a personal health experience [[17]].
Another emerging research trend seeks to use NLP methods towards patient empowerment, making health literature more accessible through improved readability [[18]] or text simplification [[19]] and making progress in clinical question answering for patients [[20]–[21]]. Conversely, some efforts seek insight from patients’ experiences to improve the delivery of healthcare. For instance, Hawkins et al. and Ranard et al. automatically extracted patient-perceived quality of care in hospitals [[22]–[23]], while Bahk et al. [[24]] and Strekalova et al. [[25]] analyzed patient sentiment on vaccination and cancer care options, respectively.
Assessing the quality of EHR content is another topic where NLP plays a role. Scholte et al. [[26]] noted the need for NLP to process free-text fields in order to evaluate the quality of physiotherapy care. Ford et al. [[27]] studied the delay in coding rheumatoid arthritis, observing that over one fifth of patients had text entries for disease-modifying anti-rheumatic drugs more than 14 days before rheumatoid arthritis was coded. Kaufman et al. [[28]] assessed the feasibility of using dictation followed by NLP to enter content into the EHR.
Text Classification and Information Extraction Remain Strong Applications
A large part of the reported research aims at categorizing a patient’s clinical record into predefined categories. This is often done for phenotyping: obesity [[29]], axial spondyloarthritis (AxSpA) [[30]], childhood obesity [[31]], stroke risk factors [[32]], multiple sclerosis [[33]], hypertension [[34]], sometimes state-wide and in “real time” as in [[35]] for diabetes mellitus, and sometimes to distinguish between finer-grained cancer information types as in [[36]]. Generic methods are designed to facilitate adaptation to new phenotypes [[37]].
More generally, text classification is used to identify patient records for the purpose of retrospective or prospective studies aiming at improving care pathways for a category of patients, or at characterizing patient cohorts. A wide range of topics was addressed in 2016: to predict the protocol and priority of brain MRI examinations [[38]], to identify heart failure patients with ineffective self-management status [[39]], to predict suicidal ideation and heightened psychiatric symptoms [[40]], to detect long-bone fractures [[41]], abdominal aortic aneurysms [[42]], liver cirrhosis [[43]], and the region containing an abnormality [[44]] in radiology reports, to predict the diagnosis of breast cancer in mammography reports [[45]], to identify pediatric traumatic brain injury in CT reports [[46]], hepatocellular cancer in pathology and radiology reports [[47]], cerebral aneurysms [[48]], first-episode psychosis [[49]], non-alcoholic fatty liver disease [[50]], celiac disease [[51]], and tree stand falls [[52]], as well as acute coronary syndrome from admission records in Chinese [[53]].
Classification is also applied to predict pre- or post-discharge events, such as which patients will be medically ready for discharge from the neonatal intensive care unit in the subsequent 2–10 days [[54]], to predict suicide [[55]–[56]], to forecast daily bed needs [[57]], to identify patients at high risk of high imaging utilization based on radiology reports [[58]], to predict early psychiatric readmission [[59]] and opioid abuse in patients considered for opioid therapy [[60]], and to produce a predictive risk report for hospitalized heart failure patients [[61]].
Compared to text categorization, information extraction focuses on specific information elements found in clinical text. This includes cancer stage in patients with lung cancer [[62]], average weekly doses of drugs [[63]], adverse events in robotic surgery [[64]], indwelling urinary catheters and urinary symptoms [[65]], liver tumor characteristics from radiology reports [[66]], left ventricular ejection fraction from echocardiography reports [[67]], wound information (wound type, pressure ulcer stage, wound size, anatomic location, and wound treatment) from free-text clinical notes [[39]], and congestive heart failure medication information from Veterans Administration EHRs [[68]]. Clinical trials attract much attention for tasks that require specific information extraction for public health research, including comparative effectiveness research [[69]], mapping of disease research [[70]], categorizing adverse events by age and type [[71]], or characterizing cancer drug toxicity [[72]].
Several efforts addressing information extraction and text classification tasks benefit from an integrated approach. For instance, Botsis et al. developed a common decision support environment for medical product safety surveillance [[73]], while others used NLP to build databases of clinically useful information, such as mutation-disease associations [[74]], drug side-effects extracted by parsing product labels [[75]], and genetic alteration information in cancer trials [[76]]. This also includes work on annotated text corpora, e.g., a corpus of tweets reporting events related to the author’s own health [[17]], and specialized vocabularies, for instance to detect substance abuse terms [[77]].
Foundational Methods of Clinical NLP Take Both Innovative and Consolidating Directions
In 2016, efforts addressing foundational methods of clinical NLP continued to explore core topics that deserve sustained attention, such as entity recognition and normalization and relation extraction. Emerging topics were also addressed, including discourse analysis and lexical semantics [[3], [78]].
Entity extraction is a core topic of clinical NLP that still presents challenges, both for mention extraction and for entity linking or normalization of entity types such as diseases [[79]]. The context in which entities are mentioned, such as speculation, continues to be explored, including in languages other than English such as Chinese [[80]]. While personal health identifiers are entities that have received sustained interest, research directions in de-identification are shifting from entity recognition to revisiting evaluation methods [[81]–[82]] and to annotation strategies that optimize de-identification efforts [[83]].
By linking together multiple mentions of the same entity or event, anaphora and coreference resolution may improve the quality of information extraction. Progress was made on anaphora resolution in MEDLINE abstracts [[84]] and led to a general toolbox for coreference resolution [[4]], described in more detail below, that was tested on both MEDLINE abstracts and the clinical texts of the i2b2 2011 shared task. While the intrinsic performance of coreference resolution can be evaluated on the i2b2 clinical texts [[84]–[85]], its actual impact on an information extraction task is a more concrete test of its relevance [[66]].
The development of NLP tools is an active area fueled by continued research efforts on these hard problems: we welcome three new open-source tools for concept recognition [[86]], coreference resolution [[4]], and temporal analysis [[87]], which are important components for biomedical language processing. Kaggal et al. [[88]] described an institutional implementation of a big data-empowered clinical NLP infrastructure.
The lack of annotated data for training supervised systems was addressed in 2016 with distant supervision, used to extract relevant sentences from articles and thus assist the systematic review process [[89]].
More work was performed to compute language-related scores from patient speech using NLP and machine learning methods. Clark et al. [[90]] created novel scores that better predict the progression from mild cognitive impairment to Alzheimer’s disease. Takano et al. [[91]] identified linguistic features in recalled memories that helped distinguish between specific and non-specific memories in the Autobiographical Memory Test. Luo et al. [[92]] discriminated between adults with autism spectrum disorder and a control group. Althoff et al. [[5]] analyzed counselor/patient interactions in text messages through a variety of techniques and identified actionable conversation strategies that are likely to improve counseling practice. This work is described in more detail below.
Conclusion
Clinical Natural Language Processing continued to thrive in 2016, with an increasing number of contributions on applications as compared to foundational methods. However, we note that foundational work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results translate into freely available tools, mainly for English. The best papers of this year illustrate these trends.
Appendix: Content Summaries of Selected Best Papers for the 2017 IMIA Yearbook, Section Clinical Natural Language Processing
Althoff T, Clark K, Leskovec J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist 2016;4:463-76.
This paper presents an analysis of counselor/patient interactions in text message conversations collected from a free hotline. The authors characterize counseling conversations relying on a wide range of methods, including discourse analysis, statistical language modeling, and sentiment analysis. They identify actionable conversation strategies that are associated with better conversation outcomes: successful counselors exhibit an ability to adapt creatively to each new counseling situation, their reaction to ambiguity with check questions and appreciation language is well received, they conduct conversations efficiently by spending less time understanding the patient’s issue and more time on problem solving, and they facilitate a change in perspective. These results may be used towards improving practice recommendations and counselor training.
Kilicoglu H, Demner-Fushman D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One 2016 Mar 2;11(3):e0148538.
This paper addresses coreference resolution, a fundamental and challenging task in natural language processing. The authors offer a very instructive and informed definition of coreference as well as a comprehensive review of the state of the art in coreference resolution. They describe and evaluate a general method for biomedical coreference resolution called Bio-SCoRes, implemented in a publicly available toolkit. The evaluation broadly covers the biomedical domain by relying on two existing corpora (one clinical, one biological), as well as a newly developed and shared corpus of drug label inserts. It also offers a detailed performance report for each type of coreference. Bio-SCoRes obtains overall results that come close to the state of the art on the clinical corpus and exceed it on the other two corpora.
Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of clinically useful sentences in clinical evidence resources. J Biomed Inform 2016 Apr;60:14-22.
This paper presents a method for classifying sentences from a variety of evidence-based clinical decision support knowledge sources according to their clinical usefulness. The work offers a specific definition of actionable, clinically useful sentences. It then explores advanced NLP methods to extract rich features for sentence classification. A feature ablation study supports the proposed feature-rich approach and shows that the system achieves an F-measure of at least 73% on different text genres. This work is exemplary in exploring foundational approaches towards the practical goal of providing real-time clinical information at the point of care, while setting up a technical framework that will facilitate the integration of the research results into a clinical setting.
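The ablation logic itself is straightforward, as the sketch below illustrates on synthetic data with hypothetical feature groups; the actual system relies on much richer NLP-derived features than this toy setup.

```python
# Minimal sketch of a feature-ablation study on synthetic data (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical feature groups (e.g. n-grams, concepts, sentence position);
# the matrices are random and carry no real signal.
X_groups = {
    "ngrams": rng.normal(size=(n, 20)),
    "concepts": rng.normal(size=(n, 10)),
    "position": rng.normal(size=(n, 2)),
}
y = rng.integers(0, 2, size=n)  # 1 = clinically useful sentence (synthetic)

def f1_with(group_names):
    """Cross-validated F1 using only the listed feature groups."""
    X = np.hstack([X_groups[g] for g in group_names])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="f1").mean()

full = f1_with(list(X_groups))
print(f"all features: F1 = {full:.2f}")
for g in X_groups:
    ablated = f1_with([k for k in X_groups if k != g])
    print(f"without {g}: F1 = {ablated:.2f} (delta = {full - ablated:+.2f})")
```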
Shivade C, de Marneffe MC, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. Proceedings of the 15th Workshop on Biomedical Natural Language Processing; 2016:17-26.
This paper presents an analysis of the gradable adjectives used in clinical text to qualify medical findings. The authors use existing methods to automatically identify gradable adjectives in clinical corpora and estimate their prevalence at about 30% of adjectives. Focusing on four clinical phenomena, the authors show that the gradable adjectives used to qualify these phenomena in a clinical corpus can be reliably associated with numerical value intervals using a probabilistic model. This very original work relies on a regular expression-based analysis of clinical descriptions of the selected phenomena that comprise a gradable adjective along with a numerical value. It offers a first step towards the interpretation of statements containing gradable adjectives by grounding their meaning in value intervals, which would facilitate clinical decision making.
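As an illustration of the grounding idea only (not the authors’ probabilistic model), the sketch below collects the numeric values that co-occur with gradable adjectives for a single phenomenon, here fabricated fever descriptions, and summarizes each adjective by the interval of observed values.

```python
# Illustrative sketch: pair gradable adjectives with co-occurring numeric values.
import re
from collections import defaultdict

# Fabricated sentences about one phenomenon (fever); the adjective list and
# the temperature pattern are illustrative.
sentences = [
    "Patient presented with mild fever of 38.1 C.",
    "High fever of 39.8 C noted overnight.",
    "Low-grade fever, 38.0 C, resolved by morning.",
    "Mild fever (38.3 C) on admission.",
]
pattern = re.compile(
    r"\b(mild|low-grade|moderate|high)\b.{0,40}?(\d{2}\.\d)\s*C",
    re.IGNORECASE,
)

# Collect the numeric values observed with each gradable adjective.
values = defaultdict(list)
for sentence in sentences:
    for adjective, value in pattern.findall(sentence):
        values[adjective.lower()].append(float(value))

# Summarize each adjective by the interval of values it co-occurred with.
for adjective, vals in values.items():
    print(f"{adjective}: n={len(vals)}, interval ~ [{min(vals)}, {max(vals)}]")
```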
Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017 Apr 1;24(e1):e79-e86.
This paper presents an open-source framework for clinical abbreviation recognition and disambiguation via entity linking to the Unified Medical Language System (UMLS). The work builds on a large body of research on the different aspects of abbreviation resolution, including the recognition of abbreviations in clinical text, the extraction of possible long forms or senses from knowledge bases, and disambiguation, which leverages context to identify the intended sense of a given abbreviation. The overall framework achieves performance that exceeds the state of the art on two shared datasets. In addition, the tool may be tailored to specific needs by allowing the use of customized resources.
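The disambiguation step can be illustrated, in a much simplified form, by scoring each candidate long form of an abbreviation against its surrounding context; the sense profiles below are hypothetical and the sketch is not part of CARD itself.

```python
# Illustrative sketch of context-based abbreviation sense selection.
sense_profiles = {  # hypothetical profiles for the abbreviation "RA"
    "rheumatoid arthritis": {"joint", "methotrexate", "swelling", "rheumatology"},
    "right atrium": {"echocardiogram", "atrial", "cardiac", "dilated"},
    "room air": {"oxygen", "saturation", "spo2", "breathing"},
}

def disambiguate(context: str, profiles: dict[str, set[str]]) -> str:
    """Pick the long form whose profile shares the most words with the context."""
    tokens = set(context.lower().split())
    return max(profiles, key=lambda sense: len(profiles[sense] & tokens))

note = "Oxygen saturation 96% on room air, RA, no distress."
print(disambiguate(note, sense_profiles))  # -> 'room air'
```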