Methods Inf Med 2008; 47(05): 425-434
DOI: 10.3414/ME0508
Original Article
Schattauer GmbH

Semantic Structuring of and Information Extraction from Medical Documents Using the UMLS

K. Denecke
1 Research Center L3S, Hanover, Germany

Publication History

Received: 04 October 2007

Accepted: 27 February 2008

Publication Date: 20 January 2018 (online)

Summary

Objectives: This paper introduces SeReMeD (Semantic Representation of Medical Documents), a method for automatically generating knowledge representations from natural language documents. The suitability of the Unified Medical Language System (UMLS) as domain knowledge for this method is analyzed.

Methods: SeReMeD combines existing language engineering methods with semantic transformation rules that map syntactic information to semantic roles. In this way, the relevant content of medical documents is mapped to semantic structures. To extract specific data, these semantic structures are searched for concepts and semantic roles. A study is carried out that uses SeReMeD to detect specific data, such as documented diagnoses or procedures, in medical narratives.
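The two steps named above (rule-based mapping of syntactic relations to semantic roles, then a search of the resulting structure for concepts and roles) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from SeReMeD: the `LEXICON` dictionary is a tiny hypothetical stand-in for a UMLS lookup with simplified semantic type labels, the `RULES` table and the relation names (`amod`, `prep_in`, `neg`) are invented, and the input triples are hand-written rather than produced by a parser.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a UMLS lookup: term -> simplified semantic type.
LEXICON = {
    "effusion": "Finding",
    "pleura": "Body Part",
    "pneumonia": "Disease or Syndrome",
}

# Hypothetical transformation rules: syntactic relation -> semantic role.
RULES = {
    "amod": "attribute",
    "prep_in": "location",
    "neg": "negation",
}


@dataclass(frozen=True)
class Relation:
    head: str
    role: str
    dependent: str


def to_semantic_structure(dependencies):
    """Map (head, syntactic_relation, dependent) triples to semantic relations.

    Triples whose syntactic relation has no rule are dropped.
    """
    return [Relation(h, RULES[rel], d) for h, rel, d in dependencies if rel in RULES]


def extract(structure, semantic_type):
    """Return the non-negated concepts of the given semantic type, sorted."""
    negated = {r.head for r in structure if r.role == "negation"}
    mentioned = {r.head for r in structure} | {r.dependent for r in structure}
    return sorted(
        term
        for term in mentioned
        if LEXICON.get(term) == semantic_type and term not in negated
    )


# Hand-written triples for "Small effusion in the pleura. No pneumonia."
dependencies = [
    ("effusion", "amod", "small"),
    ("effusion", "prep_in", "pleura"),
    ("effusion", "det", "the"),      # no rule for "det", so it is dropped
    ("pneumonia", "neg", "no"),
]

structure = to_semantic_structure(dependencies)
findings = extract(structure, "Finding")
diseases = extract(structure, "Disease or Syndrome")
```

In this toy example the search returns `effusion` as a documented finding and returns nothing for the disease type, because `pneumonia` carries a negation role. Swapping the lexicon for a different vocabulary changes what the same rules can extract, which is the kind of domain-knowledge exchange the abstract describes.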

Results: The system is tested on chest X-ray reports. In initial evaluations of the system’s performance, the generation of semantic structures achieves a correctness of 80%, and the extraction of documented findings reaches 93% precision and 83% recall.
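For readers less familiar with these measures, the reported extraction figures can be read against the standard definitions, where TP, FP and FN denote correctly extracted, wrongly extracted and missed findings. The F1 value in the second line is not reported in the source; it is derived here from the two reported figures purely as an illustration.

```latex
\[
\text{precision} = \frac{TP}{TP + FP} = 0.93, \qquad
\text{recall} = \frac{TP}{TP + FN} = 0.83
\]
\[
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2 \cdot 0.93 \cdot 0.83}{0.93 + 0.83} \approx 0.88
\]
```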

Conclusions: The results suggest that the methods described here can be used to accurately extract data from medical narratives, although there is room for improvement. The proposed methods provide two main benefits. First, using existing language engineering methods reduces the effort required to construct a medical information extraction system. Second, the domain knowledge can be exchanged to create a more (or less) specialized system capable of handling various medical sub-domains.

 