Methods Inf Med 2008; 47(06): 541-548
DOI: 10.3414/ME9128
Original Article
Schattauer GmbH

Automatic DPC Code Selection from Electronic Medical Records

Text Mining Trial of Discharge Summary
T. Suzuki (1), H. Yokoi (2), S. Fujita (3), K. Takabayashi (1)
1   Department of Medical Informatics and Management, Chiba University Hospital, Chiba, Japan
2   Department of Medical Informatics, Kagawa University Hospital, Kagawa, Japan
3   Department of Welfare and Medical Intelligence, Chiba University Hospital, Chiba, Japan

Publication History

Publication Date:
18 January 2018 (online)

Summary

Objectives: We extracted disease-related index terms from hospital discharge summaries and examined whether a vector space model built on these terms can select a suitable diagnosis.

Methods: Using morphological analysis, we extracted index terms and constructed an original dictionary for discharge summary analysis. We chose 125 different DPC (Japanese DRG system) codes, each of which had more than 20 cases, and divided the cases into two groups. One group, 5927 cases from fiscal year 2004, was used to generate the document vector space for each DPC code. The other group, 3187 cases from fiscal year 2005, was used to verify the automatic DPC selection. The top 200 extracted index terms for each disease were used to calculate the weights for that disease.
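
As an illustration only, the selection step described in the Methods can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting scheme, so a TF-IDF-style weight and cosine similarity are assumed here, the function names (`build_class_vectors`, `cosine`, `select_code`) are hypothetical, and the labels `"A"` and `"B"` stand in for DPC codes. Term extraction (morphological analysis) is assumed to have already happened.

```python
import math
from collections import Counter


def build_class_vectors(training_docs, top_k=200):
    """Build one weighted term vector per code from (code, terms) pairs.

    The weight is an ASSUMED TF-IDF-style score; the source only states
    that the top 200 index terms per disease were used.
    """
    term_counts = {}
    for code, terms in training_docs:
        term_counts.setdefault(code, Counter()).update(terms)

    n_classes = len(term_counts)
    # Number of codes in which each term occurs at least once.
    class_freq = Counter()
    for counts in term_counts.values():
        class_freq.update(counts.keys())

    vectors = {}
    for code, counts in term_counts.items():
        weights = {
            term: tf * math.log(1 + n_classes / class_freq[term])
            for term, tf in counts.items()
        }
        # Keep only the top_k highest-weighted terms for this code.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        vectors[code] = dict(top)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_code(vectors, terms):
    """Return the code whose vector is most similar to the given terms."""
    query = {term: float(count) for term, count in Counter(terms).items()}
    # Iterate over sorted codes so that ties are broken deterministically.
    return max(sorted(vectors), key=lambda code: cosine(vectors[code], query))


# Toy data with placeholder labels, not real DPC codes or real summaries.
train = [
    ("A", ["fever", "cough", "pneumonia"]),
    ("A", ["cough", "pneumonia"]),
    ("B", ["fracture", "femur", "fall"]),
]
vectors = build_class_vectors(train)
print(select_code(vectors, ["cough", "pneumonia"]))
```

In this sketch, the fiscal year 2004 cases would play the role of `train`, and each fiscal year 2005 summary would be passed to `select_code` after term extraction.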

Results: The DPC code selected by the calculated similarity was compared with the originally assigned code for each of the 3187 cases across the 125 DPC codes. Eighty percent of the cases matched on the diagnosis portion of the DPC code (the first six digits), and 56% matched on all 14 digits.
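
The two match rates reported above (first six digits versus all 14 digits) can be computed along the following lines. This is a hedged sketch: the function name `match_rates` is hypothetical, and the 14-character strings are placeholders rather than real DPC codes.

```python
def match_rates(predicted, actual, prefix_len=6):
    """Return (prefix match rate, full match rate) for paired code lists.

    A prefix match compares only the first prefix_len characters, which
    in a DPC code correspond to the diagnosis; a full match compares the
    whole code.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    pairs = list(zip(predicted, actual))
    prefix_hits = sum(p[:prefix_len] == a[:prefix_len] for p, a in pairs)
    full_hits = sum(p == a for p, a in pairs)
    return prefix_hits / len(pairs), full_hits / len(pairs)


# Placeholder 14-character codes, for illustration only.
predicted = ["111111aaaaaaaa", "111111bbbbbbbb", "222222cccccccc", "333333dddddddd"]
actual = ["111111aaaaaaaa", "111111aaaaaaaa", "222222cccccccc", "444444dddddddd"]
prefix_rate, full_rate = match_rates(predicted, actual)
print(prefix_rate, full_rate)
```

A full match always implies a prefix match, so the full match rate can never exceed the prefix match rate, which is consistent with the 80% and 56% figures reported.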

Conclusions: We demonstrated that suitable terms can be extracted for each disease and that characteristics such as the diagnosis can be obtained from the calculated vectors. This technique can be used to assess the quality of discharge summaries and to integrate discharge summaries across different facilities. With text mining, the contents of electronic discharge summaries can be characterized and diagnoses deduced from the data.
