Methods Inf Med 2013; 52(01): 51-61
DOI: 10.3414/ME12-01-0040
Original Articles
Schattauer GmbH

An Easily Implemented Method for Abbreviation Expansion for the Medical Domain in Japanese Text

A Preliminary Study
E. Y. Shinohara (1), E. Aramaki (2), T. Imai (3), Y. Miura (4), M. Tonoike (4), T. Ohkuma (4), H. Masuichi (4), K. Ohe (1, 5)

1   Department of Planning, Information and Management, The University of Tokyo Hospital, Tokyo, Japan
2   Center for Knowledge Structuring, The University of Tokyo, Tokyo, Japan
3   Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
4   Research & Technology Group, Fuji Xerox Co., Ltd, Kanagawa, Japan
5   Graduate School of Medicine and Faculty of Medicine, The University of Tokyo, Tokyo, Japan

Publication History

received: 30 April 2012

accepted: 28 October 2012

Publication Date:
20 January 2018 (online)

Summary

Background: One of the barriers to the effective use of computerized health-care related text is the ambiguity of abbreviations. To date, the task of disambiguating abbreviations has been treated as a classification task based on surrounding words. Applying this framework to languages that have no word boundaries requires pre-processing to segment each sentence into a sequence of words. While this segmentation step is often a source of problems, it is unknown whether word information is really required for abbreviation expansion.

Objectives: As a preliminary study, we compared abbreviation expansion methods with and without the incorporation of word information.

Methods: We implemented two abbreviation expansion methods: 1) a morpheme-based method that relied on word information and therefore required pre-processing, and 2) a character-based method that relied on simple character information. We compared the expansion accuracies for these two methods using eight medical abbreviations. Experimental data were automatically built as a pseudo-annotated corpus using the Internet.

Results: Accuracies for the character-based method ranged from 0.890 to 0.942, while accuracies for the morpheme-based method ranged from 0.796 to 0.932. The character-based method significantly outperformed the morpheme-based method for three of the eight abbreviations (p < 0.05). For the remaining five abbreviations, no significant differences were found between the two methods.

Conclusions: In terms of simplicity, character information may be a good alternative to morphological information for expanding English medical abbreviations that appear in Japanese texts on the Internet.