Methods Inf Med 2005; 44(04): 537-545
DOI: 10.1055/s-0038-1634005
Original Article
Schattauer GmbH

MorphoSaurus: Design and Evaluation of an Interlingua-based, Cross-language Document Retrieval Engine for the Medical Domain
K. Markó (1), S. Schulz (1), U. Hahn (2)
1   Medical Informatics Department, Freiburg University Hospital, Freiburg, Germany
2   Jena University Language and Information Engineering (JULIE) Lab, Jena, Germany

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: We propose an interlingua-based indexing approach to account for the particular challenges that arise in the design and implementation of cross-language document retrieval systems for the medical domain.

Methods: Both documents and queries are mapped to a language-independent conceptual layer, on which retrieval operations are performed. We contrast this approach with the direct translation of German queries into English, after which the translated queries are matched against English documents.

Results: We evaluate both approaches, interlingua-based and direct translation, on a large medical document collection, the OHSUMED corpus. We find a substantial benefit for interlingua-based document retrieval using German queries on English texts: its performance reaches 93% of the (monolingual) English baseline.

Conclusions: Most state-of-the-art cross-language information retrieval systems translate user queries into the language(s) of the target documents. In contrast, our results indicate that translating both documents and user queries into a language-independent, concept-like representation format yields better cross-language retrieval performance.
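
The interlingua idea summarized under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how text is mapped to the conceptual layer, so the greedy longest-match segmentation, the toy lexicon, the concept identifiers (such as `#cardiac`), and the function names below are all assumptions made for illustration. The sketch only shows that once German queries and English documents are mapped to the same identifiers, matching no longer depends on the language of either side.

```python
# Hypothetical toy lexicon: every entry and concept identifier is invented
# for illustration and is not taken from the system described above.
LEXICON = {
    "en": {"heart": "#cardiac", "card": "#cardiac", "muscle": "#muscle",
           "myo": "#muscle", "itis": "#inflam", "kidney": "#renal"},
    "de": {"herz": "#cardiac", "muskel": "#muscle",
           "entzuend": "#inflam", "niere": "#renal"},
}

def to_interlingua(text, lang):
    """Map text to a set of concept identifiers (assumed greedy longest match)."""
    entries = LEXICON[lang]
    concepts = set()
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest lexicon entry that starts at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in entries:
                    concepts.add(entries[word[i:j]])
                    i = j
                    break
            else:
                i += 1  # No entry starts here; skip one character.
    return concepts

def rank(query, query_lang, docs, doc_lang):
    """Rank documents by the number of concept identifiers shared with the query."""
    query_concepts = to_interlingua(query, query_lang)
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_concepts & to_interlingua(text, doc_lang))
        if overlap:
            scored.append((doc_id, overlap))
    # Highest overlap first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Made-up English documents and a German query (umlaut written as "ue").
DOCS = {
    "d1": "myocarditis in young adults",
    "d2": "chronic kidney disease",
    "d3": "heart muscle",
}
print(rank("Herzmuskelentzuendung", "de", DOCS, "en"))
```

In this toy setting the German compound and the English term "myocarditis" resolve to the same three identifiers, so the document is retrieved without translating the query into English. A real system would replace simple overlap counting with a proper ranking model.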

 