Methods Inf Med 2008; 47(03): 241-250
DOI: 10.3414/ME0492
Original Article
Schattauer GmbH

Automatically Created Concept Graphs Using Descriptive Keywords in the Medical Domain

J. Diederich 1, W.-T. Balke 1
1   L3S Research Center, Leibniz Universität, Hannover, Germany

Publication History

Received: 21 June 2007

Accepted: 02 October 2007

Publication Date:
18 January 2018 (online)

Summary

Objectives: Besides keyword search, navigational search is an important means of finding relevant information in digital object collections. Such navigation is often supported by categorization systems or thesauri, which provide a hierarchical view of a particular domain and allow users to browse digital collections. Existing categorization systems, however, require considerable and costly manual effort to create and maintain. Our Semantic GrowBag algorithm works fully automatically and creates concept graphs, i.e. directed graphs that resemble categorization systems but lack strong subsumption semantics. This article sketches our algorithm and evaluates it in the medical domain.

Methods: Our Semantic GrowBag algorithm uses descriptive keywords and exploits higher-order co-occurrences between them to create concept graphs (so-called GrowBag graphs) from annotated object collections. In this study, we automatically created more than 2000 GrowBag graphs from the Medline data set to show the applicability of our algorithm in the medical domain. For the evaluation, we first compared our algorithm with a baseline algorithm that does not take higher-order co-occurrences into account, and then systematically compared the resulting GrowBag graphs against the manually crafted MeSH thesaurus.
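
The difference between first-order and higher-order co-occurrence can be illustrated with a small sketch. It is not the Semantic GrowBag algorithm itself, whose details this summary does not give. As an assumption made purely for illustration, it approximates higher-order co-occurrence by a random walk with restart over the first-order co-occurrence graph. The function names, the `restart` parameter, and the toy annotations are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Count how often two keywords annotate the same object."""
    counts = defaultdict(lambda: defaultdict(int))
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def higher_order(counts, seed, restart=0.25, iterations=50):
    """Random walk with restart from `seed` (an illustrative stand-in).

    Keywords that never share an object with `seed` can still receive
    a positive score if they are reachable through intermediate keywords.
    """
    nodes = list(counts)
    score = {n: 0.0 for n in nodes}
    score[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            total = sum(counts[n].values())
            for m, c in counts[n].items():
                # Move along an edge with probability proportional to its count.
                nxt[m] += (1 - restart) * score[n] * c / total
        nxt[seed] += restart  # Jump back to the seed keyword.
        score = nxt
    return score


# Hypothetical toy annotations: one keyword list per object.
docs = [
    ["abortion", "methotrexate"],
    ["methotrexate", "misoprostol"],
    ["misoprostol", "pregnancy"],
]

counts = first_order(docs)
scores = higher_order(counts, "abortion")
direct = counts["abortion"].get("misoprostol", 0)
print("direct co-occurrences:", direct)
print("walk-based score:", scores["misoprostol"])
```

In this toy data, "abortion" and "misoprostol" never annotate the same object, so a baseline that looks only at direct co-occurrences sees no relation between them, whereas the walk-based score links them through "methotrexate".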

Results: Our experiments revealed that the Semantic GrowBag approach increases the number of relevant relationships by about 50% compared with the baseline approach. Furthermore, the identified relations usually correspond to, and hardly ever contradict, the relationships stated in MeSH.

Conclusions: The Semantic GrowBag algorithm makes it possible to create concept graphs fully automatically. Although it does not systematically exploit the specifics of a domain (such as the fundamental separation between ‘drugs’ and ‘therapy’ in MeSH), the resulting GrowBag graphs are nevertheless well suited to supporting navigation in digital object collections. Moreover, they can help maintain existing categorization systems based on the actual usage of categories.

 