Summary
Objectives:
Besides keyword search, navigational search is an important means to find relevant
information in digital object collections. Such navigation is often supported by categorization
systems or thesauri, which provide a hierarchical view of a particular domain and
allow for browsing digital collections. Existing categorization systems, however,
are expensive to create and maintain manually. Our Semantic
GrowBag algorithm creates concept graphs fully automatically, i.e., directed graphs
similar to categorization systems but without strong subsumption semantics. This article
sketches our algorithm and evaluates it for the medical domain.
Methods:
Our Semantic GrowBag algorithm uses descriptive keywords and exploits higher-order
co-occurrences between them to create concept graphs (so-called GrowBag graphs) from
annotated object collections. In this study, we have automatically created more than
2000 GrowBag graphs based on the Medline data set to show the applicability of our
algorithm in the medical domain. For the evaluation, we first compared our algorithm
to a baseline algorithm that does not take higher-order co-occurrences into account,
and then compared the resulting GrowBag graphs systematically against the manually
crafted MeSH thesaurus.
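To illustrate the difference between plain and higher-order co-occurrence, the following minimal sketch may help. It is not the Semantic GrowBag algorithm itself, whose details are not given in this summary; it only contrasts first-order counts (two keywords annotate the same object) with a simple second-order score (two keywords share a co-occurring neighbour). The function names `first_order` and `second_order`, the weighting by a product of counts, and the toy annotations are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Count how often two keywords annotate the same object (symmetric)."""
    counts = defaultdict(int)
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


def second_order(counts):
    """Score keyword pairs that share a co-occurring neighbour.

    Hypothetical weighting: for every shared neighbour, multiply the two
    first-order counts and sum the products over all shared neighbours.
    """
    neighbours = defaultdict(dict)
    for (a, b), n in counts.items():
        neighbours[a][b] = n
    scores = defaultdict(int)
    for adjacent in neighbours.values():
        for a, b in combinations(sorted(adjacent), 2):
            weight = adjacent[a] * adjacent[b]
            scores[(a, b)] += weight
            scores[(b, a)] += weight
    return dict(scores)


# Toy annotated collection (invented keywords, not Medline data).
docs = [
    ["aspirin", "headache"],
    ["headache", "migraine"],
    ["aspirin", "headache", "pain"],
]
direct = first_order(docs)
indirect = second_order(direct)

# "aspirin" and "migraine" never annotate the same object ...
print(direct.get(("aspirin", "migraine"), 0))
# ... but they are linked through the shared neighbour "headache".
print(indirect.get(("aspirin", "migraine"), 0))
```

In this toy collection a baseline restricted to first-order counts finds no link between the two keywords, whereas the second-order score does, which is the kind of additional relationship the comparison against the baseline is concerned with.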
Results:
Our experiments revealed that the Semantic GrowBag approach increases the number of
relevant relationships by about 50% compared with the baseline approach. Furthermore,
the identified relations usually correspond to, and hardly ever contradict, the
relationships stated in MeSH.
Conclusions:
The Semantic GrowBag algorithm allows concept graphs to be created fully automatically.
While it does not systematically exploit specifics of a domain (such as the fundamental
separation between ‘drugs’ and ‘therapy’ in MeSH), the resulting GrowBag graphs are
nevertheless well-suited to support navigation in digital object collections. Moreover,
they can also be used to help maintain existing categorization systems based on
the actual usage of categories.
Keywords
Navigational search - subject indexing - concept graphs