Methods Inf Med 2012; 51(06): 549-556
DOI: 10.3414/ME11-02-0022
Focus Theme – Original Articles
Schattauer GmbH

An Architecture for Diversity-aware Search for Medical Web Content

K. Denecke
1   University Medical Center, Göttingen, Germany

Publication History

Received: 31 July 2011

Accepted: 27 September 2012

Publication Date: 20 January 2018 (online)

Summary

Objectives: The Web is a vast source of information, including information on medical and health-related issues. The content of medical social media in particular can be diverse, depending on the author's background, the source, or the topic. Diversity in this context means that a document covers different aspects of a topic, or that a topic is described in different ways. In this paper, we introduce an approach that takes the diverse aspects of a search query into account when presenting retrieval results to a user.

Methods: We introduce a system architecture for a diversity-aware search engine that retrieves medical information from the Web. The diversity of retrieval results is assessed by calculating diversity measures that rely on semantic information derived from a mapping to concepts of a medical terminology. Based on these measures, the result set is diversified by ranking more diverse texts higher.
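The ranking idea described in the Methods can be illustrated with a minimal sketch. The summary does not define the diversity measures themselves, so everything specific here is an assumption made for illustration: the toy lexicon `TOY_LEXICON` stands in for a real mapping to a medical terminology, Shannon entropy over semantic groups is used as a placeholder diversity measure, and the `weight` parameter and the additive scoring in `rerank` are hypothetical choices, not the system's actual method.

```python
import math
from collections import Counter

# Hypothetical term-to-semantic-group lexicon. It stands in for a real
# mapping to concepts of a medical terminology; entries are illustrative.
TOY_LEXICON = {
    "aspirin": "Drugs",
    "ibuprofen": "Drugs",
    "headache": "Disorders",
    "migraine": "Disorders",
    "mri": "Procedures",
}


def concept_groups(text):
    """Return the (assumed) semantic group of every known token in the text."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]


def diversity(text):
    """Placeholder diversity measure: Shannon entropy (in bits) of the
    distribution of semantic groups found in the text."""
    counts = Counter(concept_groups(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rerank(results, weight=0.5):
    """Sort (text, relevance) pairs so that, at similar relevance,
    more diverse texts are ranked higher."""
    return sorted(
        results,
        key=lambda r: r[1] + weight * diversity(r[0]),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        ("Aspirin and ibuprofen.", 0.9),
        ("Aspirin for migraine after MRI.", 0.8),
    ]
    for text, relevance in rerank(docs):
        print(f"{diversity(text):.2f}  {relevance:.1f}  {text}")
```

In this sketch, a text that mentions drugs, disorders and procedures scores higher on diversity than one that mentions only drugs, and can therefore overtake a slightly more relevant but less diverse text. Any real system would need a proper concept mapper and a validated measure in place of these stand-ins.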

Results: The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of the aspects covered in a text and the type of information it contains. They are used for result presentation, filtering and ranking. In a user evaluation, we assess user satisfaction with an ordering of retrieval results that takes the diversity measures into account.

Conclusions: The evaluation shows that diversity-aware retrieval that considers diversity measures in ranking could increase user satisfaction with retrieval results.

 