Methods Inf Med 2013; 52(04): 308-316
DOI: 10.3414/ME12-01-0029
Original Articles
Schattauer GmbH

Formative Evaluation of Ontology Learning Methods for Entity Discovery by Using Existing Ontologies as Reference Standards

K. Liu
1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
,
K. J. Mitchell
1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
,
W. W. Chapman
2   Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
,
G. K. Savova
3   Children's Hospital Boston and Harvard Medical School, Boston, MA, USA
,
N. Sioutos
4   Lockheed Martin Corporation, Fairfax, VA, USA
,
D. L. Rubin
5   Department of Radiology, Stanford University, Stanford, CA, USA
,
R. S. Crowley
1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
6   Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Publication History

received: 04 April 2012

accepted: 02 February 2013

Publication Date:
20 January 2018 (online)

Summary

Objective: To develop a two-step method for the formative evaluation of statistical Ontology Learning (OL) algorithms that uses existing biomedical ontologies as reference standards.

Methods: In the first step, optimum parameters are established. A ‘gap list’ of entities is generated by finding the set of entities that are present in a later version of the ontology but absent from an earlier version. A named entity recognition system is then used to identify the ‘gap list’ entities that occur in a corpus of biomedical documents, and these entities form the reference standard. The output of the algorithm (new entity candidates), produced by statistical methods, is subsequently compared against this reference standard. An OL method that performs perfectly will be able to learn all of the terms in this reference standard. Using evaluation metrics and precision-recall curves for different thresholds and parameters, we compute the optimum parameters for each method. In the second step, human judges with expertise in ontology development evaluate each candidate suggested by the algorithm configured with the optimum parameters previously established. These judgments are used to compute two performance metrics developed from our previous work: Entity Suggestion Rate (ESR) and Entity Acceptance Rate (EAR).
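The first step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the lower-casing used to match terms, the toy term lists, and the candidate scores are all assumptions made for illustration. The abstract does not specify how entities are normalized or which statistic each OL method thresholds.

```python
from typing import Dict, Iterable, List, Set, Tuple


def gap_list(earlier: Iterable[str], later: Iterable[str]) -> Set[str]:
    """Entities present in the later ontology version but not in the earlier one."""
    return {t.lower() for t in later} - {t.lower() for t in earlier}


def reference_standard(gap: Set[str], corpus_entities: Iterable[str]) -> Set[str]:
    """Gap-list entities that a named entity recognizer actually found in the corpus."""
    return gap & {t.lower() for t in corpus_entities}


def precision_recall(candidates: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Precision and recall of the candidate set against the reference standard."""
    hits = len(candidates & reference)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def sweep_thresholds(
    scored: Dict[str, float], reference: Set[str], thresholds: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """One (threshold, precision, recall) row per threshold, i.e. points on a PR curve."""
    rows = []
    for t in thresholds:
        # Keep only candidates whose (hypothetical) score reaches the threshold.
        kept = {term.lower() for term, score in scored.items() if score >= t}
        p, r = precision_recall(kept, reference)
        rows.append((t, p, r))
    return rows


# Toy data, invented for illustration only.
earlier_version = ["carcinoma", "melanoma"]
later_version = ["carcinoma", "melanoma", "nevus", "sarcoma", "lipoma"]
ner_output = ["nevus", "sarcoma", "carcinoma"]           # entities found in the corpus
candidate_scores = {"nevus": 0.9, "margin": 0.6, "sarcoma": 0.4}

gap = gap_list(earlier_version, later_version)
reference = reference_standard(gap, ner_output)
curve = sweep_thresholds(candidate_scores, reference, [0.3, 0.5, 0.8])
```

Each row of `curve` is one operating point. Choosing the optimum parameters then amounts to picking the row that best satisfies whatever evaluation metric is preferred.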

Results: Using this method, we evaluated two statistical OL methods in two medical domains. For the pathology domain, we obtained 49% ESR and 28% EAR with the Lin method, and 52% ESR and 39% EAR with the Church method. For the radiology domain, we obtained 87% ESR and 9% EAR with the Lin method, and 96% ESR and 16% EAR with the Church method.

Conclusion: This method is sufficiently general and flexible to permit comparison of any OL methods for a specific corpus and ontology of interest.

 