Formative Evaluation of Ontology Learning Methods for Entity Discovery by Using Existing Ontologies as Reference Standards

K. Liu; K. J. Mitchell; W. W. Chapman; G. K. Savova; N. Sioutos; D. L. Rubin; R. S. Crowley

doi:10.3414/ME12-01-0029

Methods Inf Med 2013; 52(04): 308-316

Original Articles

Schattauer GmbH

K. Liu

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

K. J. Mitchell

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

W. W. Chapman

²Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA

G. K. Savova

³Children's Hospital Boston and Harvard Medical School, Boston, MA, USA

N. Sioutos

⁴Lockheed Martin Corporation, Fairfax, VA, USA

D. L. Rubin

⁵Department of Radiology, Stanford University, Stanford, CA, USA

R. S. Crowley

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

⁶Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Publication History

Received: 4 April 2012

Accepted: 2 February 2013

Publication date:
20 January 2018 (online)

Summary

Objective: To develop a two-step method for formative evaluation of statistical ontology learning (OL) algorithms that leverages existing biomedical ontologies as reference standards.

Methods: In the first step, optimum parameters are established. A ‘gap list’ of entities is generated by finding the set of entities present in a later version of the ontology that are not present in an earlier version of the ontology. A named entity recognition system is used to identify entities in a corpus of biomedical documents that are present in the ‘gap list’, generating a reference standard. The output of the algorithm (new entity candidates), produced by statistical methods, is subsequently compared against this reference standard. An OL method that performs perfectly will be able to learn all of the terms in this reference standard. Using evaluation metrics and precision-recall curves for different thresholds and parameters, we compute the optimum parameters for each method. In the second step, human judges with expertise in ontology development evaluate each candidate suggested by the algorithm configured with the optimum parameters previously established. These judgments are used to compute two performance metrics developed in our previous work: Entity Suggestion Rate (ESR) and Entity Acceptance Rate (EAR).
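
The first step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy terms, and the scores are hypothetical, the named entity recognition step is replaced by a precomputed set of entities assumed to have been found in the corpus, and F1 is assumed here as the metric for choosing a threshold, since the summary does not name one.

```python
def gap_list(earlier_version, later_version):
    """Entities present in the later ontology version but absent from the earlier one."""
    return set(later_version) - set(earlier_version)


def reference_standard(gap, entities_found_in_corpus):
    """Gap-list entities that were also found in the corpus (stand-in for the NER step)."""
    return gap & set(entities_found_in_corpus)


def precision_recall(candidates, reference):
    """Precision and recall of a candidate set against the reference standard."""
    true_positives = len(candidates & reference)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def sweep_thresholds(scored_candidates, reference, thresholds):
    """Score the candidates kept at each threshold; returns (threshold, P, R, F1) rows."""
    rows = []
    for t in thresholds:
        kept = {term for term, score in scored_candidates.items() if score >= t}
        p, r = precision_recall(kept, reference)
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0  # F1 is an assumed choice of metric
        rows.append((t, p, r, f1))
    return rows


# Toy, made-up data for illustration only (not taken from the study).
earlier = {"carcinoma", "adenoma"}
later = {"carcinoma", "adenoma", "lentigo", "seminoma", "thymoma"}
corpus_entities = {"carcinoma", "lentigo", "seminoma"}

gap = gap_list(earlier, later)
reference = reference_standard(gap, corpus_entities)

# Hypothetical OL output: candidate term -> association score.
scored = {"lentigo": 0.9, "seminoma": 0.6, "biopsy": 0.7, "margin": 0.2}
rows = sweep_thresholds(scored, reference, [0.5, 0.8])
best = max(rows, key=lambda row: row[3])  # row with the highest F1
```

In this toy run, lowering the threshold from 0.8 to 0.5 raises recall against the reference standard at the cost of precision; the rows returned by `sweep_thresholds` are the points of a precision-recall curve for one parameter setting.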

Results: Using this method, we evaluated two statistical OL methods in two medical domains. For the pathology domain, we obtained 49% ESR and 28% EAR with the Lin method, and 52% ESR and 39% EAR with the Church method. For the radiology domain, we obtained 87% ESR and 9% EAR with the Lin method, and 96% ESR and 16% EAR with the Church method.

Conclusion: This method is sufficiently general and flexible to permit comparison of any OL method for a specific corpus and ontology of interest.

Keywords

Ontology learning from text - statistical ontology learning method - statistical ontology - learning algorithm - ontology enrichment - natural language processing - ontology evaluation
