DOI: 10.3414/ME16-01-0085
Relating Complexity and Error Rates of Ontology Concepts
More Complex NCIt Concepts Have More Errors
Funding: Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publication History
Received: 13 July 2016
Accepted in revised form: 19 January 2017
Publication Date: 24 January 2018 (online)
Summary
Objectives: Ontologies are knowledge structures that support many health-information systems. This study assesses the quality of ontological concepts based on a measure of their complexity. The results show a relation between the complexity of concepts and their error rates.
Methods: A measure of lateral complexity, defined as the number of role types a concept exhibits, is used to distinguish more complex concepts from simpler ones. Using a framework called an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology, concepts are divided into two groups along these lines. Various concepts from each group are then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute Thesaurus (NCIt) is used as our testbed. A hypothesis pertaining to the expected error rates of the complex and simple concepts is tested.
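The grouping step described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the concept names, role names, and the threshold separating "complex" from "simple" are hypothetical, and grouping concepts by identical role-type sets is an assumed reading of how an area-style abstraction is formed.

```python
from collections import defaultdict


def lateral_complexity(roles):
    """Return the number of distinct role types among (role_type, target) pairs."""
    return len({role_type for role_type, _target in roles})


def split_by_complexity(concept_roles, threshold):
    """Partition concepts into (complex, simple) lists.

    A concept counts as complex when its lateral complexity is at least
    `threshold`; the threshold value is an assumption of this sketch.
    """
    complex_group, simple_group = [], []
    for concept, roles in concept_roles.items():
        if lateral_complexity(roles) >= threshold:
            complex_group.append(concept)
        else:
            simple_group.append(concept)
    return sorted(complex_group), sorted(simple_group)


def group_by_role_types(concept_roles):
    """Group concepts that exhibit exactly the same set of role types."""
    groups = defaultdict(list)
    for concept, roles in concept_roles.items():
        key = frozenset(role_type for role_type, _target in roles)
        groups[key].append(concept)
    return dict(groups)


# Hypothetical toy data: concept -> list of (role_type, target) pairs.
toy_concepts = {
    "Concept A": [("role_1", "X"), ("role_2", "Y"), ("role_1", "Z")],
    "Concept B": [("role_1", "X")],
    "Concept C": [],
    "Concept D": [("role_2", "W"), ("role_1", "X")],
}

complex_group, simple_group = split_by_complexity(toy_concepts, threshold=2)
areas = group_by_role_types(toy_concepts)
```

Note that a concept with two roles of the same type has the same lateral complexity as a concept with one such role, because the measure counts role types rather than individual roles.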
Results: Our study was done on the NCIt’s Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of our QA analysis. The overall findings confirmed our hypothesis by showing a statistically significant difference between the number of errors exhibited by more laterally complex concepts and the number exhibited by simpler concepts.
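The summary does not name the statistical test that was used. As one possible illustration of comparing error rates between two groups of concepts, the following sketch runs a one-sided permutation test on the difference in error rates. The function names, the number of permutations, and the 0/1 error flags are all hypothetical and are not taken from the study.

```python
import random


def error_rate(flags):
    """Fraction of concepts flagged as erroneous (flags are 0 or 1)."""
    return sum(flags) / len(flags)


def permutation_test(complex_flags, simple_flags, n_permutations=10000, seed=0):
    """One-sided permutation test for rate(complex) > rate(simple).

    Returns (observed_difference, p_value). Both groups must be non-empty.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = error_rate(complex_flags) - error_rate(simple_flags)
    pooled = list(complex_flags) + list(simple_flags)
    n_complex = len(complex_flags)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = error_rate(pooled[:n_complex]) - error_rate(pooled[n_complex:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up flags for illustration only: 1 = concept has a verified error.
toy_complex = [1] * 6 + [0] * 4
toy_simple = [1] * 2 + [0] * 8
observed_diff, p_value = permutation_test(toy_complex, toy_simple)

# Sanity case: two error-free groups give a difference of zero.
null_diff, null_p = permutation_test([0] * 5, [0] * 5, n_permutations=100)
```

A permutation test is chosen here only because it makes few distributional assumptions; other two-group comparisons would serve the same illustrative purpose.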
Conclusions: QA is an essential part of any ontology’s maintenance regimen. In this paper, we reported on the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined in terms of the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be utilized to guide ongoing efforts in ontology QA.