DOI: 10.1055/s-0038-1634121
The Common Data Elements for Cancer Research: Remarks on Functions and Structure
Publication History
Received 12 December 2004
accepted 20 February 2006
Publication Date: 08 February 2018 (online)
Summary
Objectives: The National Cancer Institute (NCI) has developed the Common Data Elements (CDE) to serve as a controlled vocabulary of data descriptors for cancer research, to facilitate data interchange and interoperability between cancer research centers. We evaluated the CDE’s structure to see whether it could represent the elements necessary to support its intended purpose, and whether it could prevent errors and inconsistencies from being accidentally introduced. We also performed automated checks for certain types of content errors, which provided a rough measure of curation quality.
Methods: Evaluation was performed on CDE content downloaded via the NCI’s CDE Browser and transformed into relational database form. It covered three categories: 1) compatibility with the ISO/IEC 11179 metadata model, on which the CDE structure is based; 2) features necessary for controlled vocabulary support; and 3) support for a stated NCI goal, the setup of data collection forms for cancer research.
Results: Various limitations were identified with respect to both content (inconsistency, insufficient definition of elements, redundancy) and structure, particularly the need for term and relationship support and for metadata supporting the explicit representation of electronic forms that use sets of common data elements.
Conclusions: While there are numerous positive aspects to the CDE effort, there is considerable opportunity for improvement. Our recommendations include review of existing content by diverse experts in the cancer community, integration with the NCI Thesaurus to take advantage of the latter’s links to nationally used controlled vocabularies, and various schema enhancements required for electronic form support.
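The summary does not say how the automated content checks were implemented. As an illustration only, the following minimal sketch shows what two such checks (missing definitions and redundant names) might look like once data elements are in relational form. The table name `data_element`, its columns, and the sample rows are all hypothetical and are not taken from the CDE or the paper.

```python
import sqlite3

# Hypothetical sample rows: (id, name, definition). Not real CDE content.
ROWS = [
    (1, "Patient Age", "Age of the patient at enrollment, in years."),
    (2, "patient age ", "Age in years."),
    (3, "Tumor Grade", ""),
    (4, "Tumor Stage", None),
    (5, "Performance Status", "Score on a stated performance scale."),
]


def build_db(rows):
    """Load the rows into an in-memory table with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_element ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, definition TEXT)"
    )
    conn.executemany("INSERT INTO data_element VALUES (?, ?, ?)", rows)
    return conn


def missing_definitions(conn):
    """Return ids of elements whose definition is NULL or blank."""
    cur = conn.execute(
        "SELECT id FROM data_element "
        "WHERE definition IS NULL OR TRIM(definition) = '' "
        "ORDER BY id"
    )
    return [row[0] for row in cur]


def duplicate_names(conn):
    """Return (normalized name, count) for names used by more than one element."""
    cur = conn.execute(
        "SELECT LOWER(TRIM(name)), COUNT(*) FROM data_element "
        "GROUP BY LOWER(TRIM(name)) HAVING COUNT(*) > 1 "
        "ORDER BY LOWER(TRIM(name))"
    )
    return [(row[0], row[1]) for row in cur]


conn = build_db(ROWS)
print(missing_definitions(conn))
print(duplicate_names(conn))
```

Checks of this kind can only flag candidates. Whether two similarly named elements are truly redundant still requires review by a domain expert, which is consistent with the recommendation above.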