CC BY-NC-ND 4.0 · Methods Inf Med 2024; 63(01/02): 052-061
DOI: 10.1055/s-0044-1786839
Original Article

Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations

Sarah Riepenhausen¹, Max Blumenstock², Christian Niklas², Stefan Hegselmann¹, Philipp Neuhaus¹, Alexandra Meidt¹, Cornelia Püttmann¹, Michael Storck¹, Matthias Ganzinger², Julian Varghese¹, Martin Dugas²,³

1   Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
2   Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
3   European Research Center for Information Systems (ERCIS), Münster, Nordrhein-Westfalen, Germany
Funding This work was supported by German Research Foundation (Deutsche Forschungsgemeinschaft, DFG grants DU 352/11-1, DU 352/11-2, DU 352/14-4).

Abstract

Background Structural metadata from the majority of clinical studies and routine health care systems is not yet available to the scientific community.

Objective To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal).

Methods The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.

Results The most frequent keyword is “clinical trial” (n = 18,777), and the most frequent disease-specific keyword is “breast neoplasms” (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes.
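The counts reported in the Results can be cross-checked with a few lines of Python, and a toy example shows why "codes assigned" and "unique codes" differ. This is a minimal sketch, not the portal's data format: the `Element` class, its `kind` values, and the placeholder codes `CUI_A` to `CUI_E` are hypothetical, and the per-element average assumes that every assigned UMLS code belongs to one of the 805,308 annotated elements.

```python
from collections import Counter
from dataclasses import dataclass, field

# Headline figures as reported in the abstract.
ITEMS = 554_352
ITEM_GROUPS = 58_101
CODE_LIST_ITEMS = 192_855
ANNOTATED_ELEMENTS = 805_308
TOTAL_CODES = 1_609_225
UNIQUE_CODES = 66_373

# The three element types add up to the reported number of annotated elements.
assert ITEMS + ITEM_GROUPS + CODE_LIST_ITEMS == ANNOTATED_ELEMENTS

# Average codes per annotated element (assumes all codes attach to these elements).
codes_per_element = TOTAL_CODES / ANNOTATED_ELEMENTS
# Average number of times each distinct code is reused.
uses_per_unique_code = TOTAL_CODES / UNIQUE_CODES


@dataclass
class Element:
    """Hypothetical, simplified annotated element (not the portal's schema)."""
    kind: str  # "item", "item_group" or "code_list_item"
    label: str
    umls_codes: list[str] = field(default_factory=list)


def code_statistics(elements):
    """Return (total code assignments, distinct codes, element count per kind)."""
    assignments = Counter(code for e in elements for code in e.umls_codes)
    kinds = Counter(e.kind for e in elements)
    return sum(assignments.values()), len(assignments), kinds


# Toy data; CUI_A..CUI_E are placeholders, not real UMLS concept identifiers.
toy = [
    Element("item_group", "Vital signs", ["CUI_A"]),
    Element("item", "Systolic blood pressure", ["CUI_B", "CUI_C"]),
    Element("item", "Diastolic blood pressure", ["CUI_D", "CUI_C"]),
    Element("code_list_item", "Yes", ["CUI_E"]),
]

total, distinct, kinds = code_statistics(toy)
print(f"{codes_per_element:.2f} codes per element, "
      f"{uses_per_unique_code:.1f} uses per distinct code")
# prints: 2.00 codes per element, 24.2 uses per distinct code
print(total, distinct, dict(kinds))
# prints: 6 5 {'item_group': 1, 'item': 2, 'code_list_item': 1}
```

On the reported figures this works out to roughly two codes per annotated element and about 24 assignments per distinct code. In the toy list, the shared placeholder `CUI_C` is why six assignments yield only five distinct codes.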

Conclusion To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can help increase the compatibility of medical datasets and can serve as a large, expert-annotated medical text corpus for natural language processing.
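The suggestion that the annotated elements could serve as a text corpus for natural language processing can be illustrated with a small, hypothetical export step. The sketch assumes elements are available as (text, language, codes) triples. The field names `text`, `lang` and `labels`, the JSON Lines layout, and the placeholder codes are illustrative choices, not the portal's actual download format.

```python
import json

# Hypothetical annotated elements as (text, language, UMLS codes) triples.
# CUI_X..CUI_Z are placeholders, not real UMLS concept identifiers.
annotated = [
    ("Date of birth", "en", ["CUI_X"]),
    ("Geburtsdatum", "de", ["CUI_X"]),
    ("Tumour size", "en", ["CUI_Z", "CUI_Y"]),
]


def to_jsonl(records):
    """Serialize each triple as one JSON object per line (JSON Lines)."""
    return "\n".join(
        json.dumps({"text": text, "lang": lang, "labels": sorted(codes)},
                   ensure_ascii=False)
        for text, lang, codes in records
    )


def surface_forms_by_code(records):
    """Group element texts by shared code, e.g. to collect multilingual synonyms."""
    groups = {}
    for text, lang, codes in records:
        for code in codes:
            groups.setdefault(code, []).append((lang, text))
    return groups


corpus = to_jsonl(annotated)
print(corpus)
print(surface_forms_by_code(annotated)["CUI_X"])
# prints: [('en', 'Date of birth'), ('de', 'Geburtsdatum')]
```

Because English and German elements that express the same concept share a code, grouping by code yields cross-lingual synonym sets. That is one plausible use of such a corpus, for example as labelled data for concept normalization.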

Authors' Contribution

S.R.: manuscript writing, statistics, revision, (supervision of) data model creation and annotation, research of available data models. M.B.: software development, revision, export of metadata and code. C.N.: revision, supervision of data model creation and annotation, research of available data models. S.H.: software architecture and development. P.N.: software development and supervision thereof, revision, export of metadata and code. A.M.: project management, dissemination concept, writing, and revision. C.P.: data model creation and annotation, research of available data models. M.S.: software development and supervision thereof. M.G.: software development and supervision thereof. J.V.: software development, writing and revision, (supervision of) data model creation and annotation, research of available data models. M.D.: Principal Investigator of the MDM Portal, conceptualization, selection of data models, supervision of software development, manuscript writing.




Publication History

Received: 21 October 2021

Accepted: 29 March 2024

Article published online: 13 May 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Stuttgart · New York
