Appl Clin Inform 2024; 15(05): 1056-1065
DOI: 10.1055/s-0044-1791487
Research Article

Evolution of a Graph Model for the OMOP Common Data Model

Mengjia Kang
1   Division of Pulmonary and Critical Care Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States

Jose A. Alvarado-Guzman
2   Neo4j, Inc., San Mateo, California, United States

Luke V. Rasmussen
3   Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States

Justin B. Starren
3   Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
4   University of Arizona Health Sciences, Tucson, Arizona, United States
Funding This work was supported by grant 5U19AI135964 from the National Institute of Allergy and Infectious Disease of the National Institutes of Health.

Abstract

Objective Graph databases for electronic health record (EHR) data have become a useful tool for clinical research in recent years, but there is a lack of published methods for transforming relational databases into a graph database schema. We developed a graph model for the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that can be reused across research institutions.

Methods We created and evaluated four models, representing two different strategies, for converting the standardized clinical and vocabulary tables of OMOP into a property graph model within the Neo4j graph database. Using the Successful Clinical Response in Pneumonia Therapy (SCRIPT) and Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) cohorts as test datasets of different sizes, we compared two of the resulting graph models on both cohorts with respect to database performance, including database build time, query complexity, and query runtime.
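
To make the general idea of a relational-to-graph conversion concrete, the following is a minimal sketch, not the authors' implementation or final schema, of how rows from OMOP clinical and vocabulary tables might map onto property-graph nodes and edges. The column names (`person_id`, `condition_occurrence_id`, `condition_concept_id`, `condition_start_date`, `concept_id`, `concept_name`) follow OMOP CDM v5.x; the node labels, the relationship types `HAS_CONDITION` and `OF_CONCEPT`, the class `PropertyGraph`, and the function `to_property_graph` are hypothetical, and a toy in-memory structure stands in for Neo4j.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy in-memory stand-in for a property graph store (not Neo4j)."""
    # node key -> (label, properties)
    nodes: dict = field(default_factory=dict)
    # (source key, relationship type, target key) triples
    edges: list = field(default_factory=list)

    def add_node(self, key, label, **props):
        self.nodes[key] = (label, props)

    def add_edge(self, src, rel_type, dst):
        self.edges.append((src, rel_type, dst))


def to_property_graph(persons, concepts, condition_occurrences):
    """Hypothetical mapping of OMOP CDM rows (dicts) to nodes and edges."""
    g = PropertyGraph()
    # Vocabulary table: one node per concept.
    for c in concepts:
        g.add_node(("Concept", c["concept_id"]), "Concept",
                   concept_name=c["concept_name"])
    # Clinical table: one node per person.
    for p in persons:
        g.add_node(("Person", p["person_id"]), "Person")
    # Event table: one node per occurrence; foreign keys become edges
    # rather than ID properties copied onto the node.
    for row in condition_occurrences:
        occ = ("ConditionOccurrence", row["condition_occurrence_id"])
        g.add_node(occ, "ConditionOccurrence",
                   condition_start_date=row["condition_start_date"])
        g.add_edge(("Person", row["person_id"]), "HAS_CONDITION", occ)
        g.add_edge(occ, "OF_CONCEPT", ("Concept", row["condition_concept_id"]))
    return g


# Toy rows; IDs and names are illustrative, not real data.
persons = [{"person_id": 1}, {"person_id": 2}]
concepts = [{"concept_id": 1001, "concept_name": "Example condition"}]
condition_occurrences = [
    {"condition_occurrence_id": 10, "person_id": 1,
     "condition_concept_id": 1001, "condition_start_date": "2020-01-05"},
    {"condition_occurrence_id": 11, "person_id": 2,
     "condition_concept_id": 1001, "condition_start_date": "2020-02-10"},
]

g = to_property_graph(persons, concepts, condition_occurrences)
print(len(g.nodes), len(g.edges))  # 5 4
```

In a real deployment the same mapping would be expressed as bulk imports or Cypher statements against Neo4j; the sketch only shows which table rows become nodes and which foreign keys become relationships.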

Results Using a graph schema optimized to store critical information as topology rather than as attributes resulted in a significant improvement in both data creation and querying. The graph database for our larger cohort, CRITICAL, which comprises 134,145 patients, can be built within 1 hour and contains a total of 749,011,396 nodes and 1,703,560,910 edges.
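
The contrast between storing a clinical code as an attribute and storing it as topology can be illustrated with a toy example. This is a simplified sketch under stated assumptions, not the authors' schema and not Neo4j itself: in the attribute-based variant each occurrence record carries `condition_concept_id` as a property, so finding patients with a given condition means comparing that property on every record; in the topology-based variant the concept acts as a node whose adjacency already links it to its occurrences, so the lookup only follows those edges. The functions `patients_by_attribute`, `build_adjacency`, and `patients_by_topology` and all IDs are hypothetical.

```python
from collections import defaultdict

# Toy occurrence records: (occurrence_id, person_id, condition_concept_id).
# IDs are illustrative only.
occurrences = [
    (10, 1, 1001),
    (11, 2, 1002),
    (12, 3, 1001),
    (13, 1, 1002),
]


def patients_by_attribute(records, concept_id):
    """Attribute-based: compare the stored concept ID on every record."""
    return sorted({person for _, person, cid in records if cid == concept_id})


def build_adjacency(records):
    """Topology-based: link each concept node to its occurrences once."""
    concept_to_occ = defaultdict(list)
    occ_to_person = {}
    for occ_id, person_id, cid in records:
        concept_to_occ[cid].append(occ_id)
        occ_to_person[occ_id] = person_id
    return concept_to_occ, occ_to_person


def patients_by_topology(concept_to_occ, occ_to_person, concept_id):
    """Follow only the edges attached to the concept node."""
    return sorted({occ_to_person[o] for o in concept_to_occ.get(concept_id, [])})


concept_to_occ, occ_to_person = build_adjacency(occurrences)
print(patients_by_attribute(occurrences, 1001))                   # [1, 3]
print(patients_by_topology(concept_to_occ, occ_to_person, 1001))  # [1, 3]
```

Both functions return the same patients; what differs is the access pattern. In a real graph database an index on the property can narrow the gap for simple lookups, so the sketch illustrates the modeling choice rather than reproducing the performance results reported above.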

Discussion To our knowledge, this is the first generalized solution for converting the OMOP CDM to a graph-optimized schema. Although it was developed for studies at a single institution, the modeling method can be applied to other OMOP CDM v5.x databases. Our evaluation with the SCRIPT and CRITICAL cohorts, and the comparison between the current and previous model versions, show advantages in code simplicity, database build time, and query speed.

Conclusion We developed a method for converting OMOP CDM databases into graph databases. Our experiments revealed that the final model outperformed the initial relational-to-graph transformation in both code simplicity and query efficiency, particularly for complex queries.

Protection of Human and Animal Subjects

This study was conducted in accordance with the ethical standards of the institutional review board (IRB). All procedures involving human participants were reviewed and approved by the IRB of Northwestern University (STU00204868 for the SCRIPT study and STU00212016 for the CRITICAL study).




Publication History

Received: 23 August 2022

Accepted: 27 August 2024

Article published online:
04 December 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany