DOI: 10.1055/s-0042-1757763
TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse
Funding Ministerio de Economía y Competitividad, Instituto de Salud Carlos III: PI18/00981, PI18/01047, PI18CIII/00019
Abstract
Background During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, leaving clinical researchers unaware of how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable.
Objectives This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization.
Methods The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file with XML.
Results First, four international projects were analyzed to identify the 17 operations needed to obtain, from the EHR, datasets that meet the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types of its arguments, inputs, and outputs, and their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
Conclusions This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study allows any existing EHR reuse methodology to incorporate these results.
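As an illustration of the idea summarized above (a catalog of formally described data operations, instantiated into an ETL process from an XML configuration file), the following minimal Python sketch may help. It is not the authors' implementation, which the abstract describes as written in SQL and R. The XML element and attribute names, the two operations, the `OperationSpec` type, the code `LAB-001`, and the sample rows are all hypothetical and chosen only for this example. The sketch checks argument names and types only and does not model cardinality.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

# Hypothetical configuration: element and attribute names are illustrative
# and are NOT taken from the TransformEHRs configuration schema.
CONFIG_XML = """
<etl name="demo">
  <operation name="filter_by_code" code="LAB-001"/>
  <operation name="latest_per_patient"/>
</etl>
"""

@dataclass(frozen=True)
class OperationSpec:
    """Formal description of one data operation in the catalog."""
    func: Callable[..., list]
    arg_types: dict  # argument name -> type used to parse the XML attribute

def filter_by_code(rows, code):
    """Keep only the rows whose 'code' equals the requested code."""
    return [r for r in rows if r["code"] == code]

def latest_per_patient(rows):
    """Keep the most recent row per patient (ISO dates compare as strings)."""
    latest = {}
    for r in rows:
        current = latest.get(r["patient"])
        if current is None or r["date"] > current["date"]:
            latest[r["patient"]] = r
    return list(latest.values())

# Hypothetical catalog with two operations (the study identifies 17).
CATALOG = {
    "filter_by_code": OperationSpec(filter_by_code, {"code": str}),
    "latest_per_patient": OperationSpec(latest_per_patient, {}),
}

def build_pipeline(config_xml):
    """Turn the XML configuration into an ordered list of validated steps."""
    root = ET.fromstring(config_xml.strip())
    steps = []
    for node in root.findall("operation"):
        name = node.get("name")
        if name not in CATALOG:
            raise ValueError(f"unknown operation: {name!r}")
        spec = CATALOG[name]
        raw_args = {k: v for k, v in node.attrib.items() if k != "name"}
        if set(raw_args) != set(spec.arg_types):
            raise ValueError(f"{name}: expected arguments {sorted(spec.arg_types)}")
        args = {k: spec.arg_types[k](v) for k, v in raw_args.items()}
        steps.append((name, spec.func, args))
    return steps

def run_pipeline(steps, rows):
    """Apply each configured operation in order to the extracted rows."""
    for _name, func, args in steps:
        rows = func(rows, **args)
    return rows

# Fabricated sample rows, for demonstration only.
ROWS = [
    {"patient": "p1", "code": "LAB-001", "date": "2020-03-01", "value": 5.0},
    {"patient": "p1", "code": "LAB-001", "date": "2020-03-05", "value": 7.5},
    {"patient": "p2", "code": "LAB-002", "date": "2020-03-02", "value": 1.2},
    {"patient": "p2", "code": "LAB-001", "date": "2020-03-03", "value": 6.1},
]

result = run_pipeline(build_pipeline(CONFIG_XML), ROWS)
for row in result:
    print(row["patient"], row["date"], row["value"])
```

Because the configuration is data rather than code, the list of steps returned by `build_pipeline` can be inspected or logged before execution, which is one way a process of this kind could be made auditable.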
Keywords
electronic health record - FAIR Principles - data reusability - real-world data - standards
Publication History
Received: 25 March 2022
Accepted: 05 July 2022
Article published online: 11 October 2022
© 2022. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany