Methods of Information in Medicine

CC BY-NC-ND 4.0 · Methods Inf Med 2022; 61(S 02): e89-e102
DOI: 10.1055/s-0042-1757763

Original Article

TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse

Miguel Pedrera-Jiménez¹,², Noelia García-Barrio¹, Paula Rubio-Mayo¹, Alberto Tato-Gómez¹, Juan Luis Cruz-Bermúdez¹, José Luis Bernal-Sobrino¹, Adolfo Muñoz-Carrero³, Pablo Serrano-Balazote¹

¹Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
²ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
³Digital Health Research Unit, Instituto de Salud Carlos III, Madrid, Spain

Funding: Ministerio de Economía y Competitividad; Instituto de Salud Carlos III. Grant numbers: PI18/00981, PI18/01047, PI18CIII/00019


Abstract

Background During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, in which clinical researchers cannot see how the data were recorded, extracted, and transformed. To address this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable.

Objectives This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization.

Methods The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations using the Detailed Clinical Models paradigm; (3) agnostic development of data operations, with SQL and R selected as the programming languages; and (4) automation of ETL instantiation through a formal configuration file written in XML.
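Stage (4) instantiates an ETL process from an XML configuration file that chains operations from the catalog. The following is a minimal Python sketch of that idea under stated assumptions: the XML element and attribute names (`etl`, `operation`, `input`, `argument`, `output`), the operations `filter_by_code` and `max_value`, the code `TEMP`, and the function `run_etl` are all hypothetical. They do not reproduce the article's configuration schema or its SQL/R operation catalog.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element/attribute names, operation types,
# and the code "TEMP" are illustrative assumptions, not the article's schema.
CONFIG_XML = """
<etl name="example_dataset">
  <operation id="op1" type="filter_by_code">
    <input source="observations"/>
    <argument name="code" value="TEMP"/>
    <output name="temperatures"/>
  </operation>
  <operation id="op2" type="max_value">
    <input source="temperatures"/>
    <output name="max_temperature"/>
  </operation>
</etl>
"""


def filter_by_code(rows, code):
    """Keep only the rows whose 'code' field equals the requested code."""
    return [row for row in rows if row["code"] == code]


def max_value(rows):
    """Return the largest 'value' among the rows, or None if there are none."""
    return max((row["value"] for row in rows), default=None)


# Hypothetical catalog mapping operation types to their implementations.
CATALOG = {"filter_by_code": filter_by_code, "max_value": max_value}


def run_etl(config_xml, sources):
    """Execute the configured operations in document order.

    Each operation reads named datasets, receives its string arguments as
    keyword arguments, and stores its result under the declared output name.
    """
    root = ET.fromstring(config_xml.strip())
    data = dict(sources)  # named datasets visible to later operations
    for op in root.findall("operation"):
        func = CATALOG[op.get("type")]
        inputs = [data[node.get("source")] for node in op.findall("input")]
        kwargs = {arg.get("name"): arg.get("value") for arg in op.findall("argument")}
        data[op.find("output").get("name")] = func(*inputs, **kwargs)
    return data


observations = [
    {"patient": "p1", "code": "TEMP", "value": 37.2},
    {"patient": "p1", "code": "TEMP", "value": 38.9},
    {"patient": "p1", "code": "HR", "value": 95},
]
result = run_etl(CONFIG_XML, {"observations": observations})
print(result["max_temperature"])  # 38.9
```

Keeping the configuration declarative means the same catalog functions can be recombined into a different dataset by editing only the XML, which is in the spirit of the transparency and reuse the methodology targets.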

Results First, four international projects were analyzed, identifying 17 data operations needed to obtain datasets from the EHR according to the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types for its arguments, inputs, and outputs, together with their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
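Formalizing an operation amounts to declaring a signature: typed arguments, inputs, and outputs, each with a cardinality. A minimal Python sketch of such a signature and a validation check follows, assuming dataclasses as the representation. The type labels (`quantity`, `coded_text`), the operation `mean_value`, and the helpers `check_cardinality` and `validate_inputs` are hypothetical; the article's actual formalization uses the ISO 13606 reference model, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    """One typed argument, input, or output of a data operation."""
    name: str
    data_type: str                  # hypothetical type label
    min_occurs: int = 1
    max_occurs: Optional[int] = 1   # None stands for an unbounded upper limit ("*")


@dataclass(frozen=True)
class OperationSpec:
    """Formal signature of a data operation (illustrative, not ISO 13606)."""
    name: str
    arguments: tuple = ()
    inputs: tuple = ()
    outputs: tuple = ()


def check_cardinality(slot, values):
    """Raise ValueError if the number of values violates the slot's cardinality."""
    n = len(values)
    if n < slot.min_occurs or (slot.max_occurs is not None and n > slot.max_occurs):
        upper = "*" if slot.max_occurs is None else slot.max_occurs
        raise ValueError(f"{slot.name}: got {n} values, expected {slot.min_occurs}..{upper}")


def validate_inputs(spec, bindings):
    """Check every declared input for cardinality and data type."""
    for slot in spec.inputs:
        values = bindings.get(slot.name, [])
        check_cardinality(slot, values)
        for value in values:
            if value["type"] != slot.data_type:
                raise TypeError(f"{slot.name}: expected {slot.data_type}, got {value['type']}")


# Hypothetical operation: averages one or more quantities into a single result.
MEAN_VALUE = OperationSpec(
    name="mean_value",
    inputs=(Slot("measurements", "quantity", min_occurs=1, max_occurs=None),),
    outputs=(Slot("mean", "quantity"),),
)

valid = {"measurements": [{"type": "quantity", "value": 7.1},
                          {"type": "quantity", "value": 6.5}]}
validate_inputs(MEAN_VALUE, valid)  # passes silently

try:
    validate_inputs(MEAN_VALUE, {"measurements": []})
except ValueError as err:
    print(err)  # measurements: got 0 values, expected 1..*
```

Checking cardinality and type before an operation runs surfaces malformed inputs at instantiation time, rather than leaving them hidden inside the transformed dataset.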

Conclusions This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction achieved in this study means that any existing EHR reuse methodology can incorporate these results.

Keywords

electronic health record - FAIR Principles - data reusability - real-world data - standards

Publication History

Received: March 25, 2022

Accepted: July 5, 2022

Article published online: October 11, 2022

© 2022. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, permitting copying and reproduction as long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed, or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
