Application of an Externally Developed Algorithm to Identify Research Cases and Controls from EHR Data: Trials and Triumphs

Nelly Estefanie Garduno-Rapp; Simone Herzberg; Henry H. Ong; Cindy Kao; Christoph U. Lehmann; Srushti Gangireddy; Nitin B Jain; Ayush Giri

doi:10.1055/a-2524-5216

CC BY 4.0 · Appl Clin Inform 2025; 16(02): 314-326
DOI: 10.1055/a-2524-5216

Research Article

Authors

Nelly Estefanie Garduno-Rapp^*
¹Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States

Simone Herzberg^*
²Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
³Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, Tennessee, United States

Henry H. Ong
⁴Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Cindy Kao
¹Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States

Christoph U. Lehmann
¹Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States

Srushti Gangireddy
⁴Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Nitin B Jain
⁵Department of Physical Medicine and Rehabilitation, University of Michigan, Ann Arbor, Michigan, United States

Ayush Giri
²Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
⁶Division of Quantitative and Clinical Sciences, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Funding: This work received funding from the U.S. Department of Health and Human Services, National Institutes of Health, through the National Center for Advancing Translational Sciences (grant no.: UL1TR003163) and the National Institute of Arthritis and Musculoskeletal and Skin Diseases (grant no.: R01AR074989).

Abstract

Background

The use of electronic health records (EHRs) in research demands robust and interoperable systems. By linking biorepositories to EHR algorithms, researchers can efficiently identify cases and controls for large observational studies (e.g., genome-wide association studies). This is critical for ensuring efficient and cost-effective research. However, the lack of standardized metadata and algorithms across different EHRs complicates their sharing and application. Our study presents an example of a successful implementation and validation process.

Objectives

This study aimed to implement and validate a rule-based algorithm from a tertiary medical center in Tennessee to classify cases and controls from a research study on rotator cuff tear (RCT) nested within a tertiary medical center in North Texas and to assess the algorithm's performance.

Methods

We applied a phenotypic algorithm (designed and validated in a tertiary medical center in Tennessee) to EHR data from 492 patients enrolled in a case-control study and recruited from a tertiary medical center in North Texas. The algorithm used International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes to identify case and control status for degenerative RCT. A manual review was conducted to compare the algorithm's classification with a previously recorded gold standard documented by clinical researchers.
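The rule-based approach described in the Methods can be illustrated with a minimal Python sketch. This is not the study's algorithm or data dictionary: the code sets, the `Patient` class, and the `classify` function are hypothetical and chosen only to show how diagnosis and procedure codes might drive a case/control decision. The published algorithm likely involves additional rules (for example, exclusions) that the abstract does not describe.

```python
from dataclasses import dataclass, field

# Hypothetical code sets for illustration only; NOT the study's data dictionary.
CASE_ICD_PREFIXES = ("M75.1",)   # assumed diagnosis-code prefix for rotator cuff tear
CASE_CPT_CODES = {"29827"}       # assumed procedure code for rotator cuff repair


@dataclass
class Patient:
    """Minimal stand-in for one patient's coded EHR data (hypothetical)."""
    patient_id: str
    icd_codes: set = field(default_factory=set)
    cpt_codes: set = field(default_factory=set)


def classify(patient: Patient) -> str:
    """Return 'case' if any case-defining code is present, else 'control'."""
    # str.startswith accepts a tuple, so one call checks every prefix.
    has_case_icd = any(code.startswith(CASE_ICD_PREFIXES) for code in patient.icd_codes)
    # Set intersection is non-empty when the patient has a case-defining procedure.
    has_case_cpt = bool(patient.cpt_codes & CASE_CPT_CODES)
    return "case" if (has_case_icd or has_case_cpt) else "control"


if __name__ == "__main__":
    cohort = [
        Patient("A", icd_codes={"M75.121"}),
        Patient("B", cpt_codes={"29827"}),
        Patient("C", icd_codes={"I10"}),
    ]
    for p in cohort:
        print(p.patient_id, classify(p))
```

Keeping the code sets in separate constants mirrors the idea of a data dictionary: when coding practices differ between sites, the sets can be extended without touching the classification logic.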

Results

Initially, the algorithm correctly identified 398 of 492 patients (80.9%) as cases or controls. After fine-tuning and correcting errors in our gold standard dataset, we calculated a sensitivity of 0.94 and a specificity of 0.76. The implementation of the algorithm presented challenges due to the variability in coding practices between medical centers. To enhance performance, we refined the algorithm's data dictionary by incorporating additional codes. The process highlighted the need for meticulous code verification and standardization in multi-center studies.
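For readers less familiar with these metrics, the following Python sketch shows how agreement, sensitivity, and specificity are computed from classification counts. The only figures taken from the abstract are 398 and 492; the abstract does not report the underlying confusion matrix, so the true/false positive and negative counts below are made-up toy values, and the function names are illustrative.

```python
def agreement(correct: int, total: int) -> float:
    """Proportion of patients whose algorithm label matched the gold standard."""
    return correct / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of gold-standard cases that the algorithm labeled as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of gold-standard controls that the algorithm labeled as controls."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Counts reported in the abstract: 398 of 492 patients correctly classified.
    print(f"initial agreement: {agreement(398, 492):.1%}")

    # Toy counts (NOT from the study) to show how the two metrics are derived.
    print(f"toy sensitivity: {sensitivity(true_pos=9, false_neg=1):.2f}")
    print(f"toy specificity: {specificity(true_neg=8, false_pos=2):.2f}")
```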

Conclusion

Sharing case-control algorithms boosts EHR research. Our rule-based algorithm improved multi-site patient identification and revealed 12 data entry errors, helping validate our results.

Keywords

phenotypic algorithms - data validation - clinical research informatics

Protection of Human and Animal Subjects

Our study received approval from the Institutional Review Board (STU-2020-0689). Only patients who provided informed consent at the University of Texas Southwestern Medical Center (UTSW) were included in the data query. To ensure confidentiality, all patient information was de-identified and securely managed.

^* These authors contributed equally.

Publication History

Received: 06 October 2024

Accepted: 15 January 2025

Accepted Manuscript online: 24 January 2025

Article published online: 26 March 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
