DOI: 10.1055/a-2524-5216
Application of an Externally Developed Algorithm to Identify Research Cases and Controls from Electronic Health Record Data: Failures and Successes
Supported by: National Center for Advancing Translational Sciences UL1TR003163
Supported by: National Institute of Arthritis and Musculoskeletal and Skin Diseases R01AR074989
Background: The use of Electronic Health Records (EHRs) in research demands robust, interoperable systems. By linking biorepositories to EHR algorithms, researchers can efficiently identify cases and controls for large observational studies, such as Genome-Wide Association Studies (GWAS), which keeps such research efficient and cost-effective. However, the lack of standardized metadata and algorithms across different EHRs complicates their sharing and application. Our study presents an example of a successful implementation and validation process.
Objective: To implement and validate a rule-based algorithm from a tertiary medical center in Tennessee to classify cases and controls from a research study on rotator cuff tear nested within a tertiary medical center in North Texas and to assess the algorithm's performance.
Methods: We applied a phenotypic algorithm (designed and validated in a tertiary medical center in Tennessee) to EHR data from 492 patients enrolled in a case-control study and recruited from a tertiary medical center in North Texas. The algorithm leveraged ICD (International Classification of Diseases) and CPT (Current Procedural Terminology) codes to identify case and control status for degenerative rotator cuff tears. A manual review was conducted to compare the algorithm's classification with a previously recorded gold standard documented by clinical researchers.
Results: Initially, the algorithm correctly classified 398 (80.9%) patients as cases or controls. After fine-tuning the algorithm and correcting errors in our gold standard dataset, we calculated a sensitivity of 0.94 and a specificity of 0.76.
Discussion: The implementation of the algorithm presented challenges due to the variability in coding practices between medical centers. To enhance performance, we refined the algorithm's data dictionary by incorporating additional codes. The process highlighted the need for meticulous code verification and standardization in multi-center studies.
Conclusion: Sharing case-control algorithms boosts EHR research. Our rule-based algorithm improved multi-site patient identification and revealed 12 data entry errors, helping validate our results.
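Illustrative sketch: The abstract does not give the algorithm's rules or data dictionary, so the following Python sketch is only a minimal illustration of the general approach described in Methods and Results: a code-based rule assigns case or control status, and the output is compared with a gold standard to obtain sensitivity and specificity. The code sets are placeholder strings, not real ICD or CPT codes, and the rule itself (any matching diagnosis or procedure code makes a patient a case) is an assumption, as are all function and variable names.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical data dictionary. These are placeholder strings, NOT the
# ICD/CPT codes used in the study, which the abstract does not list.
CASE_DX_CODES = {"DX_TEAR_1", "DX_TEAR_2"}   # stand-ins for ICD diagnosis codes
CASE_PROC_CODES = {"PX_REPAIR_1"}            # stand-ins for CPT procedure codes


def classify(patient_codes: Iterable[str]) -> str:
    """Assumed rule: a patient with any case diagnosis or procedure code
    is a case; everyone else is a control."""
    codes = set(patient_codes)
    if codes & CASE_DX_CODES or codes & CASE_PROC_CODES:
        return "case"
    return "control"


def sensitivity_specificity(
    predicted: Sequence[str], gold: Sequence[str]
) -> Tuple[float, float]:
    """Compare algorithm labels with gold-standard labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    with 'case' treated as the positive class.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(p == "case" and g == "case" for p, g in pairs)
    fn = sum(p == "control" and g == "case" for p, g in pairs)
    tn = sum(p == "control" and g == "control" for p, g in pairs)
    fp = sum(p == "case" and g == "control" for p, g in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard needs at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Toy records (invented for illustration): codes per patient and gold label.
toy_patients = [
    ({"DX_TEAR_1"}, "case"),
    ({"PX_REPAIR_1", "DX_OTHER"}, "case"),
    ({"DX_OTHER"}, "case"),       # missed by the rule: a false negative
    ({"DX_OTHER"}, "control"),
    ({"DX_TEAR_2"}, "control"),   # flagged by the rule: a false positive
    (set(), "control"),
]
toy_predicted = [classify(codes) for codes, _ in toy_patients]
toy_gold = [label for _, label in toy_patients]
toy_sensitivity, toy_specificity = sensitivity_specificity(toy_predicted, toy_gold)
```

In a sketch like this, adapting the algorithm to a new site amounts to editing the code sets, which mirrors the data dictionary refinement described in the Discussion. Disagreements between the rule and the gold standard are also worth reviewing by hand, since they may reflect errors in the gold standard rather than in the rule.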
Publication History
Received: 06 October 2024
Accepted after revision: 15 January 2025
Accepted Manuscript online: 24 January 2025
© . The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany