Introduction
Annually, approximately 1,000,000 new cases of fatal or nonfatal pulmonary embolism (PE) occur in the United States and Europe.[1–6] Traditional cohort studies and registries continue to inform the epidemiology, prognosis, and outcomes of PE.[7–15] In turn, randomized controlled trials (RCTs) have informed the safety and efficacy of interventions, such as type and dose of anticoagulation and the utility of advanced therapies.[7–9] However, many questions about PE epidemiology and comparative effectiveness of health interventions remain unanswered. Despite the merits of traditional cohort studies and RCTs for informing PE epidemiology and effectiveness of PE treatment options, individual patient screening and enrollment with traditional methods are resource-intensive. Prospective enrollment at large scales, such as at a national level, is also burdensome and often unfeasible. Therefore, more efficient ways are needed to identify patients with PE.
Electronic databases such as electronic health records (EHRs) or large administrative databases are advantageous for patient selection in retrospective studies. EHRs are also helpful for case selection in prospective observational studies, or for case selection in RCTs, as they can be screened fairly quickly. Querying the EHRs is more efficient than prospective manual screening of clinical practices.
The most common way to identify patients with PE through electronic databases is by using the International Classification of Diseases (ICD) codes. In recent years, ICD codes were updated to the 10th revision (ICD-10). These codes make it possible for investigators to query individual hospitals or health system records, or to analyze large insurance databases, for purposes such as assessing regional or national practice patterns or trends in PE incidence and outcomes.[16–20] The American Heart Association uses the codes for the annual Heart Disease and Stroke Statistics.[1,21] The Agency for Healthcare Research and Quality uses the PE ICD-10 codes to track perioperative quality of care.[21] Observational comparative effectiveness studies have used these codes to share routine practice perspectives complementing RCT results and providing insights in contexts in which an RCT is unfeasible.[22,23] Recently, ICD codes have had novel uses such as patient screening and successful inclusion in pragmatic RCTs for cardiovascular diseases.[24]
Natural language processing (NLP), a branch of artificial intelligence, uses computers to transform unstructured data into analyzable variables.[25–31] NLP has received growing attention in biomedical research.[32] NLP is attractive for identification of patients with PE since it can potentially use various sections of the medical records, including imaging reports for computed tomography pulmonary angiography (CTPA) or ventilation-perfusion imaging, to confirm the diagnosis of PE, or even to automate additional features for screening or risk stratification.
However, there are important knowledge gaps related to the optimal approach for case selection of patients with PE. The existing studies using ICD-10 codes ([Table 1 ] with codes, [Table 2 ] with studies)[33–40] or NLP ([Table 3 ])[27–31,38,39] have had limitations, including small sample sizes or single-center designs, insufficient detail about the location of the codes (principal discharge diagnosis position vs. secondary discharge diagnosis position), or limited cross-validation. The PE-EHR+ study has been designed to address these gaps in knowledge and to validate efficient tools for identification of patients with PE in electronic databases.
Table 1
ICD-10 codes for pulmonary embolism[a ]
ICD-10 Codes
Definition
I26
Pulmonary embolism
I26.0
Pulmonary embolism with acute cor pulmonale
I26.02
Saddle embolus of pulmonary artery with acute cor pulmonale
I26.09
Other pulmonary embolism with acute cor pulmonale
I26.9
Pulmonary embolism without acute cor pulmonale
I26.92
Saddle embolus of pulmonary artery without acute cor pulmonale
I26.93
Single subsegmental pulmonary embolism without acute cor pulmonale[b ]
I26.94
Multiple subsegmental pulmonary emboli without acute cor pulmonale[b ]
I26.99
Other pulmonary embolism without acute cor pulmonale
O88.2
Obstetric thromboembolism
Z86.711
Personal history of pulmonary embolism
Abbreviation: ICD-10, International Classification of Diseases 10th revision.
a Note that the codes can be placed in the discharge records as a Principal Discharge Diagnosis or Secondary Discharge Diagnosis, and that for research studies, either or both of these locations can be queried, with tradeoffs between sensitivity and specificity. These issues will be investigated in depth in the PE-EHR+ study. Cases of amniotic fluid embolism or fat embolism, if identified by the PE codes, will be flagged. Although the code I82 and its sub-categories denote venous embolism and thrombosis, the subcodes are mostly related to deep vein thrombosis and were not included in the current study. If false negatives are identified in PE-EHR+, we will assess whether a subset of them includes this code.
b Subsegmental PE is a challenging diagnosis.[50 ] Independent validation of the diagnosis in this subset will be attempted if the resources allow.
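The three-way grouping by code position described in the first footnote of Table 1 can be sketched in a few lines of Python. This is an illustration only, not the study's extraction code: the code list is copied from Table 1, while the record layout (a list of discharge codes whose first element is the principal discharge diagnosis) and the function names are assumptions. Dots are stripped before comparison because some data sources store the codes without them.

```python
# Illustrative sketch; the code list mirrors Table 1, everything else is assumed.
PE_CODES = {
    "I26", "I26.0", "I26.02", "I26.09", "I26.9",
    "I26.92", "I26.93", "I26.94", "I26.99", "O88.2", "Z86.711",
}


def normalize(code: str) -> str:
    """Upper-case and drop the dot so 'i26.99' and 'I2699' compare equal."""
    return code.strip().upper().replace(".", "")


PE_CODES_NORMALIZED = {normalize(code) for code in PE_CODES}


def pe_code_position(discharge_codes: list[str]) -> str:
    """Return 'principal', 'secondary', or 'none' for one hospitalization.

    Assumption: discharge_codes[0] is the principal discharge diagnosis and
    the remaining entries are secondary discharge diagnoses.
    """
    flags = [normalize(code) in PE_CODES_NORMALIZED for code in discharge_codes]
    if flags and flags[0]:
        return "principal"
    if any(flags[1:]):
        return "secondary"
    return "none"
```

A hospitalization coded `["I10", "I26.92"]` would fall in the secondary group under this sketch, and one with no listed code in any position would fall in the no-code group.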
Table 2
Existing studies that assessed the accuracy of ICD-10 codes for PE[a ]
Study
ICD-10 codes assessed
Metrics assessed
Summary of findings
Comments
Burles et al[33 ]
I26.0
I26.9
Sensitivity
Specificity
PPV
NPV
Using data from 4 emergency departments in Alberta, Canada, the authors reported the accuracy of codes for detecting PE against chart review. Sensitivity was 91.1%, specificity was 99.9%, PPV was 82.3%, and NPV was 99.9%. No distinction was made between primary vs. secondary codes.
Among 479,937 visits, 1,453 patients with PE codes were found. The authors ran a keyword search of the physician discharge diagnosis field among patients without PE codes to identify false negatives.
Casez et al[34 ]
I26.0
I26.9
O88.2
Sensitivity
Among 1,375 patients with suspected DVT/PE, ICD-10 codes were compared with diagnosis based on imaging studies. Sensitivity for PE was 88.9%. Specificity could not be assessed.
The authors assessed codes placed in Principal or secondary discharge position. Sufficient details about the breakdown were not provided.
Alotaibi et al[35 ]
I26.0
I26.9
Sensitivity
Specificity
PPV
NPV
The authors sampled 1,361 patients with probable VTE: 147 had a PE and 105 had a DVT. Predefined ICD codes were applied to the 1,361 patients to see who were coded correctly and who should not have been coded. Sensitivity for PE was 74.83%, specificity was 95.77%, PPV was 70.51%, and NPV was 93.35%.
Study from emergency departments in Canada. The ICD-10 PE codes were used in any position. Sufficient details about the breakdown were not provided.
Lawrence et al[36 ]
I26.02
I26.09
I26.92
I26.99
Sensitivity
Specificity
PPV
NPV
Charts of 487 patients receiving anticoagulation in a single institution were reviewed. For ICD-10 PEs, sensitivity was 100%, specificity was 79.3%, PPV was 17.1%, and NPV was 100%.
The authors assessed codes placed in Principal or secondary discharge position. Sufficient details about the breakdown were not provided.
Prat et al[37 ]
I26.0
I26.9
Sensitivity
Specificity
PPV
In a study of 970 patients who had a CTPA, ICD-10 codes and NLP were compared with manual review (13% of patients had PE). Sensitivity of ICD-10 codes for PE was 92.9%, specificity was 91.0%, and PPV was 60.6%.
Compared NLP to ICD-10 codes.
Compared NLP and ICD-10 codes for saddle PE and for subsegmental PE.
Johnson et al[38 ]
I26, I26.01, I26.02, I26.09, I26.0, I26.90, I26.92, I26.93, I26.94, I26.99, I26.9, I27.24, I27.82, Z86.711
Sensitivity
Specificity
PPV
NPV
In a study of 1,000 random hospitalizations, NLP algorithms and ICD-10 codes were compared with manual review. Sensitivity of ICD-10 codes for PE was 63%, specificity was 99%, PPV was 70%, and NPV was 99%.
The authors assessed ICD-10 codes in any position and did not assess the codes in Principal Discharge position, separately. NLP tools were also assessed in this study. See [Table 3 ].
Verma et al[39 ]
I26, O88.2
Sensitivity
Specificity
PPV
NPV
In a study from 5 hospitals in Canada, the authors reported the accuracy of an NLP algorithm that they developed, compared with simpleNLP and ICD-10 codes. For the ICD-10 codes for PE, they reported sensitivity of 57%, specificity of 100%, PPV of 92%, and NPV of 99%.
The study also assessed accuracy of codes and NLP for DVT. However, detailed information about cohort breakdown for PE was not provided. Information not available for location of codes.
Andersson et al[40 ]
I26.0–I26.9
PPV
In a study of 559 patients with ICD-10 codes for PE from Sweden, chart review confirmed acute PE in 435 patients (PPV 78.9%). In 11 patients the codes were completely incorrect, and in another 47, the codes indicated prior diagnosis of PE but not acute PE.
The study did not provide sufficient discrimination between primary vs. secondary ICD-10 codes and did not assess sensitivity, specificity, or negative predictive values.
Abbreviations: CTPA, computed tomography pulmonary angiography; DVT, deep vein thrombosis; NPV, negative predictive value; PE, pulmonary embolism; PPV, positive predictive value.
a Data are based on a systematic search and review of the literature. See [Supplementary Material ] (available in the online version) for the search query.
Table 3
Natural language processing (NLP) algorithms used for assessment of PE in prior studies
Study
NLP method used
NLP performance metrics
NLP technique and methods summary
Comments
Pham et al[27 ]
Generate ML features by using N-gram and manual annotation with Brat.
Precision
Recall
F-measure
CT angiography reports from 573 patients in a single French institution were used. An NLP algorithm was designed, trained with 100 reports, and tested in the remaining reports. There was 99% precision for PE. Details about positive predictive value and sensitivity were not mentioned.
The study was from France. Applicability to charts in English is uncertain.
Raja et al[28 ]
General Architecture for Text Engineering
Sensitivity
Specificity
PPV
NPV
The General Architecture for Text Engineering (GATE) tool was applied to 179 CT angiography reports to identify PE, and compared against manual review. Sensitivity and positive predictive value of the NLP algorithm were both 91.3%. Specificity and NPV were both 98.7%.
Sample size was fairly small.
Tian et al[29 ]
Symbolic NLP classifiers
Sensitivity
Specificity
PPV
Using the imaging reports in a Canadian health system, the authors derived and validated an NLP algorithm for PE against manual review of the radiology reports. NLP achieved 94% sensitivity and 80% positive predictive value for PE and 96% specificity.
Selby et al[30 ]
Bag of words, N-gram
Sensitivity
Specificity
PPV
NPV
In a study using radiology reports and the WEKA machine learning toolkit, an NLP tool for detection of postoperative PE was developed. Among 703 patients in the validation set, sensitivity for PE was 90%, specificity was 98.7%, PPV was 81.8%, and NPV was 99.3%.
The study focused on postoperative PE.
Chen et al[31 ]
Convolutional Neural Network (CNN)
Sensitivity
Specificity
Accuracy
In a single-center study, convolutional neural network with unsupervised learning using TensorFlow (a deep learning library) and an NLP algorithm (PeFinder) were compared against imaging reports. TensorFlow had a sensitivity of 95.2%, specificity of 90.5%, and accuracy of 92.1%. PeFinder had a sensitivity of 94.5%, a specificity of 92.9%, and an accuracy of 93.5%.
Positive predictive values were not reported.
Johnson et al[38 ]
Rule-based NLP
Sensitivity
Specificity
PPV
NPV
In a study of 1,000 random hospitalizations, NLP algorithms, “simpleNLP” tool, and ICD-10 codes were compared to manual review. Sensitivity of NLP was 96.0% and specificity was 97.7%. Positive and negative predictive values were 86.3 and 99.4%, respectively.
ICD-10 codes were also assessed in this study. See [Table 2 ]. The authors identified better discrimination for saddle PE and for subsegmental PE with NLP, compared with ICD-10 codes.
Verma et al[39 ]
Rule-based NLP
PPV?
In a study from 5 hospitals in Canada, the authors reported the accuracy of an NLP algorithm that they developed, compared with simpleNLP and ICD-10 codes.
The study also assessed accuracy of codes and NLP for DVT. However, detailed information about cohort breakdown for PE was not provided. ICD-10 codes were also assessed in this study. See [Table 2 ].
Abbreviations: CT, computed tomography; ICD-10, International Classification of Diseases, 10th revision; NLP, natural language processing; NPV, negative predictive value; PPV, positive predictive value.
Note: Other abbreviations as in [Table 2 ]. See [Table 2 ] for the study by Johnson et al.[38 ]
Methods
General Design Features and Data Sources
The PE-EHR+ study has three distinct and complementary goals: (1) to validate ICD-10 codes, including the location and subtype of codes, for selection of patients with PE through EHRs; (2) to validate an efficient NLP algorithm for selection of patients with PE in EHRs that have electronic versions of the imaging reports available; and (3) as a practical application, to use the validated ICD-10 codes to report the trends in PE hospitalization and outcomes in a national database of patients with PE in the United States ([Fig. 1 ]).
Fig. 1 Graphical summary of the goals of the PE-EHR+ study.
For the first aim, we will use data from the Mass General Brigham (MGB) Health System in Massachusetts, United States. MGB includes several community hospitals and two large referral hospitals. We have also prespecified screening and exploring an additional subset of charts from another large health system in the United States (the Yale-New Haven Health System). The Institutional Review Board at Brigham and Women's Hospital (BWH) reviewed the study protocol and approved it, waiving the need for informed consent (IRB #2022P001226). For chart review from other sites, related Institutional Review Board approval will be obtained. The study will be performed at the Thrombosis Research Group at BWH, in close collaboration with the Medical Text Extraction, Reasoning, and Mapping System (MTERMS) laboratory at BWH, and the Yale-New Haven Hospital/Yale Center for Outcomes Research and Evaluation (CORE).
The initial study protocol was used as a platform for generation of the list for patient identification by two authors (Y.C.L. and B.B.). We selected the patient cohort from the Enterprise Data Warehouse of MGB by using the following criteria: (1) patient age of 18 years or older and (2) inpatient encounter (hospitalization) with diagnosis date between January 1, 2016 and December 31, 2021. In the process of patient selection (see below), in addition to obtaining data on the presence or absence and position of the ICD-10 discharge codes for PE, we collected information such as age, sex, admission diagnosis, admission date, and discharge date for further review.
Study Samples
Three distinct groups of patients will be identified: (1) patients with ICD-10 Principal Discharge Diagnosis (primary codes) for PE, (2) patients with Secondary Discharge Diagnosis for PE (but no PE codes in the primary position), and (3) patients in whom no ICD-10 PE codes were mentioned during the index hospitalization event, either in the primary or in the secondary positions. The ICD-10 PE codes and their definitions are summarized in [Table 1 ]. [Supplementary Table S1 ] (available in the online version) summarizes the search query for identification of prior studies.
ICD-10 codes were introduced into practice in the United States in the fourth quarter of 2015. Considering a potential learning curve in the health systems, we set the period for inclusion of patients and their hospitalization events from January 1, 2016 through December 31, 2021. If a given patient had multiple hospitalizations with similar patterns of codes in the study period (e.g., multiple hospitalizations with secondary discharge diagnosis of PE), only one hospitalization was selected randomly.
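The selection rules above (adult patients, diagnosis date within the study window, and one randomly chosen hospitalization per patient within each code pattern) can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys `patient_id`, `age`, `diagnosis_date`, and `group` are hypothetical field names, and the fixed seed is added only so that the random draw is reproducible.

```python
import random
from datetime import date

# Study window taken from the text.
STUDY_START = date(2016, 1, 1)
STUDY_END = date(2021, 12, 31)


def select_index_hospitalizations(encounters, seed=0):
    """Keep one randomly chosen eligible hospitalization per patient and group.

    encounters: iterable of dicts with the hypothetical keys 'patient_id',
    'age', 'diagnosis_date' (datetime.date), and 'group' ('principal',
    'secondary', or 'none').
    """
    rng = random.Random(seed)
    eligible = {}
    for enc in encounters:
        if enc["age"] < 18:
            continue  # adults only
        if not (STUDY_START <= enc["diagnosis_date"] <= STUDY_END):
            continue  # outside the inclusion period
        key = (enc["patient_id"], enc["group"])
        eligible.setdefault(key, []).append(enc)
    # One random hospitalization for each patient/code-pattern combination.
    return [rng.choice(candidates) for candidates in eligible.values()]
```

Keying on both patient and group follows the wording of the text, in which the random draw applies to repeated hospitalizations with a similar pattern of codes.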
As an exploratory goal, if resources allow, we will also explore the accuracy of ICD-10 codes for chronic thromboembolic pulmonary hypertension (I27.24).
Exposure Variable and Data Extraction for the ICD-10 Code Analysis
In the ICD-10 code analysis, the main exposure variable is the status of ICD-10 codes for PE in the discharge records: present in the primary position, present in the secondary position only, or absent.
The reference standard for identification of PE will be chart review by two trained independent clinician abstractors using standardized definitions ([Table 4 ]). The data abstraction form will be created and piloted in five charts per group. Once the form is finalized, the study protocol will be made available to abstractors. The abstractors will then review the patient charts, including imaging studies, discharge summaries, and other records, to verify the diagnosis of PE. For review of each individual chart, the abstractors will have full access to electronic medical records, but not the designated ICD-10 codes in the research database, to provide unbiased assessment of each chart. Discrepancies between the two abstractors' findings will be discussed and, if unresolved, will be decided by input from the Principal Investigator. In the unlikely event that PE ascertainment is not feasible for a given chart, that chart will be excluded (see statistical analysis).
Table 4
Operational definitions for the assessment of the accuracy of ICD-10 codes for PE, subsegmental PE, and cor pulmonale according to chart review[a ]
Condition by the ICD-10 codes
Definition according to chart review
Comment
PE[b ]
Mentioning of PE in medical notes such as discharge summary, verified by sufficient confirmatory findings for PE in radiology reports from the index hospitalization (such as reports for filling defect in CTPA, high-probability V/Q scan, direct verification of pulmonary thrombi/emboli in invasive angiography, or presence of new proximal DVT in conjunction with symptoms and signs of PE).
The abstractors will be blinded to the ICD-10 code results.
Subsegmental PE[b ]
Report of subsegmental filling defects consistent with the diagnosis of PE in radiology reports, without involvement of segmental, lobar, or central pulmonary arteries.
A sub-component of the PE-EHR+ study plans to assess 50 CTPA studies with an initial radiology report for subsegmental PE by a core laboratory.
Acute cor pulmonale[b ][c ] in the setting of PE
Evidence of newly identified RV dysfunction evidenced by at least one of the following:
• Radiology report indicating RV/LV ratio ≥1.0[d ], or enlarged RV, or bowing of the interventricular septum, or the term “RV strain,” or a combination of these.
• Echocardiographic report indicating RV/LV ratio ≥0.7[d ], or enlarged RV, or bowing of the interventricular septum, or the term “RV strain,” or TAPSE <16 mm, or RV-free wall hypokinesis, or the term “McConnell sign,” or newly identified elevated RVSP (>30 mmHg) without another cause, or a combination of these.
• Elevation of cardiac troponins above the normal assay values.[e ]
Several of the ICD-10 codes refer to cor pulmonale. However, major expert guidelines do not use this terminology, and no universal definition for the term exists. In PE-EHR+, we considered acute cor pulmonale to be present if there was evidence of newly identified RV dysfunction.
Abbreviations: CTPA, computed tomography pulmonary angiography; DVT, deep vein thrombosis; ICD-10, International Classification of Diseases, 10th revision; LV, left ventricular; PE, pulmonary embolism; RV, right ventricular; RVSP, right ventricular systolic pressure; TAPSE, tricuspid annular plane systolic excursion; V/Q scan, ventilation/perfusion scan.
a The main goal of this study is not to re-adjudicate the initially identified events during routine clinical care, but rather to assess the success of ICD-10 codes in accurately capturing the information related to PE as it occurred in the index routine care hospitalization. Therefore, routine core laboratory assessment of individual imaging studies is not considered. For a subset of patients, core laboratory assessment may be considered as a supplemental goal of the project. See the text for details.
b If patients are transferred from other facilities and there is no existing report for their original CTPA or V/Q scan, the study Principal Investigator will attempt to verify the diagnosis of PE from the original imaging studies. However, further assessment for subsegmental PE or acute cor pulmonale will not be attempted, to keep the assessment criteria uniform.
c Since S1 Q3 T3 pattern is nonspecific, it was not considered.
d Different cutoffs have been used for CTPA assessment and echocardiographic assessment of RV/LV ratio. A higher threshold is associated with higher specificity for identification of RV dysfunction as a prognosticator of adverse clinical outcomes. In echocardiographic assessment, RV/LV ratios >0.6 have been assessed in some studies. Since in PE-EHR+ there is no a priori plan to independently re-measure the values, but rather to rely on reports of CTPA and echocardiography, to facilitate the process, the abstractors will be advised to look for an RV/LV ratio cutoff >0.9 in the CTPA or echocardiographic reports.
e For patients with estimated creatinine clearance <60 mL/min, troponin levels may be chronically elevated. At least a 20% elevation compared to the prior recorded troponin would be required. Fifth generation (high-sensitivity) troponin assays detect very modest elevations in troponin. However, the clinical significance of very modest elevations in troponin (undetected by fourth-generation assays) in patients with PE remains uncertain. By consensus among coauthors (B.B., D.J., G.P.), high-sensitivity troponin values beyond 30 ng/L not explained by another cause will be considered positive in the PE-EHR+ study.
Exposure Variable and Data Extraction for the NLP Analyses
The main exposure variable in the NLP analysis will be the presence of PE based on NLP automated review of radiology reports. The reference standard for identification of PE will be chart review by trained clinician abstractors, as summarized above.
EHRs provide large amounts of data for research. While data elements such as laboratory tests are structured, medical notes or imaging reports are created as free text without predefined structured data elements.[26,41–43] Natural language, such as the words in medical charts, is not typically “coded” or conducive to computations for case selection or statistical analyses in research studies. The resource-intensive nature of manual chart review to abstract data from free-text fields precludes timely or large-scale analyses.
NLP re-encodes free-text notes (natural language) into a structured format that facilitates data extraction and analysis. Briefly, EHR-based NLP techniques can be grouped into three categories: (1) keyword searches or rule-based systems; (2) supervised learning systems; and (3) unsupervised learning systems. The development of a successful NLP algorithm entails multiple steps including tokenization, word stemming, lemmatization, and others ([Table 5 ]).[25] NLP can handle synonyms, acronyms, and typos that are added to the system (e.g., “embolsim” instead of “embolism”). Once the algorithm is derived (training set) and validated (testing set) with satisfactory performance, it can conduct the disease identification task automatically.
Table 5
Basic definitions related to natural language processing as they relate to identification of pulmonary embolism in medical charts
Concept
Definition
Corpus
The unstructured large body of text. Examples include medical notes or imaging reports.
Tokenization
A token represents linguistic units, including single words and spaces. Tokens can be combined to form larger units including phrases. Examples include pulmonary and embolism.
Stop words
Stop words are some of the most commonly used words in free text. They may be prepositions, pronouns, conjunctions, etc. Stop words are typically removed during the data preprocessing stage of NLP since they seldom contribute additional information to the text. Examples include “the,” “is,” and “and.”
Acronyms/abbreviations
The same acronym may have different meanings in the chart. PE can denote pulmonary embolism, but may be used to refer to physical examination. Others may use the acronym “PTE” to refer to pulmonary thromboembolism. However, PTE can be used to refer to pulmonary thromboendarterectomy.
Word stemming
This process groups the tokens with similar root meanings. Examples include embolism and embolic for which the stem is “emboli.”
Lemmatization
This process converts words to dictionary forms. The lemma for better and best is “good.”
Polysemy/word sense disambiguation
Multiple meanings from the same word. A general example is “cold.” It can refer to the viral illness, or cold temperature.
Lexicon
It is a collection of information about the words and the lexical categories to which they belong (noun, verb, adjective, adverb, preposition). To avoid missing the concept of pulmonary embolism in the clinical text, we will need a dictionary (i.e., Lexicon) to store the possible ways pulmonary embolism is described in the clinical text (e.g., pulmonary embolism, pulmonary emboli, pulmonary thromboembolism, filling defect in the pulmonary artery). Subsequently, the ones deemed relevant can be programmed to be identified.
N-gram (Bigram)
To better capture the exact terminology we are looking for, N-grams will be used to identify contiguous sequences of N items. When N is equal to two, the sequence is called a bigram. An example is to look for the bigram “pulmonary embolism” rather than the two separate words, which would otherwise match “Bilateral pulmonary infiltrates in the lower lobes. No evidence of paradoxical embolism.”
Negation
Handling of negation is a common task in NLP and is quite important in clinical notes, since negated statements are often used in the differential diagnosis process. By considering the context of a sentence, the NLP algorithm can distinguish whether or not the concept is truly present in the sentence. An example is to avoid misclassification of “No pulmonary embolism” or “pulmonary embolism not present” as pulmonary embolism.
Abbreviations: NLP, natural language processing; PE, pulmonary embolism.
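Several of the concepts in Table 5 (tokenization, a small lexicon, bigram matching, and negation handling) can be combined into a toy keyword/rule-based classifier. The sketch below is not the PE-EHR+ algorithm; the lexicon, the negation cue list, and the function names are assumptions chosen for illustration. In particular, it treats any negation cue anywhere in a sentence as negating the PE mention, which is cruder than the scoped negation a validated system would need.

```python
import re

# Hypothetical, deliberately small lexicon and negation cue list.
TARGET_BIGRAMS = {
    ("pulmonary", "embolism"),
    ("pulmonary", "emboli"),
    ("pulmonary", "embolus"),
}
NEGATION_CUES = {"no", "not", "without", "negative"}


def tokenize(sentence):
    """Lower-case the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_mentions_pe(sentence):
    """Return True (affirmed), False (negated), or None (no PE bigram)."""
    tokens = tokenize(sentence)
    bigrams = zip(tokens, tokens[1:])
    if not any(bigram in TARGET_BIGRAMS for bigram in bigrams):
        return None
    # Simplification: a negation cue anywhere in the sentence negates the mention.
    return not any(token in NEGATION_CUES for token in tokens)


def report_is_pe_positive(report):
    """Flag a report if at least one sentence affirms a PE bigram."""
    sentences = re.split(r"[.!?]+", report)
    return any(sentence_mentions_pe(s) is True for s in sentences)
```

With the examples from Table 5, “No pulmonary embolism” and “pulmonary embolism not present” are not flagged, and neither is the two-sentence report about pulmonary infiltrates and paradoxical embolism, because the words “pulmonary” and “embolism” never occur there as a contiguous bigram.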
Outcome Variables
The main outcomes will be the sensitivity, specificity, positive and negative predictive values of the ICD-10 codes for determining PE compared with medical chart review. These will be based on standard epidemiological definitions. In addition, we will determine the accuracy of these codes (defined as true positive plus true negative, divided by the combination of true positive, true negative, false positive, and false negative) ([Table 6 ]). Outcomes for the NLP analyses will be similar.
Table 6
Outcome variables for the assessment of the accuracy of the ICD-10 codes[a ]
Outcome measure
General definition
Operational definition in PE-EHR+[b ]
Sensitivity
Probability of a patient with the outcome of interest being correctly classified as having the outcome
The number of patients correctly identified as having PE according to the test (codes) divided by the entire number of patients who had PE according to manual chart review.
Specificity
Probability of a patient without the outcome of interest being correctly identified as not having the outcome
The number of patients correctly identified as not having PE according to the test (codes) divided by the entire number of patients who did not have PE according to manual chart review.
PPV
Proportion of patients identified as having the outcome according to the test that did, in fact, have the outcome
The number of patients correctly identified as having PE according to the test (codes) divided by the entire number of patients for whom the test (codes) called a PE.
NPV
Proportion of patients identified as not having the outcome of interest that did not, in fact, have the outcome
The number of patients correctly identified as not having PE according to the test (codes) divided by the entire number of patients for whom there was no code for PE.
Accuracy
Proportion of the total number of cases examined that were correctly identified as having or not having the outcome of interest
The number of patients correctly identified as having PE plus the number of patients correctly identified as not having PE according to the test (codes) divided by the entire pool of patients.
Abbreviations: ICD-10, International Classification of Diseases, 10th revision; NPV, negative predictive value; PE, pulmonary embolism; PPV, positive predictive value.
a A similar approach will be used for assessing the accuracy of NLP tools.
b The main analyses will be performed on a weighted sample, in which patients with ICD-10 codes for PE and patients without ICD-10 codes for PE are weighted according to the actual frequency of the codes in the entire database. In a sensitivity analysis, we will assess the accuracy metrics only in the studied sample, without weighting.
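The operational definitions in Table 6 reduce to simple ratios of the four cells of a two-by-two table (codes vs. manual chart review). The function below is a minimal sketch of those ratios; the function name is an assumption, and the counts shown in the usage note are hypothetical and are not study results.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Compute the Table 6 measures from the cells of a two-by-two table.

    tp: code present, PE confirmed on chart review
    fp: code present, PE not confirmed
    fn: code absent, PE confirmed
    tn: code absent, PE not confirmed
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

For example, with hypothetical counts, `accuracy_metrics(tp=90, fp=10, fn=20, tn=880)` gives a PPV of 90/100 and an accuracy of 970/1,000. The same function applies unchanged when the index test is an NLP tool rather than the ICD-10 codes.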
Statistical Analysis
With respect to sample size estimates, we will select an equal number of patients with and without ICD-10 codes for PE to facilitate the assessment of both sensitivity and specificity of the codes for PE. With a two-sided α of 0.05 and confidence interval width of 10%, a sample of 550 per group (550 with ICD-10 codes and 550 without) provides 80% power to detect a positive predictive value of 80% for the PE-related ICD-10 codes compared with manual chart review. To assess patients who had secondary discharge diagnosis ICD-10 codes for PE, a separate set of 550 charts will be selected. Assuming a need to exclude 5% of the charts, 578 charts per group will be planned for review (total of 1,734 charts). Once the review of these charts is completed, to approximate the true incidence of PE, weighting will be applied to the completed database.
The total number of hospitalized patients with ICD-10 Principal Discharge Diagnosis of PE in the MGB in the aforementioned period (January 1, 2016 through December 31, 2021) is 4,878. The number of patients hospitalized with ICD-10 Secondary Discharge Diagnosis of PE is 3,224, whereas 373,540 adult patients did not have any codes for PE during their hospitalization. These are relatively similar to estimates from prior studies.[18,44,45] To be able to provide accurate estimates for not only sensitivity and specificity, but also other measures of test performance which may depend on prevalence of the studied conditions, we will weight the results of the three 550-patient groups in proportion to their actual size before measures of test performance are calculated for ICD-10 codes in the primary discharge position or the secondary discharge position. A similar approach will be pursued to determine the measures of test performance for NLP compared with manual chart review.
Categorical variables will be reported with frequency counts and percentages. Test characteristics will be reported with their respective 95% confidence interval estimates. Weighting will not affect the sample size estimate for specificity.
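The weighting step described above can be illustrated with a short sketch in which each reviewed chart stands for N/n hospitalizations in its stratum. The stratum sizes (4,878; 3,224; 373,540) and the 550 reviewed charts per group come from the text. The numbers of confirmed PE cases are hypothetical placeholders, and the function and variable names are assumptions; this is one straightforward way to apply such weights, not necessarily the exact procedure the study will use.

```python
# Population sizes and reviewed counts are from the text;
# 'confirmed_pe' values are HYPOTHETICAL placeholders, not study results.
STRATA = {
    "principal": {"population": 4878, "reviewed": 550, "confirmed_pe": 520},
    "secondary": {"population": 3224, "reviewed": 550, "confirmed_pe": 400},
    "no_code": {"population": 373540, "reviewed": 550, "confirmed_pe": 2},
}


def weighted_cells(strata, positive_strata):
    """Return weighted (tp, fp, fn, tn) when positive_strata count as test-positive."""
    tp = fp = fn = tn = 0.0
    for name, stratum in strata.items():
        # Each reviewed chart represents this many hospitalizations.
        weight = stratum["population"] / stratum["reviewed"]
        with_pe = stratum["confirmed_pe"] * weight
        without_pe = (stratum["reviewed"] - stratum["confirmed_pe"]) * weight
        if name in positive_strata:
            tp += with_pe
            fp += without_pe
        else:
            fn += with_pe
            tn += without_pe
    return tp, fp, fn, tn


# Index test: a PE code in either the principal or the secondary position.
tp, fp, fn, tn = weighted_cells(STRATA, positive_strata={"principal", "secondary"})
weighted_sensitivity = tp / (tp + fn)
weighted_ppv = tp / (tp + fp)
```

Because the no-code stratum is sampled far more sparsely than the coded strata, each PE case found among its reviewed charts carries a large weight, so the weighted sensitivity can be considerably lower than the value computed from the unweighted sample.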
Sensitivity Analysis and Subgroup Analyses
We will conduct exploratory analyses in which a combination of thrombosis-related diagnostic (e.g., CTPA) or therapeutic procedure codes (e.g., fibrinolytic therapy or vena cava filter placement, [Supplementary Table S2 ] [available in the online version]), or present-on-admission codes, will be added to the ICD-10 discharge codes to assess whether they improve the accuracy for patient identification compared with the ICD-10 codes alone.
Further, we will conduct analyses to assess the validity of specific subgroups of PE codes. For example, some PE codes indicate hemodynamic consequences (e.g., I26.0: PE with acute cor pulmonale). As the availability of subgroup-specific samples allows, the validity of the code subsets for classifying patient status will be compared against manual medical chart review with reference to definitions from the international clinical guidelines.[7 ]
[8 ] Consistency of the results across the participating hospitals will be assessed. Consistency of the codes' accuracy will also be assessed for patients included before versus after the onset of the coronavirus disease 2019 (COVID-19) pandemic.[46 ]
[47 ]
[48 ]
[49 ] In addition, if resources allow, we may assess the accuracy of the codes in the subgroup of patients with active cancer (diagnosed within the prior 5 years and on treatment, palliative care, or close surveillance) and will investigate trends in the accuracy of the codes over time.
The diagnosis of subsegmental PE has also been a subject of intense debate.[50 ] We have prespecified that reports of subsegmental PE will be validated in 50 to 100 patients through independent review of the diagnosis by two certified radiologists.
Practical Implementation of ICD-10 Codes
Finally, as a practical part of the PE-EHR+ study, the validated ICD-10 codes will be used to identify patients with PE in a 100% sample of patients in the Medicare Fee-For-Service database to report trends in PE hospitalizations and mortality rates. Such analyses will be complemented by trend analyses from the Registro Informatizado de Pacientes con Enfermedad TromboEmbólica (RIETE) registry.[14 ]
Discussion
The PE-EHR+ study provides a unique opportunity to validate the tools for efficient identification of patients with PE via EHRs using ICD-10 codes and NLP algorithms ([Fig. 2 ]). With respect to ICD-10 code validation, PE-EHR+ has several strengths compared with the existing investigations and will complement their findings.[33 ]
[34 ]
[35 ]
[36 ]
[37 ]
[38 ]
[39 ]
[40 ] Unlike several other studies, PE-EHR+ has a prespecified power calculation. In addition, discharge records will be reviewed from both community hospitals and large referral hospitals with a diverse patient population. Further, we will separately assess the accuracy of the codes in the Principal Discharge Diagnosis versus Secondary Discharge Diagnosis positions. On the one hand, it is conceivable that PE codes in the Principal Discharge Diagnosis position have a higher specificity and positive predictive value for patient identification. On the other hand, Principal Discharge Diagnosis codes may underestimate the PE burden, since PE events in some situations may be a complication of the hospitalization but not severe enough to warrant designation as the Principal Discharge Diagnosis. Coders who focus only on discharge summaries may miss radiology reports that would identify PE diagnoses.[51 ] PE codes placed as Secondary Discharge Diagnosis may be more sensitive but are prone to false-positive findings. This is because PE may be coded in secondary discharge positions in patients with prior events that were relevant to the clinical care delivered in the index hospitalization but were not acute events that occurred during that hospitalization. An important strength of the PE-EHR+ study is that it includes hospitalization records not only for patients with claims codes for PE but also for patients without PE claims codes. This makes it possible to ascertain not only the specificity and positive predictive value of the codes but also the frequency of false-negative results and the sensitivity of the codes. The predefined weighting criteria will also be helpful in this process. With respect to NLP algorithms for identification of PE,[27 ]
[28 ]
[29 ]
[30 ]
[31 ] the PE-EHR+ study has the opportunity to validate those results in a large database of patients from diverse hospital settings and may modify the existing algorithms, as needed.
Fig. 2 Methods for identification of patients with pulmonary embolism in electronic databases and their tradeoffs.
A prespecified plan to validate the subgroups of the codes that may capture higher risk is also of particular interest. Many questions about the epidemiology and durable outcomes for contemporary patients with intermediate-risk PE and high-risk PE remain unanswered. If the ICD-10 codes or NLP are proven to be efficient and reliable for patient screening, they may facilitate patient selection in future epidemiological or comparative effectiveness studies. Similarly, the ancillary goal to assess the accuracy of the codes against the original reports for subsegmental PE, and to also validate the original diagnosis of subsegmental PE by review of images by two independent radiologists, will provide important novel data.
The components of the project related to validation of ICD-10 codes and NLP algorithms are meant to complement, not supplant, each other. For example, some data sources (such as national administrative data) do not include radiology reports or medical notes, and as such, NLP will not be feasible in those data sources. In turn, within EHRs, the use of NLP might be advantageous. Further, in databases that contain both ICD-10 codes and the clinical text needed for NLP, a hybrid approach that incorporates both might yield the highest accuracy.
We did not prespecify a threshold for what constitutes sufficiently high accuracy (defined as the sum of true positives and true negatives divided by all observations). Although an ideal test has both high sensitivity and high positive predictive value (and therefore high accuracy), it is possible that no single permutation of codes will achieve both goals, and that different combinations of codes will be required to maximize sensitivity versus positive predictive value.
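For reference, the measures discussed here follow the standard definitions from a 2×2 table of the index test (ICD-10 codes or NLP) against manual chart review. The symbols TP, FP, FN, and TN (true positives, false positives, false negatives, and true negatives) are notation introduced here for illustration.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{PPV} = \frac{TP}{TP + FP}
\]
```

Broadening the set of qualifying codes tends to convert false negatives into true positives, which raises sensitivity, but it may also add false positives, which lowers positive predictive value. This is why a single code combination may not maximize both.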
The limitations of the PE-EHR+ study should be kept in mind for appropriate context and interpretation. First, this study will focus on PE. The available resources do not support expansion to other thrombotic conditions. As such, efficient and reliable tools will similarly be needed for identification of patients with deep vein thrombosis, or arterial thrombotic events such as acute myocardial infarction, ischemic stroke, and acute limb ischemia. Second, the reference standard for verification of PE in this study is review of medical records for the presence of PE in the chart, not independent reassessment of the testing modalities that led to the diagnosis of PE in every case. Because the study is based on existing chart records, errors in the original diagnostic interpretation could carry over into the reference standard. However, prospective enrollment of such a large sample would require several years and enormous resources. In most cases with initial radiologist confirmation of PE in larger branches or the main pulmonary arteries, a false-positive diagnosis is very unlikely.[52 ]
[53 ] Subsegmental PEs may be an area of potential concern. To mitigate this, we have made a priori plans for independent validation of the diagnosis in 50 to 100 patients with subsegmental PE according to the imaging reports. Third, the original phase of the PE-EHR+ study will only include data from several centers in the United States. While the overall structure of the PE codes is similar around the world, minor differences in granular subgroups of codes may exist. With several international investigators on the Steering Committee of PE-EHR+, we envision testing the optimized algorithms identified through PE-EHR+ in future studies of non-U.S. data sources to ascertain the consistency of the findings. Fourth, implementation of NLP algorithms for chart screening and automated abstraction is a complex, resource-intensive undertaking. Therefore, the main focus will be on radiology reports, which are more structured and better suited to NLP. Further, we will perform external validation of the existing NLP algorithms used in studies of thrombotic diseases.[27 ]
[28 ]
[29 ]
[30 ]
[31 ] If their accuracy is suboptimal, modifications will be planned to optimize them. The teams at MTERMS and CORE have ample expertise to guide the NLP-related goals of the project. Finally, COVID-19 is associated with excess risk of venous thromboembolism[46 ]
[47 ]
[48 ] and may affect PE presentation or how the codes were used, even among non-COVID-19 patients.[49 ] Therefore, we will perform a sensitivity analysis for the codes, restricting the results to the prepandemic period.
In conclusion, the PE-EHR+ study will help validate efficient tools for identification of patients with PE in EHRs. These include ICD-10 codes in the Principal Discharge Diagnosis or Secondary Discharge Diagnosis positions, and NLP algorithms based on assessment of imaging reports. These validated tools will facilitate the timely use of EHRs for case selection for observational studies or randomized trials of patients with PE.