Keywords
system improvement - disease management - clinical research informatics - allergy
and immunology - clinical data management
Background and Significance
Asthma exacerbation, characterized by an acute worsening of asthma symptoms, is common,
accounting for 1.6 million emergency department (ED) visits a year in the United
States.[1][2] These exacerbations are associated with significant morbidity in terms of abrupt
decline in physical function, lung function, and health-related quality of life, including
missed school and work days, increased health care utilization, and mortality.[3][4][5][6][7][8]
Multiple predictors of future asthma exacerbation have been identified, but recent
exacerbation history, especially one leading to an ED visit or hospitalization, is the
strongest predictor.[9][10][11] Evidence suggests that currently available therapy prevents
exacerbation during treatment.[12][13] Therefore, asthma exacerbation leading to an ED visit
is a potentially preventable clinical outcome and is of great interest both to health
services research, as an indicator of quality of care, and to epidemiologic studies.
It has previously been shown that less than a fifth of the asthma population was responsible
for more than four-fifths of the total direct costs, emphasizing the need for better
management of a high-risk cohort.[14] For the purposes of quality improvement, we need
performance measures that are easily identified using clear administrative or clinical
(electronic medical record [EMR]) criteria, actionable based on evidence-based guidelines
without ambiguity for most patients, and reliable. Commonly used Healthcare Effectiveness
Data and Information Set asthma quality metrics pertain to controller medication filling
and the controller-to-total asthma medication ratio.[15] However, unlike in other chronic
diseases, the use of and response to asthma medication are variable, and these process
measures are inadequate to identify the risk of health care utilization.[16] Asthma
exacerbation leading to an ED visit is a good performance metric in the asthma population
because it is the outcome that needs to be identified, managed, and avoided for the reasons
given above. Because performance measures are definitive standards against which the care
provided is judged, considerable care needs to be taken in defining such a measure.[17]
Identification of asthma-related ED visits in studies has conventionally been done
using International Classification of Diseases, Ninth Revision (ICD-9) discharge codes,
chief complaints, and medication use in EMR and claims data.[18][19][20][21][22][23][24][25][26][27][28][29]
A systematic review of validated methods to capture acute bronchospasm using administrative
or claims data showed that only two of the 38 studies using an algorithm reported
validation characteristics.[30] An EMR algorithm based on chief complaints implemented in
the Southern U.S. showed a sensitivity of 45%, specificity of 92%, positive predictive value
(PPV) of 79%, and negative predictive value (NPV) of 70%.[31] A study based on claims data
showed that claims failed to identify 29% of encounters with asthma diagnoses and 45% of
nebulization procedures administered during encounters; 30% of documented asthma
prescriptions were not associated with filed claims and vice versa.[32] Another EMR study,
performed in the Midwest after the systematic review, used a Bayesian network built on
electronic information available at the time of the ED visit (such as age, respiratory rate,
chief complaint, oxygen saturation, and acuity level) and historical data (such as past
medical history, medications, and billing codes), and yielded a PPV of 69.9%.[33] Severe
chronic lower respiratory disease exacerbations, which combined asthma and other respiratory
diseases as measured by discharge diagnosis code, had a predictive value of 85 to 95%
depending on the threshold used for the gold standard.[34]
Despite the lack of validation data for correctly identifying ED visits related to asthma
exacerbation, the conventional methods above are used by health systems to evaluate their
performance using EMR data, by health plans to track utilization using claims data, and by
researchers to assess outcomes in epidemiologic studies using either data
source.[35][36][37][38][39] Routine operation of health care systems generates a tremendous
amount of patient-level electronic information, but it needs to be harnessed and adapted for
research and administrative purposes. Developing algorithms using clinical and administrative
data elements helps to identify various health outcomes of interest. Given the need to use
asthma exacerbation-related ED visits as a performance measure for systems-based practice
and for epidemiologic studies based on health care databases, there is an urgent need
to develop an algorithm with adequate performance metrics, based on any available
data source, whether EMR or claims.
Objective
We propose to use multiple data elements available in the EMR and claims databases
to create separate algorithms with high validity, for clinical and research purposes,
that identify asthma exacerbation-related ED visits in the general population.
Methods
Study Overview
This is a retrospective study of 1,000 ED visits randomly selected out of all eligible
visits, for patients in a single health system in Pennsylvania with asthma, with a
subset of these visits also having claims data. Chart review was performed to classify
ED visits into those related to asthma exacerbation or otherwise, which was the gold
standard. Different data elements, like demographics, chief complaints, vital signs,
medications, and discharge diagnoses as available in different databases, were obtained
to create separate asthma exacerbation-related ED visit algorithms in EMR and claims
by comparing against the gold standard.
Study Population and Subject Selection
Geisinger is a large integrated health system which serves more than 400,000 primary
care patients across a 45-county area in Pennsylvania, United States, with this population
being representative of the general population of central and northeastern Pennsylvania.[40] It utilizes EMR software by Epic Systems Corporation. Geisinger Health Plan (GHP)
is a full-service regional plan covering approximately 500,000 people.[41] We included visits
to a Geisinger ED from January 1, 2006, to October 28, 2013, by patients aged 4 to 40 years
with asthma in the problem list (ICD-9 code 493.xx). We excluded visits by patients with
chronic obstructive pulmonary disease (COPD) (ICD-9 codes 491, 492, 496), coronary artery
disease/heart failure (ICD-9 codes 410–414, 428), cystic fibrosis (ICD-9 code 277.xx), or
bronchiectasis (ICD-9 code 494) in the problem list, as well as ED visits resulting in
hospital admission. These comorbidities can present with lower respiratory symptoms, making
it difficult to attribute an exacerbation to one specific disease condition.
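The inclusion and exclusion criteria above can be expressed as a simple filter. The following is a minimal Python sketch, not the study's actual extraction code; the function name `is_eligible`, its parameters, and the representation of the problem list as a list of ICD-9 code strings are assumptions made for illustration.

```python
from datetime import date

# Study window and age range as stated in the text.
STUDY_START = date(2006, 1, 1)
STUDY_END = date(2013, 10, 28)
MIN_AGE, MAX_AGE = 4, 40

ASTHMA_CATEGORY = "493"
# COPD (491, 492, 496), coronary artery disease/heart failure (410-414, 428),
# cystic fibrosis (277.xx), bronchiectasis (494).
EXCLUSION_CATEGORIES = ("491", "492", "496", "410", "411", "412", "413", "414",
                        "428", "277", "494")


def is_eligible(visit_date, age_years, problem_list_icd9, admitted):
    """Return True if a hypothetical ED visit record meets the stated criteria."""
    if not (STUDY_START <= visit_date <= STUDY_END):
        return False
    if not (MIN_AGE <= age_years <= MAX_AGE):
        return False
    if admitted:  # ED visits resulting in hospital admission were excluded
        return False
    # Compare on the three-digit ICD-9 category (the part before the decimal point).
    categories = [code.split(".")[0] for code in problem_list_icd9]
    has_asthma = ASTHMA_CATEGORY in categories
    has_exclusion = any(c in EXCLUSION_CATEGORIES for c in categories)
    return has_asthma and not has_exclusion
```

For example, `is_eligible(date(2010, 5, 1), 25, ["493.90"], False)` passes the filter, whereas adding `"496"` to the problem list fails it. In the study, exclusions were also confirmed by chart review, which a code-based filter like this cannot replace.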
Given that not all patients who visit the Geisinger ED have GHP insurance, we enriched
our study sample for ED visits with GHP insurance, providing claims data. We also
enriched for ED visits with classifiers of interest, as asthma patients visit ED for
asthma-related and nonasthma-related concerns. A total of 4,708 eligible ED visits
with both EMR and claims data were identified and we aimed for a total of 1,000 ED
visits. The visits were stratified into 15 strata according to the presence or absence
of classifiers of interest (primary diagnosis, secondary diagnosis, respiratory complaint,
bronchodilator use) in isolation or in combination. First, from each stratum we took either
a random sample of a predetermined proportion (determined by the size of the stratum
relative to the other strata and its probable value in predicting asthma exacerbation) or,
if the stratum was small, all visits, for a total of n = 780 ED visits. An additional 220 ED
visits were randomly selected from similar strata among the 20,379 eligible ED visits in the
Geisinger population without claims data, for a total of 1,000 ED visits. This stratified
sampling was done to increase the probability that classifiers of interest were present in
the study data set.
Gold Standard Generation
The research team, comprising an allergist, an informaticist, and primary care physicians, developed
a structured chart review abstraction tool by consensus decision-making, which evaluated
both the exclusion criteria and the diagnosis of asthma exacerbation in the ED visit.
The abstraction tool included various data elements from problem list, past medical
history, history of present illness, physical exam, assessment and plan in the ED
notes, medication administration, and progress notes, and was used to conclude the
presence or absence of asthma exacerbation during ED visit. Two reviewers were trained
on chart abstraction and performed chart reviews. A subset of charts was randomly
selected from the study sample for validation of the exacerbation classification by
another independent reviewer. The asthma exacerbation status in the ED visit by chart
review was considered as the gold standard in algorithm development.
Definitions of Predictor Variables
Principal and secondary diagnoses of asthma exacerbation were identified by clinically
coded ICD-9 code of 493.xx as listed in the encounter or claims. If an ED visit had
asthma as both principal and secondary diagnosis, it was only counted for principal
diagnosis. Respiratory complaints were identified by the terms “upper respiratory
infection,” “cough,” “flu,” “congestion,” “wheezing,” “respiratory distress,” “short
of breath,” “asthma,” “asthma attack,” “chest discomfort,” “chest pressure,” “pneumonia,”
“chest pain,” “bronchitis,” “hyperventilating,” “hemoptysis,” “cold symptoms,” “airway
obstruction,” or “chest tightness” listed in the chief complaint. Bronchodilator use
in the EMR data was identified by any acute bronchodilator use (via metered dose inhaler
or nebulizer) and systemic steroids by medication class of corticosteroids. Nebulization
codes used in the claims data were “aerosol or vapor inhalations” and “airway inhalation
treatment.” All the above variables were coded as binary variables. Vital signs used
for analysis were the initial readings at the time of the ED visit.
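The respiratory complaint flag described above amounts to a term search over the chief complaint text. A minimal Python sketch follows; the case-insensitive substring matching and the function name `has_respiratory_complaint` are assumptions, since the text does not state how the terms were matched.

```python
# Terms listed in the text as indicating a respiratory complaint.
RESPIRATORY_TERMS = (
    "upper respiratory infection", "cough", "flu", "congestion", "wheezing",
    "respiratory distress", "short of breath", "asthma", "asthma attack",
    "chest discomfort", "chest pressure", "pneumonia", "chest pain",
    "bronchitis", "hyperventilating", "hemoptysis", "cold symptoms",
    "airway obstruction", "chest tightness",
)


def has_respiratory_complaint(chief_complaint):
    """Return 1 if any listed term appears in the chief complaint, else 0."""
    text = chief_complaint.lower()
    return int(any(term in text for term in RESPIRATORY_TERMS))
```

Plain substring matching is deliberately naive: a short term such as "flu" would also match "fluid", so a production version would likely need word-boundary matching or a curated complaint vocabulary.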
Statistical Analysis
Separate analyses were done for EMR and claims data elements. The EMR data set included
the ED visit subset with claims data, but the data elements for analysis were from
EMR. We compared the predictor variables for those with asthma exacerbation and those
without in the ED visit using t-test for continuous variables and chi-square test for categorical variables. We performed
unadjusted and adjusted survey logistic regression analyses to identify predictors
of asthma exacerbation in an ED visit. Continuous variables were modeled using both
linear and quadratic terms to assess for nonlinearity after centering, and the quadratic
term was dropped if there was no quadratic association. Model building was done separately
for EMR and claims data, by adding covariates (age, sex, race, season, insurance,
respiratory rate, pulse rate, temperature, body mass index [BMI], smoking, principal
and secondary diagnoses of asthma, respiratory complaint, bronchodilator use, and
corticosteroid use) to the model and looking for conditional significance. The final
base model included respiratory complaints listed in chief complaints, principal diagnosis,
any short-acting bronchodilator, and steroid (oral or intravenous) use in the ED for
EMR data; and principal and secondary diagnoses, nebulization, and steroid (oral or
intravenous) use in the ED for claims data.
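As an illustration of the chi-square comparison of categorical variables, the sketch below applies a Pearson chi-square test to the sex distribution reported in Table 1 (EMR data). It is a plain Python sketch under stated assumptions: no continuity correction and no survey weighting are applied, and the settings the authors used in Stata are not given in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from Table 1 (EMR data): males among visits with and without exacerbation.
male_exac, n_exac = 195, 497
male_no_exac, n_no_exac = 156, 469

chi2, p = chi_square_2x2(male_exac, n_exac - male_exac,
                         male_no_exac, n_no_exac - male_no_exac)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The statistic comes out at roughly 3.7 with a p-value just above 0.05, in line with the p-value of 0.05 reported for sex in Table 1.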
For analysis using survey methodology, some strata were combined to avoid strata with a
single sampling unit, as recommended.[42] This resulted in 15 strata for EMR data and 11
strata for claims data during analysis ([Supplementary Tables S1] and [S2], available in the
online version). The weight for each stratum was calculated based on the principle of inverse
probability weighting (weight = 1/proportion of patients in the study population for that
particular stratum).[40][43][44] Extreme weights were prespecified to be truncated to the
next highest weight or to a preset weight of 10.[45][46] On this basis, one extreme weight
was truncated to 10.02 for EMR data and to 10 for claims data for regression analysis.
However, untruncated weights were used for
prevalence estimates. Finite population correction was used as large proportions were
sampled. We calculated Cohen's kappa statistic to evaluate the interrater agreement
on the chart review conclusion and for assessing agreement between EMR and claims-based
measurement of predictor variables.
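The weighting scheme can be sketched in a few lines of Python. The stratum names and counts below are hypothetical (they are not the study's strata), and capping at a fixed value is only one of the two truncation rules mentioned in the text.

```python
def stratum_weights(population_sizes, sample_sizes):
    """Inverse probability weight per stratum: 1 / (sampled / eligible)."""
    return {s: population_sizes[s] / sample_sizes[s] for s in sample_sizes}


def truncate_weights(weights, cap):
    """Cap any weight above `cap` at `cap` (one simple truncation rule)."""
    return {s: min(w, cap) for s, w in weights.items()}


# Hypothetical strata: eligible visits and sampled visits per stratum.
population = {"A": 120, "B": 900, "C": 3000}
sample = {"A": 120, "B": 300, "C": 200}

weights = stratum_weights(population, sample)     # A: 1.0, B: 3.0, C: 15.0
truncated = truncate_weights(weights, cap=10.0)   # C is capped at 10.0
```

A small stratum sampled in full receives a weight of 1, while a large, sparsely sampled stratum receives a large weight. Truncation limits the influence of such a stratum on the regression estimates at the cost of some bias, which is presumably why untruncated weights were kept for the prevalence estimates.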
We were interested in assessing the performance of currently used algorithm elements,
both in isolation and combination. We evaluated the performance of 5 models in EMR
and 4 models in claims. Model performance discrimination was evaluated using the area
under the curve (AUC), which was calculated by nonparametric trapezoidal approximation
to the estimated false-positive and true-positive rate points and compared with each
other to choose the final predictive model.[47] The usefulness of the classifier was interpreted from the AUC using the
following cut-offs: > 0.9: high accuracy; 0.7 to 0.9: moderate accuracy; and 0.5 to
0.7: low accuracy.[48][49] Nomograms were constructed for the final predictive models to provide a visual approach
for calculating the probability of an asthma exacerbation-related ED visit.[50] We compared EMR models to the claims models for the sample with overlapping data.
Models were redone for EMR algorithm using this restricted sample for the sake of
comparability and AUC was compared.
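The nonparametric trapezoidal AUC can be sketched as follows. This is an unweighted Python illustration on made-up labels and scores; it does not account for the survey weights used in the study. The handling of values exactly at 0.7 or 0.9 and the label returned below 0.5 are assumptions, as the text does not specify them.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) points, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def interpret_auc(auc):
    """Map an AUC to the accuracy categories quoted in the text."""
    if auc > 0.9:
        return "high accuracy"
    if auc >= 0.7:
        return "moderate accuracy"
    if auc >= 0.5:
        return "low accuracy"
    return "not covered by the stated cut-offs"


# Made-up example: 1 = exacerbation by chart review, score = predicted probability.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = trapezoid_auc(roc_points(labels, scores))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9, which falls in the moderate accuracy band.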
We performed sensitivity analyses by using different definitions of the predictor
variables. For EMR, we utilized only nebulizer administration in the ED instead of
any short-acting bronchodilator. For claims, we utilized additional claims submitted
on the same day as ED visit claims, which provided separate principal and secondary
diagnoses and medications. We combined both the same-day claim files and ED visit
claim files and performed the analysis. A p-value of ≤ 0.05 was considered statistically significant. All analyses were done
in the Stata statistical software, version 14.2 (StataCorp LP, Texas, United States).
Results
Validation of Chart Review
An independent reviewer evaluated 66 charts previously reviewed by the initial reviewers
(29 for one and 37 for another). Agreement between reviewers was 95.45% (63/66) and
the kappa statistic was 0.91. There were three discrepancies: one reviewer missed
cardiomyopathy, which was an exclusion criterion; another missed an asthma exacerbation in
the setting of a fall and classified an allergic reaction to nuts as an asthma exacerbation.
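For readers unfamiliar with the kappa statistic, the sketch below shows how it corrects raw agreement for agreement expected by chance. The cell counts are hypothetical: the text reports only the overall agreement (63/66) and the kappa value, not the full cross-tabulation, and the binary yes/no framing is an assumption.

```python
def cohen_kappa_2x2(both_yes, r1_yes_r2_no, r1_no_r2_yes, both_no):
    """Cohen's kappa for two raters making a binary classification."""
    n = both_yes + r1_yes_r2_no + r1_no_r2_yes + both_no
    observed = (both_yes + both_no) / n
    r1_yes = (both_yes + r1_yes_r2_no) / n
    r2_yes = (both_yes + r1_no_r2_yes) / n
    # Agreement expected by chance given each rater's marginal "yes" rate.
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical split of 66 charts with 63 agreements and 3 disagreements.
kappa = cohen_kappa_2x2(both_yes=32, r1_yes_r2_no=2, r1_no_r2_yes=1, both_no=31)
```

With roughly balanced classifications, 63/66 raw agreement corresponds to a kappa of about 0.91, consistent with the value reported above; a more lopsided split would give a lower kappa for the same raw agreement.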
Characteristics of the Study Sample
There were 966 eligible ED visits in the total population and 731 in the GHP population
([Fig. 1]) after exclusions by chart review (no ED notes, n = 17; COPD, n = 11; coronary artery disease, n = 1; heart failure, n = 4; hospitalization, n = 1). The prevalence of asthma exacerbation in the source population meeting the
inclusion criteria, calculated using the EMR data, was 10.84% (95% confidence interval:
10.47–11.22). The EMR sample had a mean age of 22 years and was mostly white (93%), never
smokers (56%), and female (64%) ([Table 1]). The ED visits studied varied by season, with the
most common season being the fall, followed by spring. Overall, the claims sample was similar
to the EMR sample in age, sex, race, season, BMI, and smoking status.
Table 1
Selected demographic variables by asthma exacerbation status for EMR and claims data
Characteristics | Total (n = 966) | Asthma exacerbation (n = 497) | No asthma exacerbation (n = 469) | p-Value[a]
EMR data | | | |
Age at ED visit (mean, SD in y) | 22.04 (10.56) | 21.08 (10.97) | 23.05 (10.03) | 0.004
Age at ED visit (n, column %) | | | |
≤ 18 y | 374 (38.72) | 211 (42.45) | 163 (34.75) | 0.014
> 18 y | 592 (61.28) | 286 (57.55) | 306 (65.25) |
Male (n, %) | 351 (36.34) | 195 (39.24) | 156 (33.26) | 0.05
White (n, %) | 898 (92.96) | 461 (92.76) | 437 (93.18) | 0.80
Season (n, column %) | | | |
Summer | 203 (21.01) | 82 (16.50) | 121 (25.80) | 0.003
Fall | 334 (34.58) | 176 (35.41) | 158 (33.69) |
Winter | 172 (17.81) | 101 (20.32) | 71 (15.14) |
Spring | 257 (26.60) | 138 (27.77) | 119 (25.37) |
Has GHP insurance at ED visit (n, %) | 751 (77.74) | 371 (74.65) | 380 (81.02) | 0.02
Asthma principal diagnosis (n, %) | 296 (30.64) | 267 (53.72) | 29 (6.18) | < 0.001
Asthma secondary diagnosis (n, %) | 326 (33.75) | 116 (23.34) | 210 (44.78) | < 0.001
Respiratory complaint (n, %) | 803 (83.13) | 490 (98.59) | 313 (66.74) | < 0.001
Bronchodilator use (n, %) | 459 (47.52) | 366 (73.64) | 93 (19.83) | < 0.001
Corticosteroids (n, %) | 298 (30.85) | 262 (52.72) | 36 (7.68) | < 0.001
Bronchodilator use or corticosteroids (n, %) | 560 (57.97) | 440 (88.53) | 120 (25.59) | < 0.001
Initial respiratory rate at ED visit[b] (mean, SD, /min) | 19.13 (3.60) | 20.05 (3.82) | 18.14 (3.06) | < 0.001
Initial pulse rate at ED visit[c] (mean, SD, /min) | 97.99 (20.69) | 102.33 (21.07) | 93.30 (19.21) | < 0.001
Initial temperature at ED visit[d] (mean, SD, °F) | 98.38 (0.99) | 98.46 (1.07) | 98.31 (0.89) | 0.02
BMI[e] (kg/m2), mean (SD) | 28.28 (10.05) | 28.23 (10.08) | 28.33 (10.02) | 0.88
Smoking[f] (n, column %) | | | |
Never smoker | 520 (55.97) | 291 (60.75) | 229 (50.89) | 0.009
Past smoker | 150 (16.15) | 67 (13.99) | 83 (18.44) |
Current smoker | 259 (27.88) | 121 (25.26) | 138 (30.67) |
Claims data | Total (n = 731) | Asthma exacerbation (n = 367) | No asthma exacerbation (n = 364) | p-Value[a]
Age at ED visit (mean, SD in y) | 22.24 (10.46) | 20.85 (10.77) | 23.65 (9.96) | < 0.001
Age at ED visit (n, column %) | | | |
≤ 18 y | 283 (38.71) | 162 (44.14) | 121 (33.24) | 0.002
> 18 y | 448 (61.29) | 205 (55.86) | 243 (66.76) |
Male (n, %) | 260 (35.57) | 138 (37.60) | 122 (33.52) | 0.25
White (n, %) | 682 (93.30) | 343 (93.46) | 339 (93.13) | 0.86
Season (n, column %) | | | |
Summer | 161 (22.02) | 61 (16.62) | 100 (27.47) | 0.002
Fall | 246 (33.65) | 123 (33.51) | 123 (33.79) |
Winter | 116 (15.87) | 64 (17.44) | 52 (14.29) |
Spring | 208 (28.45) | 119 (32.43) | 89 (24.45) |
Asthma principal diagnosis (n, %) | 229 (31.33) | 226 (61.58) | 3 (0.82) | < 0.001
Asthma secondary diagnosis (n, %) | 108 (14.77) | 62 (16.89) | 46 (12.64) | 0.11
Nebulization (n, %) | 315 (43.09) | 273 (74.39) | 42 (11.54) | < 0.001
Corticosteroids (n, %) | 213 (29.14) | 185 (50.41) | 28 (7.69) | < 0.001
Nebulization or steroids (n, %) | 372 (50.89) | 310 (84.47) | 62 (17.03) | < 0.001
BMI[g] (kg/m2), mean (SD) | 28.27 (9.75) | 28.12 (9.88) | 28.42 (9.64) | 0.69
Smoking[h] (n, column %) | | | |
Never smoker | 411 (57.89) | 229 (63.97) | 182 (51.70) | 0.003
Past smoker | 116 (16.34) | 54 (15.08) | 62 (17.61) |
Current smoker | 183 (25.77) | 75 (20.95) | 108 (30.68) |
Abbreviations: BMI, body mass index; ED, emergency department; EMR, electronic medical
record; GHP, Geisinger Health Plan; SD, standard deviation.
a t-Test for continuous variables and chi-square test for categorical variables comparing
asthma exacerbation and no asthma exacerbation.
b Sixteen missing values.
c Twenty-two missing values.
d Fourteen missing values.
e Ninety-eight missing values.
f Thirty-seven missing values.
g Seventy-one missing values.
h Twenty-one missing values.
Fig. 1 Study flowchart.
The asthma exacerbation group in the EMR sample, compared with the no asthma exacerbation
group, was younger, less likely to have the ED visit in the summer, and more likely
to be never smokers. The distribution of those with asthma exacerbation differed slightly
between the EMR and claims samples, with those in claims being younger and more often female,
with more spring visits and fewer current smokers than in the EMR. An asthma principal
diagnosis was more likely in the asthma exacerbation EMR sample, while a secondary diagnosis
was more likely in the no asthma exacerbation EMR sample. While 6% of the EMR sample without
asthma exacerbation had an asthma principal diagnosis, less than 1% of the claims sample
without asthma exacerbation did. In the EMR sample, most of those with asthma exacerbation
had a respiratory complaint (99%), while three-quarters (74%) had bronchodilator use and
about half had corticosteroid use (53%) and a principal diagnosis (54%). The initial
respiratory and pulse rates and temperature were higher in the asthma exacerbation group
(p < 0.05 for all). In the claims sample, three-quarters of the asthma exacerbation group had
nebulization (74%), while 62% had a principal diagnosis and half had corticosteroid use
(50%).
Concordance between EMR and Claims Data
We evaluated the concordance in documentation between EMR and claims data among the
overlapping 731 ED visits ([Supplementary Table S3], available in the online version). There was 96% concordance for principal diagnosis
(almost perfect agreement by the κ statistic); 38% (n = 10) of those with a principal diagnosis in the EMR but missed by claims had asthma
exacerbation, compared with 100% (n = 6) of those with a principal diagnosis in claims but missed by the EMR. There was 77% concordance for secondary diagnosis (moderate
agreement); 30% (n = 50) of those with a secondary diagnosis in the EMR but missed by claims had asthma exacerbation,
compared with 100% (n = 4) for the reverse. There was 84% concordance for nebulization (substantial agreement);
74% (n = 50) of those with nebulization use in the EMR but missed by claims had asthma exacerbation,
compared with 92% (n = 44) for the reverse. There was 91% concordance for corticosteroids (substantial
agreement); 87% (n = 39) of those with steroid use in the EMR but missed by claims had asthma exacerbation,
compared with 75% (n = 15) for the reverse.
Unadjusted and Adjusted Associations of Predictors
All unadjusted predictors evaluated that showed an association with asthma exacerbation-related
ED visit in both claims and EMR data are shown in [Table 2], with most of them showing a positive association. In the adjusted analysis, principal
diagnosis had the highest odds ratio (OR) among the different predictors in EMR and
claims data, followed by bronchodilator use. Among the vital signs, the odds of asthma
exacerbation-related ED visit increased by 12% for every breath/minute increase in
the initial respiratory rate (OR = 1.12 [1.05, 1.19]).
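Because a linear term in a logistic model acts on the log-odds scale, a per-unit odds ratio compounds multiplicatively over larger differences. As a worked illustration, assuming respiratory rate was retained as a linear term only (the 5 breaths/minute difference is an arbitrary example, not a value from the study):

```latex
\mathrm{OR}_{k} = \left(\mathrm{OR}_{1}\right)^{k},
\qquad
\mathrm{OR}_{5} = 1.12^{5} \approx 1.76
```

That is, under this assumption an initial respiratory rate 5 breaths/minute higher would correspond to roughly 76% higher odds of an asthma exacerbation-related ED visit, holding the other model variables constant.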
Table 2
Unadjusted and adjusted associations[a] by survey logistic regression[b] of selected predictor variables with asthma exacerbation-related ED visit
Characteristics–EMR data | Unadjusted association OR (95% CI) | Adjusted association OR (95% CI)
Asthma principal diagnosis | 46.19 (31.61, 67.50) | 29.16 (14.97, 56.81)
Asthma secondary diagnosis | 0.70 (0.58, 0.83) | 0.85 (0.53, 1.38)
Respiratory complaint | 68.37 (34.92, 133.87) | 6.09 (2.30, 16.14)
Bronchodilator use | 41.05 (30.73, 54.84) | 23.67 (12.99, 43.11)
Corticosteroids | 21.24 (13.92, 32.40) | 6.06 (3.34, 11.01)
Initial respiratory rate | 1.29 (1.22, 1.37) | 1.12 (1.05, 1.19)
Initial pulse rate | 1.03 (1.02, 1.04) | 1.01 (1.00, 1.02)
Initial temperature | 1.22 (1.05, 1.41) | 1.24 (0.98, 1.57)
Smoking | |
Never smoker | Reference | Reference
Past smoker | 0.66 (0.44, 0.98) | 0.57 (0.26, 1.23)
Current smoker | 0.56 (0.41, 0.77) | 0.66 (0.41, 1.05)
Characteristics–Claims data | Unadjusted association OR (95% CI) | Adjusted association OR (95% CI)
Asthma principal diagnosis | 401.88 (228.34, 707.31) | 305.56 (158.11, 590.53)
Asthma secondary diagnosis | 1.86 (1.47, 2.35) | 6.63 (4.53, 9.70)
Nebulization | 45.92 (38.55, 54.69) | 32.88 (25.39, 42.58)
Corticosteroids | 19.75 (15.16, 25.73) | 4.39 (2.98, 6.46)
Abbreviations: CI, confidence interval; ED, emergency department; EMR, electronic
medical record; OR, odds ratio.
a Adjusted associations are for each variable in the final model or added to the final
model. Final model variables were respiratory complaints (like cough, congestion,
wheezing, short of breath, etc.), principal diagnosis, any short-acting bronchodilator,
and steroid (oral or intravenous) use in the ED for EMR data; and principal and secondary
diagnoses, nebulization, and steroid (oral or intravenous) use in the ED for claims
data.
b Survey logistic regression was used for the analysis; weighted based on sampling
proportions; accounted for stratified survey sampling.
Receiver Operating Characteristic Curves
EMR Data
We evaluated the AUC for five receiver operating characteristic (ROC) curves with
different combinations of commonly used predictors of asthma exacerbation-related
ED visit in the EMR, and the final base model chosen was respiratory complaint, primary
diagnosis, any short-acting bronchodilator use, and oral or intravenous steroids ([Fig. 2]). The model with initial respiratory rate in addition to the base model had statistically
significantly better performance than the base model (p = 0.04), although the AUCs were nearly identical: 0.93 (0.92, 0.95) for the former versus 0.93 (0.91, 0.94)
for the latter. At a probability cut-off of 0.55 for the former model, sensitivity
was 83.98%, specificity was 82.93%, PPV was 84.15%, and NPV was 82.75%. However, given
that the gain in AUC was minimal relative to the difficulty of obtaining
initial vital signs when an algorithm needs to be implemented routinely
on a large scale, we decided to use the simpler latter model as the final
model. Different probability cut-offs were evaluated for the final model to optimize
the sensitivity and specificity ([Table 3]). Probability cut-off of 0.5 had the highest sensitivity of 95.57% and 0.7 had the
highest PPV of 94.15%. The regression equation for the final model was: probability
of asthma exacerbation =
Table 3
Sensitivity, specificity, positive predictive value, and negative predictive value
for the final models in EMR for different asthma exacerbation probability cut-offs
Classification function (EMR data final model) | Cut-off 0.5 | Cut-off 0.6 | Cut-off 0.7
Sensitivity | 95.57% | 70.62% | 61.57%
Specificity | 73.56% | 90.62% | 95.95%
Positive predictive value | 79.30% | 88.86% | 94.15%
Negative predictive value | 94.01% | 74.43% | 70.20%
Abbreviation: EMR, electronic medical record.
Fig. 2 Receiver operating characteristic (ROC) curves for electronic medical record data.
A nomogram was constructed to calculate the predicted probability of asthma exacerbation-related
ED visit ([Fig. 3]). To interpret the nomogram, first the score for each variable in the model should
be calculated by lining up the value for the variable with the score scale in the
upper part of the nomogram, and then all the scores should be summed. The resulting
total score should be matched to the probability of asthma exacerbation in the lower
part of the nomogram.[51] Binary probability cut-off to indicate asthma exacerbation can be chosen for each
study based on the desired test statistics of sensitivity, specificity, and predictive
values as in [Table 3]. Each patient's probability of asthma exacerbation obtained from the nomogram can
finally be converted to “yes or no” for asthma exacerbation based on the predetermined
probability cut-off.
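The conversion from a predicted probability to a yes/no classification can be sketched in Python. The intercept and coefficients below are hypothetical placeholders chosen for illustration; they are not the study's fitted values, and the variable names are assumptions based on the predictors listed for the EMR final model.

```python
import math

# HYPOTHETICAL values for illustration only, not the study's fitted coefficients.
INTERCEPT = -3.0
COEFFICIENTS = {
    "respiratory_complaint": 1.5,
    "principal_diagnosis": 3.0,
    "bronchodilator": 2.5,
    "steroid": 1.5,
}


def predicted_probability(visit):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear = INTERCEPT + sum(
        beta * visit.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(visit, cutoff):
    """Return True when the predicted probability reaches the chosen cut-off."""
    return predicted_probability(visit) >= cutoff


# A visit with a respiratory complaint and bronchodilator use only.
visit = {"respiratory_complaint": 1, "bronchodilator": 1}
probability = predicted_probability(visit)
```

With these placeholder coefficients the example visit has a predicted probability of about 0.73, so it is classified as an exacerbation at cut-offs of 0.5 or 0.7 but not at 0.8. A nomogram performs the same calculation graphically: each variable's points correspond to its coefficient, and the total score maps to the probability.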
Fig. 3 Nomogram for electronic medical record data.
Claims Data
Multiple models conventionally used in studies were evaluated for claims data also,
and the final base model chosen was principal and secondary diagnoses, bronchodilator,
and oral or intravenous steroid use ([Fig. 4]). An extended model adding centered age, squared centered age, race, and season (as a
nonordinal variable) to the base model was the best model (p = 0.046), with an AUC of 0.94 (0.93, 0.96) for the final model versus 0.95 (0.93, 0.96)
for the extended model. Again, for the sake of simplicity in large-scale implementation,
we did not choose the extended model as the final model. Different probability cut-offs
were evaluated for the final model, with 0.5 being the most sensitive at 91%
and 0.6 having the highest PPV at 95% ([Table 4]). A nomogram was constructed to calculate the predicted probability of an asthma exacerbation-related
ED visit ([Fig. 5]).
Fig. 4 Receiver operating characteristic (ROC) curves for claims data.
Fig. 5 Nomogram for claims data.
Table 4
Sensitivity, specificity, positive predictive value, and negative predictive value
for the final models in claims for different asthma exacerbation probability cut-offs
Classification function (claims data final model) | Cut-off 0.5 | Cut-off 0.55 | Cut-off 0.6
Sensitivity | 91.28% | 88.83% | 73.57%
Specificity | 87.36% | 88.19% | 96.15%
Positive predictive value | 87.93% | 88.35% | 95.07%
Negative predictive value | 90.86% | 88.67% | 78.30%
The regression equation for the final model was: probability of asthma exacerbation =
Sensitivity Analysis
Sensitivity analyses did not materially change algorithm performance. Using nebulizer administration
instead of any short-acting bronchodilator use for EMR data reduced the performance
of the EMR model nonsignificantly (AUC of 0.92 versus 0.93, p = 0.14). Using same-day claim files along with ED visit claims did not improve the
performance of the claims algorithm (AUC of 0.94 for both algorithms, p = 0.91 for the difference).
Comparison of EMR and Claims Data Algorithms
We compared the final models from EMR and claims data and there was no statistically
significant difference, p = 0.54 ([Fig. 6]). AUC for the model with primary diagnosis only was better for claims data (0.80,
95% confidence interval [0.78, 0.83]) compared with EMR data (0.79 [0.76, 0.81]),
p = 0.03. AUC for the model with primary and secondary diagnoses only was also better
for claims data (0.84 [0.81, 0.87]) compared with EMR data (0.77 [0.74, 0.80]), p ≤ 0.001. There was no statistically significant difference in AUC for the model with
bronchodilator and steroid only based on EMR (0.88 [0.85, 0.90]) and claims (0.87
[0.84, 0.89]), p = 0.45. Similarly, no difference was found for model with bronchodilator only. However,
AUC for the model with steroid only was better for EMR (0.74 [0.72, 0.77]) than claims
(0.71 [0.68, 0.74]), p = 0.004.
Fig. 6 Comparison of receiver operating characteristic (ROC) curves of the final model for
electronic medical record (EMR) and claims data.
Discussion
We present the largest study to our knowledge validating algorithms to identify asthma
exacerbation-related ED visit in pediatric and adult population with separate algorithms
for EMR and claims data. These algorithms use readily available inputs like chief
complaints, diagnosis codes, and medication use. Depending on the proposed utility
of the algorithm, different probability cut-offs can be set. While population-based
epidemiology studies and quality improvement projects require high sensitivity, performance
metrics for health system and genetic epidemiology studies require algorithms with
high PPV and specificity. We have provided nomograms which can be utilized with different
probability cut-offs for these different purposes. For example, a genetic association
study of asthma exacerbation or a performance metric for asthma care will need a high
PPV of at least 90%,[52] for which possible probability cut-offs would be 0.7 for EMR and
0.6 for claims data. In contrast, for a quality improvement
project involving escalation of care for asthmatics with an asthma exacerbation leading
to an ED visit, high sensitivity would be the priority so that most of the eligible patients
receive the needed care; in this situation, a probability cut-off of 0.5 might
be more appropriate for both EMR and claims data. The algorithm could also be integrated
into a clinical decision support tool, with a probability cut-off of 0.5, for managing asthma
patients with respiratory complaints in an outpatient setting and initiating aggressive
management, as a patient with a previous history of asthma exacerbation is at high
risk of another exacerbation requiring an ED visit.
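The trade-off between cut-offs described above can be illustrated with a minimal Python sketch. The predicted probabilities and gold-standard labels below are made up for illustration and are not study data; the function name `metrics_at_cutoff` is likewise hypothetical.

```python
from typing import Sequence


def metrics_at_cutoff(probs: Sequence[float], labels: Sequence[int], cutoff: float) -> dict:
    """Call a visit an exacerbation when its predicted probability is >= cutoff,
    then compare the calls against gold-standard labels (1 = exacerbation)."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    tn = sum(1 for p, y in pairs if p < cutoff and y == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical predicted probabilities and chart-review labels (NOT study data).
probs = [0.95, 0.85, 0.75, 0.65, 0.55, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 1, 0, 0]

for cutoff in (0.5, 0.7):
    print(cutoff, metrics_at_cutoff(probs, labels, cutoff))
```

In this toy sample, raising the cut-off from 0.5 to 0.7 increases PPV and specificity at the cost of sensitivity, which is the pattern a project would weigh when choosing a cut-off.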
The only previous algorithms to identify asthma exacerbation leading to ED visit were
both developed in pediatric populations,[31][33] whereas ours covers both adult and pediatric age ranges. Regarding the performance
of our algorithm, as it pertains to generalizability, sensitivity and specificity
are fixed test characteristics. However, the predictive values are dependent on the
prevalence of asthma exacerbation, which varies by region of the United States,
ranging from 43.2% in the Northeast and 44.7% in the Midwest to 48.8% in the South and West.[53] Our study is the first in the Northeast and has a better predictive value (94% vs. 79%) for the
same sensitivity, even though our asthma attack prevalence is the lowest
of the regions of the United States.[31] Our health system also serves a predominantly rural[54] and white population,[9] which has been shown in previous studies to have a lower prevalence of asthma exacerbation
and, therefore, could lower the PPV of our algorithm. Despite the availability of
effective preventive therapy and guidelines, asthma management and control in the
United States are unsatisfactory and costs associated with asthma are increasing.[55][56]
It has been shown that decision support tools, feedback and audit, and clinical
pharmacy support were crucial to encourage adherence to asthma guidelines.[57] Our algorithm with acceptable PPV helps to identify the high-risk patients, making
these tools possible, and has the potential to improve asthma management.
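The dependence of PPV on prevalence noted above follows from Bayes' theorem and can be sketched in Python. The regional prevalence figures are those quoted in the text; the sensitivity and specificity values are hypothetical placeholders, not the study's estimates.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, held fixed across regions (NOT study estimates).
SENS, SPEC = 0.80, 0.90

# Regional prevalence of asthma exacerbation as quoted in the text.
for region, prev in (("Northeast", 0.432), ("Midwest", 0.447), ("South and West", 0.488)):
    print(f"{region}: PPV = {ppv(SENS, SPEC, prev):.3f}")
```

With sensitivity and specificity held fixed, the computed PPV rises with prevalence, so an algorithm validated in a lower-prevalence region would be expected to show an equal or higher PPV where prevalence is higher.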
EMR and claims data each have their advantages and disadvantages: the former
is able to provide better clinical data, such as vital signs in asthma exacerbation, whereas the
latter is able to provide a complete picture of a patient's care. The concordance
between EMR- and claims-based variables for asthma exacerbation identification was
variable, with > 90% concordance for primary diagnosis and corticosteroids. When we
analyzed how predictive a characteristic was when it was present in one of the data sets
(EMR or claims) but missed by the other, the claims-positive, EMR-negative subset had
higher rates of asthma exacerbation for principal diagnosis, secondary diagnosis,
and bronchodilator, while the EMR-positive, claims-negative subset had higher rates for steroids.
Secondary asthma diagnosis was not predictive in the EMR data but predictive in the
claims data. Previous studies have compared EMR to claims data and have found variable
concordance based on medication or disease condition.[58][59]
Models that use diagnostic codes only to predict asthma exacerbation performed better
in claims data than in EMR data, while the steroid-only model performed better in EMR data, although
the AUCs of such models were inferior to that of our final model. This shows the strengths
and weaknesses of data elements from EMR and claims data. The final models from EMR
and claims were not statistically different. The AUC for the final model for EMR improved
from 0.93 (0.91, 0.94) to 0.95 (0.93, 0.96) when the sample used to derive the models
changed to a restricted sample with overlapping data. Although the AUC changed,
the change is within acceptable limits of random variability, as the 95%
confidence intervals overlap.
Strengths of our study include separate algorithms for EMR and claims data, validation
against a large gold standard population, inclusion of both pediatric and adult populations,
use of appropriate statistical methods that refer to the source population to calculate
all performance metrics, and creation of nomograms for use by others. Our study is
not without limitations. The diagnostic codes we used for the algorithm were
ICD-9 codes, which currently makes our algorithm, if used as studied, relevant only for retrospective
work. Given the current standard of using ICD-10 codes, the
ICD-10 asthma code J45.xx should be used instead of ICD-9 code 493.xx when using the
algorithm, to make it relevant for prospective work. Also, our gold standard did not
use peak flow data to confirm asthma exacerbation but we used extensive clinical data
to classify our gold standard and validated this.[34] We also oversampled ED visits with classifiers of interest, but we were able to
calculate estimates of classification function in the source population, making our
predictive values generalizable to an ED asthma population similar to ours, which
is predominantly rural and white. We also did not explore data mining and analytics-based
techniques for classification, such as decision trees, support vector machines, or
newer advanced techniques that may provide better performance; these could be
evaluated in future studies. The agreement between EMR and claims data for various
data elements could have been influenced by (1) our EMR system, as billing is highly
integrated with the EMR in Epic Systems Corporation-based software, and (2) our organizational
structure, which is an integrated health system providing both health care and insurance;
both may limit generalizability of the comparison of EMR and claims data elements.
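For the ICD-9 to ICD-10 substitution mentioned among the limitations, the diagnosis-code input could be made to accept both code families. The Python sketch below is an assumption about how such a check might be written, not the implementation used in the study; the pattern names and `is_asthma_code` are hypothetical, and the patterns only encode the 493.xx and J45.xx families named in the text.

```python
import re

# ICD-9 asthma family 493.xx and ICD-10 asthma family J45.xx, as named in the text.
# The allowed number of digits after the decimal point is an assumption.
ICD9_ASTHMA = re.compile(r"493(\.\d{1,2})?")
ICD10_ASTHMA = re.compile(r"J45(\.\d{1,3})?", re.IGNORECASE)


def is_asthma_code(code: str) -> bool:
    """Return True when the whole code matches either asthma code family."""
    code = code.strip()
    return bool(ICD9_ASTHMA.fullmatch(code) or ICD10_ASTHMA.fullmatch(code))


for example in ("493.92", "J45.901", "J44.1", "4930"):
    print(example, is_asthma_code(example))
```

Using `fullmatch` rather than a prefix test avoids treating an unrelated code that merely begins with the same characters as an asthma code.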
Conclusion
We were able to create an algorithm based on respiratory complaints listed in chief
complaints, principal diagnosis, any short-acting bronchodilator, and steroid (oral
or intravenous) use in the ED for EMR data, with an excellent PPV of 94%; and another
based on principal and secondary diagnoses, nebulization, and steroid (oral or intravenous)
use in the ED for claims data, with an excellent PPV of 95%. We also provide different
probability cut-offs that yield a sensitive or specific algorithm based on the needs
of a research study or quality improvement program.
Clinical Relevance Statement
Asthma exacerbation leading to ED visit is prevalent and preventable. Identification
of these asthma exacerbations using EMR and claims data has the potential to reduce
morbidity in asthma through integrated care and better research. This study provides separate
algorithms to identify asthma exacerbation-related ED visits using EMR and claims data with high
positive predictive value using readily available data elements like chief complaints,
diagnostic codes, and medications.
Multiple Choice Question
What was the purpose of the nomogram in this study?
a. To provide a mathematical formula for calculating predicted probability of asthma
exacerbation in the ED.
b. To provide a visual approach for calculating predicted probability of asthma exacerbation
in the ED.
c. To provide a mathematical formula for calculating area under the curve for the final
model to predict asthma exacerbation in the ED.
d. To provide a visual approach for calculating area under the curve for the final model
to predict asthma exacerbation in the ED.
Correct Answer: The correct answer is option b, to provide a visual approach for calculating predicted
probability of asthma exacerbation in the ED. Nomograms were constructed in this study
to provide a visual approach for calculating predicted probabilities of asthma exacerbation-related
ED visit, although traditionally they have been used in medicine to estimate prognosis.
The advantage of a nomogram is the ability to calculate probabilities based on an individual
patient's values for the different predictor variables. A nomogram can also be
integrated into clinical decision support tools and utilized for clinical care. To
interpret the nomogram, first the score for each variable in the model should be calculated
by lining up the value for the variable with the score scale in the upper part of
the nomogram, and then all the scores should be summed. The resulting total score
should be matched to the probability of asthma exacerbation in the lower part of the
nomogram. A binary probability cut-off to indicate asthma exacerbation can be chosen
for each project based on the desired test statistics of sensitivity, specificity,
and predictive values. Each patient's probability of asthma exacerbation obtained
from the nomogram can finally be converted to “yes or no” for asthma exacerbation
based on the predetermined probability cut-off.
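The nomogram reading described above (score each variable, sum the scores, map the total to a probability, then apply a cut-off) can be mimicked in Python. The sketch below assumes the underlying model is a logistic regression, which this section does not state; the intercept and coefficients are hypothetical and are not the study's fitted values. The predictor names follow the EMR model inputs listed in the Conclusion.

```python
import math

# Hypothetical log-odds intercept and coefficients (NOT the study's fitted values).
INTERCEPT = -3.0
COEFFICIENTS = {
    "respiratory_chief_complaint": 1.5,
    "principal_asthma_diagnosis": 2.5,
    "bronchodilator_in_ed": 1.0,
    "steroid_in_ed": 2.0,
}


def predicted_probability(visit: dict) -> float:
    """Sum the contributions of the predictors that are present (the 'total score'),
    then map the total to a probability with the logistic function."""
    linear_predictor = INTERCEPT + sum(
        beta for name, beta in COEFFICIENTS.items() if visit.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def classify(visit: dict, cutoff: float) -> bool:
    """Convert the probability to a yes/no call using a predetermined cut-off."""
    return predicted_probability(visit) >= cutoff


visit = {"respiratory_chief_complaint": True, "steroid_in_ed": True}
print(round(predicted_probability(visit), 3), classify(visit, cutoff=0.5))
```

The cut-off is passed in as a parameter so that the same scoring can serve a sensitivity-oriented or a PPV-oriented project.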