Keywords
electronic health record - asthma - clinical decision support
Background and Significance
Asthma is a major chronic health condition in the pediatric population, associated with substantial morbidity and mortality, reductions in quality of life, increased costs, and health disparities.[1] [2] [3] The National Heart, Lung, and Blood Institute pediatric asthma guidelines stress personalized management with a step-up and step-down approach based on symptom control.[4] Tools such as the Asthma Control Test (ACT) are used in clinical practice to assess asthma symptom control, but such tools have variable diagnostic performance and may not predict future exacerbations.[5] [6] Stratifying the risk of asthma complications is an important step in asthma management. A wide variety of computerized or machine learning-based approaches for asthma risk stratification and complication prediction have been studied, ranging from simple classification and regression tree models to more complex machine learning models such as neural networks.[7] [8] [9] [10] [11] [12]
However, implementing computerized prediction models poses social, technical, and ethical challenges that can reduce their effectiveness in clinical settings. Many studies of predictive model performance are conducted on retrospective cohorts rather than through prospective implementation in real-world clinical settings. Even when clinical outcomes have been studied, some models have shown mixed utility in improving the primary outcome. For example, Seol et al implemented an artificial intelligence model for predicting asthma exacerbations that did not significantly reduce exacerbations compared with control, although it had other benefits, such as improving provider efficiency in chart review.[13] Another major concern is that biased machine learning models may worsen disparities for patients of certain sex, ethnicity, or race.[14] [15] Model generalizability across clinical settings can also vary and should be evaluated for models intended for wide deployment.
Electronic health record (EHR) vendors now offer many proprietary predictive models for a variety of clinical use cases. These commercial models have the potential for wide uptake by health care organizations because they are easy to implement. However, external validation of commercial models, such as the Epic Sepsis Model, has revealed substantial problems with model performance and usefulness.[16] Additionally, documentation of model training and bias evaluations is often limited for these commercial models.[17]
Epic Systems released a proprietary model that predicts the risk of asthma exacerbation in children. In this study, we aimed to determine whether implementation of this model would reduce asthma exacerbation outcomes relative to patient encounters in which the model was not available. Beginning in February 2021, we implemented the model for an intervention group of volunteer pediatric pulmonology and allergy providers in outpatient clinics at Children's Healthcare of Atlanta (CHOA) and evaluated it using a difference-in-differences design; the control group consisted of the other providers in those same departments, for whom the model was not implemented. We also qualitatively assessed barriers to and facilitators of provider use of this model.
Methods
Setting
This project was conducted in a large, combined academic and community tertiary care pediatric health system in the Southeastern United States with three freestanding children's hospitals and comprehensive outpatient primary and specialty care.
Predictive Model
The Epic Risk of Pediatric Asthma Exacerbation model (Epic Systems, Verona, Wisconsin, United States) is a pretrained logistic regression model that generates a numeric risk score corresponding to the probability that a patient will have an asthma exacerbation within 90 days of the prediction time. The vendor definition of an asthma exacerbation included any asthma-related hospital admission or emergency department (ED) visit, or a prescription for dexamethasone or for a systemic steroid course of 3 days or longer.[18]
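As a rough illustration of what a logistic regression risk model does, the sketch below converts a weighted sum of patient features into a 90-day exacerbation probability and ranks each feature's contribution on the log-odds scale, similar in spirit to the patient-specific risk factors the interface highlights. The vendor's actual features, coefficients, and the mapping from probability to the displayed score are proprietary and not described here; every feature name and weight below is hypothetical.

```python
import math

# Hypothetical intercept and coefficients on the log-odds scale; NOT the vendor's values.
INTERCEPT = -4.0
COEFFICIENTS = {
    "prior_ed_visits_12mo": 0.60,
    "systemic_steroid_courses_12mo": 0.45,
    "age_years": -0.05,
}


def exacerbation_probability(features: dict[str, float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


def ranked_contributions(features: dict[str, float]) -> list[tuple[str, float]]:
    """Per-feature log-odds contributions (b_i * x_i), largest absolute value first."""
    contributions = [(name, COEFFICIENTS[name] * value) for name, value in features.items()]
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


patient = {"prior_ed_visits_12mo": 2, "systemic_steroid_courses_12mo": 1, "age_years": 8}
print(f"90-day probability: {exacerbation_probability(patient):.3f}")
for name, contribution in ranked_contributions(patient):
    print(f"{name}: {contribution:+.2f}")
```

Ranking by the coefficient-times-value term is one simple way to explain a linear model's output; whether the vendor's hover view uses exactly this quantity is not stated.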
This predictive model was initially implemented in the background (i.e., filing scores to Epic without displaying them to clinical users) for all outpatient encounters. Although the vendor developed the model itself (including feature selection and model training), local technical staff deployed it within our EHR instance. Epic Systems calculated and provided predictive performance characteristics (see “Predictive Performance”) based on 37,915 local encounters (including allergy, pulmonary, and primary care visits) using proprietary tools.
Subsequently, the model was implemented for six volunteer clinical users in allergy and pulmonology clinics using a noninterruptive approach. The opt-in, provider-facing user interface consisted of an additional column in the provider's scheduling view of patients ([Fig. 1]). This column contained the numeric risk score, along with an expanded model interpretation view, available on mouse hover, that highlighted the patient-specific risk factors contributing to the score. Although providers using the column could see the risk score, technical limitations prevented this configuration from filing scores to our local database, so the study team could not assess predictive performance directly.
Fig. 1 The model user interface is composed of a risk score column on the patient schedule view. On mouse hover over a score, a pop-up visual displays factors included in the score calculation for a given patient.
Quasiexperimental Design
A quasi-experimental design was used to study the effect of the predictive model intervention. First, six volunteer clinical providers within the allergy and pulmonology departments were recruited between February 2021 and April 2021 to form the intervention group. These volunteers had either individually expressed interest in participating to the authors or were recruited by word of mouth and referrals from existing members of the group. Training material on how to use the model was shared with these providers in synchronous sessions of approximately 30 minutes, during which they also added the risk score column to their EHR scheduling view. The control group consisted of all other clinical providers within the allergy and pulmonology departments. The study included asthma visits from February 24, 2019 to February 23, 2022, split into four groups by time period and intervention group (i.e., intervention-pre and intervention-post; control-pre and control-post). The CHOA IRB deemed this project a quality improvement initiative that did not constitute human subjects research (identifier: STUDY00000905).
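The four-cell layout of this design can be made concrete with a short sketch that labels each visit by arm (intervention vs. control provider) and period (pre vs. post). The study window dates come from the text; the single go-live cutoff, the `Visit` fields, and the provider IDs are hypothetical, since the exact pre/post cutoff applied to each provider is not stated here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_START = date(2019, 2, 24)
STUDY_END = date(2022, 2, 23)
# Hypothetical cutoff between pre- and postimplementation periods (assumption).
GO_LIVE = date(2021, 2, 24)


@dataclass
class Visit:
    provider_id: str  # hypothetical field names
    visit_date: date


def assign_group(visit: Visit, intervention_providers: set[str]) -> Optional[str]:
    """Return one of the four difference-in-differences cells, or None if outside the window."""
    if not STUDY_START <= visit.visit_date <= STUDY_END:
        return None
    arm = "intervention" if visit.provider_id in intervention_providers else "control"
    period = "post" if visit.visit_date >= GO_LIVE else "pre"
    return f"{arm}-{period}"


volunteers = {"prov_A", "prov_B"}  # hypothetical provider IDs
print(assign_group(Visit("prov_A", date(2021, 6, 1)), volunteers))   # intervention-post
print(assign_group(Visit("prov_Z", date(2020, 3, 15)), volunteers))  # control-pre
```

Because arm membership is decided by the treating provider rather than the patient, a patient seen by both intervention and control providers can contribute visits to both arms.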
Data Collection and Analysis
The unit of analysis was the outpatient visit to the allergy or pulmonology clinics for patients with asthma on their problem list. Encounter-level data were extracted from the Epic Clarity database using a custom query. We excluded visits with missing height or weight, age <2 years (where asthma diagnosis is often uncertain and body mass index [BMI] percentiles cannot be calculated), unknown ethnicity, or unknown insurance status. The primary outcomes were assessed within 90 days after the index visit and included (1) any CHOA hospitalization with asthma as the principal diagnosis, (2) any CHOA ED visit with asthma as a diagnosis, (3) any outpatient prescription for oral steroids, and (4) a composite of any of the above. Covariates included age in years, the clinic specialty for the encounter (allergy or pulmonology), the maximum documented asthma severity (mild intermittent, mild persistent, moderate persistent, severe persistent, or unknown), sex, race (white, black, Asian, other, or unknown), ethnicity (Hispanic, non-Hispanic, or unknown), insurance status (public or private), and BMI (categorized as normal [<85th percentile], overweight [85th to <95th percentile], and obese [≥95th percentile]).
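The exclusion rules, BMI categories, and composite outcome described above can be expressed as a few small functions. This is a minimal sketch: the `Encounter` fields and string codes are hypothetical stand-ins, since the actual Clarity columns and the custom query are not described.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    age_years: float
    bmi_percentile: Optional[float]  # None when height or weight is missing
    ethnicity: str                   # hypothetical codes: "Hispanic", "Non-Hispanic", "Unknown"
    insurance: str                   # hypothetical codes: "Public", "Private", "Unknown"
    hospitalized_90d: bool
    ed_visit_90d: bool
    oral_steroid_90d: bool


def is_eligible(enc: Encounter) -> bool:
    """Apply the exclusion criteria listed in Methods."""
    return (
        enc.bmi_percentile is not None
        and enc.age_years >= 2
        and enc.ethnicity != "Unknown"
        and enc.insurance != "Unknown"
    )


def bmi_category(percentile: float) -> str:
    """Normal <85th, overweight 85th to <95th, obese >=95th percentile."""
    if percentile < 85:
        return "normal"
    if percentile < 95:
        return "overweight"
    return "obese"


def composite_outcome(enc: Encounter) -> bool:
    """Any hospitalization, ED visit, or oral steroid prescription within 90 days."""
    return enc.hospitalized_90d or enc.ed_visit_90d or enc.oral_steroid_90d


visit = Encounter(6, 90.0, "Hispanic", "Public", False, True, False)
print(is_eligible(visit), bmi_category(visit.bmi_percentile), composite_outcome(visit))
```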
Population characteristics and outcome rates were compared across four groups: intervention group preimplementation, intervention group postimplementation, no intervention group preimplementation, and no intervention group postimplementation. Significance was assessed using χ2 tests for categorical variables and one-way analysis of variance (ANOVA) for continuous variables.
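For a categorical comparison across the four groups, the χ2 test operates on a 4 × 2 table of visits with and without the outcome. The sketch below computes the Pearson statistic and its upper-tail p-value using only the Python standard library; the incomplete-gamma series is a simple choice made here for self-containment, not necessarily what the authors' software used. Applied to the hospitalization counts in Table 1, it gives a statistic of roughly 12.4 on 3 degrees of freedom (p ≈ 0.006), in line with the p-value reported there.

```python
import math


def pearson_chi_square(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf(stat: float, dof: int) -> float:
    """Upper-tail chi-square probability, 1 - P(dof/2, stat/2).

    Uses the power series for the regularized lower incomplete gamma function;
    adequate for moderate statistics, and p-values below about 1e-15 come out near zero.
    """
    if stat <= 0:
        return 1.0
    s, x = dof / 2.0, stat / 2.0
    log_first = s * math.log(x) - x - math.lgamma(s + 1.0)
    if x > s and log_first < -700:
        return 0.0  # tail probability is far below double precision
    term = math.exp(log_first)
    total, k = term, 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (s + k)
        total += term
    return max(0.0, 1.0 - total)


# 90-day hospitalizations from Table 1: [visits with event, visits without event]
hospitalization = [
    [54, 3842 - 54],      # intervention, preimplementation
    [14, 2165 - 14],      # intervention, postimplementation
    [171, 19865 - 171],   # no intervention, preimplementation
    [105, 10743 - 105],   # no intervention, postimplementation
]
stat, dof = pearson_chi_square(hospitalization)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {chi_square_sf(stat, dof):.4f}")
```

In practice a statistics library routine (for example, a chi-square survival function) would replace the hand-written series.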
To assess the impact of the intervention on the primary outcomes, we first calculated a crude difference-in-differences estimate (Eq. 1). This design compares the pre- to postimplementation change in the intervention group with the corresponding change in the no intervention group.
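Eq. 1: General formula for the difference-in-differences estimator

$$\text{DiD} = \left(\Pr_{\text{intervention, post}} - \Pr_{\text{intervention, pre}}\right) - \left(\Pr_{\text{no intervention, post}} - \Pr_{\text{no intervention, pre}}\right)$$

where Pr = proportion of encounters resulting in the outcome within 90 days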
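As a worked example of Eq. 1, the sketch below takes the raw 90-day hospitalization proportions from Table 1 and computes (intervention post − pre) minus (no intervention post − pre), which comes to about −0.88 percentage points. This back-of-envelope figure is illustrative only; because the exact computation behind the crude estimates in Table 2 is not fully specified, it need not match them exactly.

```python
def crude_did(int_pre: float, int_post: float, ctrl_pre: float, ctrl_post: float) -> float:
    """Eq. 1: (intervention post - pre) minus (no intervention post - pre)."""
    return (int_post - int_pre) - (ctrl_post - ctrl_pre)


# 90-day hospitalization, events / visits, from Table 1
did = crude_did(
    int_pre=54 / 3842,
    int_post=14 / 2165,
    ctrl_pre=171 / 19865,
    ctrl_post=105 / 10743,
)
print(f"Crude DiD, hospitalization: {100 * did:.2f} percentage points")
```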
To adjust for potential confounders, we used mixed-effects linear regression. Although all outcomes were dichotomous, linear regression was chosen to facilitate the difference-in-differences analysis, in which the difference-in-differences estimator is the coefficient of the interaction term between treatment group (intervention vs. no intervention) and period (pre- vs. postimplementation).[19] Patient was included as a random effect because an individual patient's asthma complication rates may depend on individual factors not captured by our data collection approach.
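Why the interaction coefficient is the difference-in-differences estimator can be seen in a deliberately simplified model. The sketch below fits a linear probability model by ordinary least squares with only an intercept, treatment group, period, and their interaction, using visit-level 0/1 hospitalization outcomes rebuilt from the Table 1 counts; in this saturated model the interaction coefficient equals the raw Eq. 1 estimate exactly. It omits the patient random effect and covariates used in the study, which would require patient identifiers and a mixed-model routine (for example, statsmodels' MixedLM) not shown here.

```python
import numpy as np

# (treated, post, events, visits) for 90-day hospitalization, from Table 1
CELLS = [(1, 0, 54, 3842), (1, 1, 14, 2165), (0, 0, 171, 19865), (0, 1, 105, 10743)]

design_blocks, outcome_blocks = [], []
for treated, post, events, visits in CELLS:
    y_cell = np.zeros(visits)
    y_cell[:events] = 1.0  # visit-level 0/1 outcome; row order within a cell does not affect OLS
    outcome_blocks.append(y_cell)
    # Columns: intercept, treatment group, postimplementation, interaction
    design_blocks.append(np.tile([1.0, treated, post, treated * post], (visits, 1)))

X = np.vstack(design_blocks)
y = np.concatenate(outcome_blocks)

# Linear probability model fit by ordinary least squares (no random effect, no covariates)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Interaction (difference-in-differences) coefficient: {100 * coef[3]:.2f} pp")
```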
Finally, we visually inspected each outcome over time in the intervention and no intervention groups to check for parallel trends in the preimplementation phase, a key assumption of the difference-in-differences design.
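The run charts used for this check plot, for each group, the proportion of visits followed by the outcome over time. A minimal way to compute that series is sketched below, assuming visits are available as hypothetical (group, visit date, outcome flag) tuples and that monthly binning is used (an assumption; the charts' time granularity is not stated). Plotting is omitted, and judging whether preimplementation trends are parallel remains a visual assessment.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Tuple


def monthly_proportions(visits: Iterable[Tuple[str, date, bool]]) -> dict:
    """Proportion of visits followed by the outcome, keyed by (group, "YYYY-MM")."""
    counts = defaultdict(lambda: [0, 0])  # key -> [visits with outcome, all visits]
    for group, visit_date, had_outcome in visits:
        key = (group, visit_date.strftime("%Y-%m"))
        counts[key][0] += int(had_outcome)
        counts[key][1] += 1
    return {key: events / total for key, (events, total) in sorted(counts.items())}


example = [
    ("intervention", date(2020, 1, 6), True),
    ("intervention", date(2020, 1, 20), False),
    ("control", date(2020, 1, 9), False),
    ("control", date(2020, 2, 3), True),
]
for (group, month), share in monthly_proportions(example).items():
    print(group, month, f"{share:.0%}")
```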
User Interviews
Qualitative interviews were conducted individually with intervention group providers during the intervention period to better understand facilitators of and barriers to provider use of the asthma exacerbation risk model. The interview guide was derived from the Consolidated Framework for Implementation Research interview tool.[20] Interviews were not recorded or transcribed verbatim; instead, during each interview, a member-checking method was used to verbally review the findings shared by the participant. Two authors (A.M., E.O.) identified themes from the interview notes and mapped them to the MITRE Human Machine Teaming framework.[21]
Results
Study Population and Baseline Characteristics
Of the 39,444 visits to allergy and pulmonology clinics during the study period in which asthma was on the patient's problem list, we excluded 2,829 (7.2%) because of missing provider ID (611), missing weight or height (1,344), age <2 years (572), or unknown ethnicity or other missing data (302). The final sample of 36,615 visits was split into intervention preimplementation (3,842 visits), intervention postimplementation (2,165 visits), no intervention preimplementation (19,865 visits), and no intervention postimplementation (10,743 visits). Baseline characteristics are shown in [Table 1]. The distributions of age, sex, clinic specialty, asthma severity, white race, black race, unknown race, ethnicity, and insurance status differed significantly across the four groups.
Table 1
Baseline demographics by cohort
| Characteristic | Intervention group, preimplementation (N = 3,842) | Intervention group, postimplementation (N = 2,165) | No intervention group, preimplementation (N = 19,865) | No intervention group, postimplementation (N = 10,743) | p |
|---|---|---|---|---|---|
| Age; mean (SD) | 10.0 (4.6) | 10.4 (4.7) | 9.7 (4.6) | 10.1 (4.6) | <0.001 |
| Age (categorical); N (%) | | | | | <0.001 |
| 2–4 y | 520 (13.4) | 264 (12.2) | 2,974 (15.0) | 1,351 (12.6) | |
| 5–12 y | 2,105 (54.8) | 1,149 (53.1) | 11,001 (55.4) | 5,946 (55.4) | |
| 13–17 y | 990 (25.8) | 608 (28.1) | 4,828 (24.3) | 2,750 (25.6) | |
| ≥18 y | 227 (5.9) | 144 (6.7) | 1,062 (5.4) | 696 (6.5) | |
| Sex; N (%) | | | | | 0.004 |
| Female | 1,621 (42.2) | 896 (41.4) | 8,238 (41.5) | 4,252 (39.6) | |
| Male | 2,221 (57.8) | 1,269 (58.6) | 11,627 (58.5) | 6,491 (60.4) | |
| Clinic specialty; N (%) | | | | | <0.001 |
| Allergy | 1,036 (27.0) | 695 (32.1) | 4,865 (24.5) | 2,051 (19.1) | |
| Pulmonology | 2,806 (73.0) | 1,470 (67.9) | 15,000 (75.5) | 8,692 (80.9) | |
| Asthma severity; N (%) | | | | | <0.001 |
| Mild intermittent | 268 (7.0) | 149 (6.9) | 1,389 (7.0) | 774 (7.2) | |
| Mild persistent | 677 (17.6) | 246 (11.4) | 4,593 (23.1) | 2,646 (24.6) | |
| Moderate persistent | 1,831 (47.7) | 1,246 (57.6) | 7,919 (39.9) | 4,369 (40.7) | |
| Severe persistent | 722 (18.8) | 353 (16.3) | 3,986 (20.1) | 1,947 (18.1) | |
| Unknown | 344 (9.0) | 171 (7.9) | 1,978 (10.0) | 1,007 (9.4) | |
| Race[b]; N (%) | | | | | |
| Black | 1,969 (51.3) | 1,066 (49.4) | 10,445 (52.6) | 5,378 (50.1) | <0.001 |
| White | 1,651 (43.0) | 994 (45.9) | 8,126 (40.9) | 4,565 (42.5) | <0.001 |
| Asian | 110 (2.9) | 76 (3.5) | 670 (3.4) | 403 (3.8) | 0.06 |
| Other | 47 (1.2) | 20 (0.9) | 178 (0.9) | 88 (0.8) | 0.16 |
| Unknown | 668 (17.4) | 296 (13.7) | 3,753 (18.9) | 1,810 (16.9) | <0.001 |
| Ethnicity; N (%) | | | | | <0.001 |
| Non-Hispanic | 3,176 (82.7) | 1,853 (85.6) | 16,611 (83.6) | 8,853 (82.4) | |
| Hispanic | 666 (17.3) | 312 (14.4) | 3,254 (16.4) | 1,890 (17.6) | |
| Insurance; N (%) | | | | | <0.001 |
| Public | 2,363 (61.5) | 1,207 (55.8) | 12,363 (62.2) | 6,556 (61.0) | |
| Private | 1,479 (38.5) | 958 (44.3) | 7,502 (37.8) | 4,187 (39.0) | |
| BMI percentile; mean (SD) | 69.2 (30.2) | 68.2 (30.0) | 68.7 (29.8) | 69.1 (30.1) | 0.40 |
| BMI category; N (%) | | | | | 0.17 |
| Normal | 2,131 (55.5) | 1,236 (57.1) | 11,383 (57.3) | 6,036 (56.2) | |
| Overweight | 645 (16.8) | 345 (15.9) | 3,247 (16.3) | 1,737 (16.2) | |
| Obese | 1,066 (27.8) | 584 (27.0) | 5,239 (26.4) | 2,971 (27.7) | |
| Outcomes within 90 days of index visit; N (%) | | | | | |
| Hospitalization | 54 (1.4) | 14 (0.7) | 171 (0.9) | 105 (1.0) | 0.006 |
| ED visit | 222 (5.8) | 118 (5.5) | 1,099 (5.5) | 727 (6.8) | <0.001 |
| Oral steroid | 629 (16.4) | 353 (16.3) | 3,072 (15.5) | 1,848 (17.2) | 0.001 |
| Composite | 659 (17.2) | 372 (17.2) | 3,237 (16.3) | 1,950 (18.2) | <0.001 |
Abbreviations: ED, emergency department; SD, standard deviation.
Note: Statistically significant p-values are indicated in bold.
a Where not explicitly referenced, numbers in parentheses refer to percentages.
b Some encounters included patients who identified as multiple races.
Predictive Performance
Epic Systems shared predictive performance data on November 18, 2021 through a proprietary validation tool. The area under the receiver operating characteristic curve was 0.73. At a score threshold of 5 (defined as medium risk), 22.2% of the patient population with asthma were predicted to have an exacerbation (the composite outcome) within the 90-day prediction window, with a sensitivity of 55.0%, a specificity of 78.8%, a positive predictive value (PPV) of 7.0%, and a negative predictive value (NPV) of 98.4%. At a score threshold of 20 (defined as high risk), 1.2% of the patient population were predicted to have an exacerbation within the prediction window, with a sensitivity of 8.6%, a specificity of 99.1%, a PPV of 20.8%, and an NPV of 97.4%.
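The low PPV at the medium-risk threshold follows from Bayes' rule when the outcome is uncommon. The sketch below derives the fraction of patients flagged, PPV, and NPV from sensitivity, specificity, and prevalence. The local 90-day exacerbation prevalence is not reported here; the 3% used below is an assumption, chosen because it is roughly consistent with the reported figures. With that value, the threshold-5 operating point gives about 22% of patients flagged, a PPV of about 7%, and an NPV of about 98%, close to the vendor-reported values.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float, float]:
    """Return (fraction flagged, PPV, NPV) for a score threshold, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_pos + false_pos
    ppv = true_pos / flagged
    npv = true_neg / (true_neg + false_neg)
    return flagged, ppv, npv


# Reported operating point at a score threshold of 5; the 3% prevalence is an assumption
flagged, ppv, npv = predictive_values(sensitivity=0.550, specificity=0.788, prevalence=0.03)
print(f"flagged = {flagged:.1%}, PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Because false positives are drawn from the large unaffected majority, even a specificity near 79% produces many more false than true positives at this prevalence.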
Asthma Outcomes
Outcome proportions by treatment group and period are shown in [Table 1], with run charts for each outcome in [Fig. 2]. In the intervention group before implementation, an asthma exacerbation requiring hospitalization occurred within 90 days after 1.4% of visits (54/3,842), decreasing to 0.7% (14/2,165) after implementation. By contrast, in the no-intervention group, the hospitalization rate increased slightly from 0.9% (171/19,865) to 1.0% (105/10,743). Similarly, the rate of asthma ED visits within 90 days of the index visit fell in the intervention group from 5.8% (222/3,842) to 5.5% (118/2,165), while it increased from 5.5% (1,099/19,865) to 6.8% (727/10,743) in the no-intervention group. Oral steroid courses were essentially unchanged in the intervention group (16.4% [629/3,842] preimplementation vs. 16.3% [353/2,165] postimplementation) but increased in the no-intervention group (15.5% [3,072/19,865] pre vs. 17.2% [1,848/10,743] post).
Fig. 2 Run chart for proportion of visits resulting in each asthma outcome (hospitalization (A), ED visit (B), oral steroid (C), and composite (D)) within 90 days of the index visit. ED, emergency department.
Univariable and multivariable analyses for all outcomes are shown in [Table 2]; each coefficient represents the absolute change, in percentage points, in the proportion of visits followed by the outcome for the covariate of interest. The adjusted difference-in-differences estimator for asthma hospitalization was −0.93% (95% confidence interval [CI]: −1.47%, −0.38%), suggesting that implementation of the asthma predictive model was associated with an absolute reduction of 0.93% in the hospitalization rate, corresponding to a number needed to treat (NNT) of 108 to prevent one hospitalization. The adjusted difference-in-differences estimator also indicated a significant reduction in asthma ED visits (−1.44%, 95% CI: −2.74%, −0.15%; NNT = 69 per ED visit prevented). By contrast, there was no significant change in the oral steroid or composite outcomes associated with the intervention.
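The NNT values follow from NNT = 1 / absolute risk reduction: 1/0.0093 ≈ 107.5 and 1/0.0144 ≈ 69.4. The reported values (108 and 69) match rounding to the nearest integer; a frequently recommended convention is to round NNT up to the next whole number, which would give 70 for ED visits. The function name and the `round_up` switch below are illustrative.

```python
import math


def number_needed_to_treat(risk_difference_pct: float, round_up: bool = False) -> int:
    """NNT = 1 / absolute risk reduction, with the risk difference given in percentage points."""
    if risk_difference_pct == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    nnt = 100.0 / abs(risk_difference_pct)
    return math.ceil(nnt) if round_up else round(nnt)


print(number_needed_to_treat(-0.93))                 # hospitalization: 108
print(number_needed_to_treat(-1.44))                 # ED visit: 69
print(number_needed_to_treat(-1.44, round_up=True))  # ED visit, rounded up: 70
```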
Table 2
Difference-in-differences regression analyses
| Predictor | Hospitalization, crude estimate (95% CI) | Hospitalization, adjusted estimate (95% CI) | ED visit, crude estimate (95% CI) | ED visit, adjusted estimate (95% CI) | Steroid, crude estimate (95% CI) | Steroid, adjusted estimate (95% CI) | Composite, crude estimate (95% CI) | Composite, adjusted estimate (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Age (categorical) | | | | | | | | |
| 2–4 y | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| 5–12 y | −0.40 (−0.73, −0.07) | −0.65 (−0.98, −0.32) | −3.57 (−4.38, −2.75) | −4.51 (−5.33, −3.69) | −6.39 (−7.66, −5.13) | −7.56 (−8.82, −6.29) | −6.46 (−7.76, −5.17) | −7.76 (−9.06, −6.47) |
| 13–17 y | −0.64 (−1.02, −0.27) | −1.09 (−1.48, −0.71) | −5.70 (−6.64, −4.76) | −7.13 (−8.08, −6.18) | −9.23 (−10.69, −7.77) | −11.44 (−12.91, −9.97) | −9.59 (−11.08, −8.09) | −11.95 (−13.46, −10.44) |
| ≥18 y | −0.80 (−1.35, −0.26) | −1.22 (−1.77, −0.67) | −6.68 (−8.04, −5.33) | −7.94 (−9.30, −6.58) | −8.44 (−10.55, −6.33) | −10.63 (−12.73, −8.53) | −9.29 (−11.45, −7.14) | −11.58 (−13.73, −9.44) |
| Sex | | | | | | | | |
| Female | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Male | −0.23 (−0.47, 0.02) | −0.27 (−0.51, −0.03) | −0.19 (−0.82, 0.44) | −0.50 (−1.11, 0.12) | −1.11 (−2.10, −0.11) | −1.38 (−2.33, −0.42) | −1.12 (−2.14, −0.11) | −1.44 (−2.42, −0.46) |
| Asthma severity | | | | | | | | |
| Mild intermittent | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Mild persistent | 0.20 (−0.23, 0.67) | 0.17 (−0.31, 0.64) | 0.50 (−0.69, 1.69) | 0.40 (−0.78, 1.57) | 1.90 (0.34, 4.00) | 1.35 (−0.47, 3.17) | 1.92 (0.04, 3.79) | 1.16 (−0.70, 3.30) |
| Moderate persistent | 0.38 (−0.06, 0.83) | 0.42 (−0.03, 0.87) | 1.39 (0.27, 2.51) | 1.80 (0.69, 2.92) | 4.13 (2.40, 5.86) | 4.26 (2.53, 5.98) | 4.28 (2.51, 6.05) | 4.47 (2.71, 6.24) |
| Severe persistent | 2.71 (2.18, 3.24) | 2.76 (2.23, 3.30) | 6.47 (5.12, 7.81) | 7.10 (5.76, 8.45) | 16.48 (14.40, 18.56) | 18.33 (16.24, 20.43) | 16.44 (14.31, 18.58) | 18.23 (16.08, 20.37) |
| Unknown | 0.46 (−0.08, 1.00) | 0.55 (0.00, 1.09) | 0.72 (−0.64, 2.07) | 1.40 (0.06, 2.74) | 2.19 (0.09, 4.28) | 2.41 (0.33, 4.49) | 2.07 (−0.07, 4.21) | 2.44 (0.31, 4.57) |
| Race | | | | | | | | |
| Black | 0.67 (0.43, 0.91) | 0.19 (−0.33, 0.72) | 2.69 (2.07, 3.30) | 0.50 (−0.83, 1.82) | 0.41 (−0.47, 1.49) | −2.12 (−4.18, −0.06) | 1.14 (0.14, 2.14) | −1.55 (−3.66, 0.55) |
| White | −0.51 (−0.75, −0.26) | −0.10 (−0.58, 0.39) | −2.43 (−3.05, −1.80) | −1.41 (−2.64, −0.18) | −0.05 (−1.04, 0.93) | −1.12 (−3.04, 0.79) | −0.55 (−1.55, 0.46) | −0.99 (−2.95, 0.97) |
| Asian | −0.39 (−1.04, 0.25) | −0.10 (−0.85, 0.66) | −2.48 (−4.11, −0.84) | −1.91 (−3.81, −0.01) | −1.78 (−4.36, 0.79) | −2.42 (−5.38, 0.54) | −2.39 (−5.03, 0.25) | −2.57 (−5.60, 0.46) |
| Other | 0.10 (−1.19, 1.38) | 0.40 (−0.91, 1.70) | 2.44 (−0.84, 5.73) | 3.00 (−0.64, 5.91) | 0.11 (−5.08, 5.30) | 2.38 (−2.73, 7.49) | 0.82 (−4.49, 6.13) | 3.18 (−2.06, 8.41) |
| Unknown | −0.49 (−0.81, −0.17) | −0.36 (−0.73, −0.00) | −2.11 (−2.94, −1.29) | 1.33 (0.34, 2.32) | −2.07 (−3.37, −0.77) | −1.53 (−2.96, −0.11) | −2.42 (−3.75, −1.10) | −1.69 (−3.15, −0.23) |
| Ethnicity | | | | | | | | |
| Non-Hispanic | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Hispanic | −0.29 (−0.62, 0.03) | 0.05 (−0.34, 0.45) | −0.16 (−0.67, 0.99) | 1.33 (0.34, 2.32) | −1.34 (−2.65, −0.04) | −1.33 (−2.87, 0.22) | −1.29 (−2.63, 0.04) | −1.03 (−2.61, 0.56) |
| Insurance | | | | | | | | |
| Private | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Public | 0.48 (0.23, 0.72) | 0.12 (−0.16, 0.40) | 2.49 (1.86, 3.12) | 0.79 (0.09, 1.49) | 0.89 (−0.10, 1.88) | 0.38 (−0.71, 1.47) | 1.49 (0.48, 2.51) | 0.68 (−0.43, 1.80) |
| BMI category | | | | | | | | |
| Normal | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Overweight | 0.85 (−0.30, 2.01) | −0.11 (−0.40, 0.19) | 0.41 (−0.32, 1.15) | 0.28 (−0.45, 1.00) | 0.70 (−0.43, 1.84) | 0.78 (−0.33, 1.90) | 0.85 (−0.30, 2.01) | 0.88 (−0.26, 2.02) |
| Obese | 0.52 (−0.59, 1.61) | 0.21 (−0.07, 0.48) | 0.47 (−0.22, 1.16) | 0.21 (−0.48, 0.89) | 0.44 (−0.64, 1.52) | 0.68 (−0.39, 1.74) | 0.51 (−0.59, 1.61) | 0.63 (−0.46, 1.72) |
| Clinic specialty | | | | | | | | |
| Allergy | Ref | Ref | Ref | Ref | Ref | Ref | Ref | Ref |
| Pulmonology | −0.47 (−0.76, −0.19) | −0.22 (−0.52, 0.09) | −2.65 (−3.35, −1.93) | −2.00 (−2.71, −1.24) | 1.52 (0.40, 2.63) | 1.79 (0.61, 2.96) | 0.65 (−0.49, 1.79) | 1.07 (−0.14, 2.27) |
| Outcome within 90 days of index visit | | | | | | | | |
| Intervention group | 0.15 (−1.14, 1.44) | 0.42 (0.08, 0.75) | 0.08 (−0.74, 0.91) | −0.05 (−0.87, 0.77) | 0.08 (−1.18, 1.34) | 0.13 (−1.12, 1.38) | 0.15 (−1.14, 1.44) | 0.12 (−1.16, 1.40) |
| Postintervention | 0.60 (−0.23, 1.43) | 0.18 (−0.04, 0.40) | 0.77 (0.24, 1.30) | 1.24 (0.71, 1.77) | 0.48 (−0.33, 1.30) | 1.06 (0.25, 1.87) | 0.60 (−0.23, 1.43) | 1.24 (0.41, 2.07) |
| Difference-in-differences estimator | −0.80 (−2.83, 1.23) | −0.93 (−1.47, −0.38) | −1.15 (−2.45, 0.16) | −1.44 (−2.74, −0.15) | −0.75 (−2.74, 1.24) | −0.98 (−2.96, 1.01) | −0.80 (−2.83, 1.23) | −1.11 (−3.13, 0.92) |
|
Note: Bold values reflect significance.
Additional factors significant in the adjusted analysis included age and asthma severity for all outcomes. Male sex was associated with significantly lower rates of asthma hospitalization, steroid use, and the composite outcome, but not of ED visits. White and Asian race were associated with lower ED visit rates, while unknown race was associated with lower rates of all four outcomes. Hispanic ethnicity and public insurance were both associated with a higher ED visit rate but with no other outcomes. Finally, pulmonology clinic visits were associated with fewer ED visits but more steroid prescriptions in the subsequent 90 days.
Feedback from User Qualitative Interviews
We gathered clinician feedback through interviews with five providers from the intervention group, focusing on model explainability, usefulness, and workflow, as well as suggestions for improving clinician buy-in and the current tool.
For model explainability, most providers understood that the model took various risk factors into account to flag patients at high risk of asthma exacerbation, though there was variation in their recall of the prediction timeframe and exact model functionality. Notable concerns brought up regarding the model included whether the training data were still valid given changes in asthma exacerbation prevalence due to the coronavirus disease 2019 (COVID-19) pandemic and the inability of the model to take into account health care visits outside of our system.
For usefulness, the model was felt to be generally useful in drawing attention to a patient's exacerbation risk, interpreted in the context of the provider's own prior knowledge of the patient's history. Some providers considered the score more when stepping down therapy, while others considered it more when stepping up therapy. One provider gave an example of a score of approximately 30 influencing their decision not to completely wean steroids for a patient. Nonetheless, providers described continuing to prioritize their personal clinical judgement, with one provider sharing “I'm a clinician, and I don't rely on a computer model to define what I'm hearing or seeing…asthma is unpredictable, and there are so many factors involved, that I don't know that a computer model can accurately predict exacerbations.”
For workflow, the score column on the EHR scheduling view was generally described as helpful in raising awareness of exacerbation risk before seeing a patient. However, providers described limited visibility and inconsistent use as concerns, because they were not required to review the score for each patient and usage depended on individual motivation. One provider described showing the tool and its interpretation functionality to trainees for teaching purposes.
Several suggestions were shared by providers for how to improve provider buy-in for the tool. These included: displaying exacerbation outcomes for each patient, demonstrating that the tool improves patient outcomes, increasing transparency regarding the accuracy of the training data, and highlighting any subjective model features that may introduce bias, as well as providing better explanations of what the model parameters are and their significance. Suggestions for modifications included: color coding the score based on risk level, displaying the score near other frequently reviewed asthma data (such as ACT scores in flowsheets), autopopulating the score within asthma patient notes, and potentially including medication compliance as a model parameter.
Discussion
A commercial, noninterruptive predictive model for pediatric asthma exacerbation implemented in pediatric pulmonary and allergy clinics was associated with significant reductions in asthma hospitalizations and ED visits in the 90 days after the visit, but no significant change in oral steroid prescriptions. Given the small sample of providers, the single-center implementation, and the crude reduction in the intervention group relative to the control group, these findings should be interpreted with caution. On the other hand, this quasiexperimental study suggests that this model, with moderate predictive performance, was unlikely to be harmful or to worsen disparities by race, ethnicity, or sex in this implementation.
We found themes common to human machine teaming in terms of how the model directed users' attention, the observability of model predictions, providers' ability to calibrate trust, and deficiencies in information presentation.[22] For observability, the model interpretation functionality was designed to provide basic insight into the features and coefficients included in the calculation. However, the variation in provider usage and recall of this functionality suggests that its current form may not be sufficient; as one provider suggested, better explanations of the model parameters and highlighting of parameters that rely on subjective data, and therefore may introduce bias, might be more helpful. Regarding calibrated trust, when the risk score was considered, providers generally described interpreting it in the context of their own clinical knowledge and judgement. This was the intended use of the risk score (rather than providers disregarding their own clinical judgement and relying solely on the score), and it is generally preferable for the use of predictive models. One proposal for improving calibrated trust, derived from a provider's suggestion, would be to close the loop by including actual exacerbation data at the individual patient level within the tool. For example, if a patient with a certain risk score is seen by a provider and subsequently has an exacerbation, flagging this within the tool at the next visit may give the provider additional context about the model's accuracy for that patient. Trust in the model may also differ between providers with more asthma expertise, such as the allergy and pulmonology specialists in this intervention group, and primary care providers.
For information presentation, the incorporation of the risk column into the EHR scheduling view in a nonintrusive manner may have limited visibility to providers; however, as suggested by providers, future model iterations with risk-based color coding or auto-population of the score into flowsheets or notes may better surface the information without disrupting workflows. Overall, these usage themes and suggestions highlight barriers and facilitators to real-world provider use of the tool, which can inform future improvements.
Our results contrast with those of Major et al, a randomized trial of noninterruptive clinical decision support (CDS) aiming to reduce length of stay for COVID-19 patients predicted to have low readmission risk.[22] This difference may reflect the stronger design and larger sample size of that randomized trial. Alternatively, contextual differences in study settings or implementation approach may explain the difference. In our study, a limited number of providers were engaged in the intervention group and received targeted education about the model's function, whereas Major et al engaged a larger group of stakeholders, which may have limited educational opportunities for specific providers. Additionally, in the ambulatory setting there is no ambiguity about who is responsible for asthma treatment decisions that might incorporate the model, whereas in the inpatient setting responsibility may be more diffuse. Finally, our quasi-experimental design in the ambulatory setting allowed the intervention and control groups to be separated at the provider level, minimizing the risk of contamination leading to a false-negative result.
In addition to the effect of the predictive model intervention, we found significant associations of younger age and greater documented asthma severity with worse asthma outcomes, consistent with prior studies.[23] [24] Similarly, the associations of white or Asian race with better asthma outcomes, and of Hispanic ethnicity and public insurance with worse outcomes, are consistent with the prior literature.[25] [26] In contrast, male patients in our study had lower rates of asthma hospitalization and steroid use, whereas prior studies have found asthma prevalence and hospitalizations to be higher among young males.[27]
When considering scaling of this predictive model for asthma, complementary work in the existing literature can guide potential approaches alongside the results shared here. In future deployments, multimodal provider recruitment approaches, similar to those described for patient-facing tools,[28] may improve uptake among clinicians. Additionally, the qualitative results described in this study should be interpreted in the context of the existing asthma literature describing factors that affect the use of electronic tools.[29]
Limitations
There were several limitations in this study.
-
This was a single-institution quality improvement study with a small sample of intervention providers who voluntarily adopted the predictive tool in a nonrandomized manner, which may have led to falsely positive results due to random chance or a Hawthorne effect with intervention providers. Additionally, the analysis was limited to outcomes documented in our EHR system, and any hospitalizations, ED visits, or steroid prescriptions at outside institutions were missed. The asthma prevalence within our local population also was lower than that within the Epic training sites, which may affect performance characteristics.
-
While scores pertaining to each patient were displayed to providers on their scheduling view, we were unable to independently calculate model predictive performance due to the inability to store and link scores retrospectively to encounters; thus, we had to rely on Epic's proprietary validation tool for these data. We also did not assess model drift during the short postimplementation time period.
- For the difference-in-differences analysis, it is not possible to fully verify the parallel trends assumption (i.e., that in the absence of the intervention, outcome trends for each group would have continued in parallel). However, visual inspection of run charts suggests that trends were grossly parallel before implementation for all four outcomes and diverged after implementation for the hospitalization and ED outcome. Although our adjusted analysis helps control for factors that may have confounded the association between the intervention and improved hospitalization and ED outcomes, measured and unmeasured differences in patient group characteristics may still have affected the results.
- For the qualitative user feedback, we did not record or transcribe interviews or assess inter-coder reliability on themes, and interviews were reviewed by the authors rather than by third-party specialists. Additionally, our small sample of five of the six intervention group providers may have limited both the depth of insights and the generalizability of our qualitative observations.
- No primary care settings were involved in this implementation, although primary care is likely where many patients with asthma are seen, and no consistent follow-up training was given to providers after they initially learned about and added the risk column to their scheduling view.
- We could not verify that steroid prescriptions were specific to treatment of an asthma exacerbation, which may have led to misclassification in the steroid and composite outcomes.
- This analysis did not include balancing metrics; model implementation may have increased health care resource utilization, for example among patients with higher risk scores. Although not measured in this study, users may also have modified or removed other provider list columns to add the model score, which could influence clinical care.
- Because the model was noninterruptive, we were also unable to quantify its use beyond whether a provider had added the score column.
- We did not compare provider characteristics between the intervention and nonintervention groups or interview nonintervention group providers in this study design; unmeasured factors in these areas may have influenced the results.
- Because the unit of analysis was the encounter, a patient may have been seen by an intervention group provider and then, during the 90-day observation period, also by a nonintervention group provider (or vice versa), leading to potential misclassification.
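The parallel trends limitation above follows from the logic of the difference-in-differences estimator: the change in the intervention group is compared with the change in the nonintervention group, (intervention post - intervention pre) - (nonintervention post - nonintervention pre), so any pre-existing divergence between groups would be attributed to the intervention. The Python sketch below is an illustrative, unadjusted linear probability version of that estimator, plus a crude pre-period slope comparison as a numerical complement to visual inspection of run charts. It is not the authors' adjusted model; the function names, the 0/1 encounter-level inputs, and the use of NumPy are assumptions.

```python
import numpy as np


def did_estimate(y, treated, post):
    """Unadjusted difference-in-differences via a linear probability model.

    Fits y ~ b0 + b1*treated + b2*post + b3*(treated*post) by ordinary least
    squares. With binary treated/post indicators this 2x2 model is saturated,
    so b3 equals the difference-in-differences of the four cell means.
    """
    y = np.asarray(y, dtype=float)          # 0/1 outcome per encounter (assumed)
    t = np.asarray(treated, dtype=float)    # 1 = intervention group provider
    p = np.asarray(post, dtype=float)       # 1 = postimplementation encounter
    X = np.column_stack([np.ones_like(y), t, p, t * p])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])


def pre_period_slope_gap(months, treated_rate, control_rate):
    """Difference in linear pre-period slopes (treated minus control).

    A gap near zero is consistent with, but cannot prove, parallel trends.
    """
    slope_t = np.polyfit(months, treated_rate, 1)[0]
    slope_c = np.polyfit(months, control_rate, 1)[0]
    return float(slope_t - slope_c)
```

An adjusted analysis would add covariate columns to the design matrix, and this sketch computes no standard errors (which would likely need to account for clustering, for example by provider or patient).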
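The encounter-level misclassification described in the last limitation could, in principle, be screened for directly. A minimal sketch, assuming a hypothetical list of (patient_id, visit_date, group) tuples, flags encounters whose patient also saw a provider from the other group within the following 90 days; this check was not part of the reported analysis, and the names used are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical record format: (patient_id, visit_date, group), where group is
# "intervention" or "nonintervention" based on the seeing provider.
Encounter = Tuple[str, date, str]


def flag_cross_exposure(encounters: List[Encounter], window_days: int = 90) -> List[bool]:
    """Return one flag per encounter, in the same order as the input.

    A flag is True when the same patient has a subsequent visit (sorted by
    date, then input order) with a provider from the other group no more than
    `window_days` after this visit, i.e., within its observation window.
    """
    by_patient = defaultdict(list)
    for idx, (patient_id, visit_date, group) in enumerate(encounters):
        by_patient[patient_id].append((visit_date, idx, group))

    flags = [False] * len(encounters)
    window = timedelta(days=window_days)
    for visits in by_patient.values():
        visits.sort()  # chronological; input index breaks same-day ties
        for i, (start, idx, group) in enumerate(visits):
            for later_date, _, later_group in visits[i + 1:]:
                if later_date - start > window:
                    break  # visits are sorted, so none later can fall in the window
                if later_group != group:
                    flags[idx] = True
                    break
    return flags
```

Flagged encounters could then be excluded or analyzed separately as a sensitivity analysis.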
Conclusion
In the context of outpatient pediatric allergy and pulmonology clinics, a predictive model for pediatric asthma exacerbations was associated with reduced risk of asthma hospitalization or ED visit in the 90 days after the appointment, despite imperfect predictive performance. Qualitative data suggest providers appropriately prioritized their own clinical judgment when stepping therapy up or down but would nonetheless direct their attention differently in some contexts based on model outputs. While these results may not hold in larger, multicenter, or randomized studies, these findings nonetheless suggest that noninterruptive predictive models are likely safe and may be efficacious in specific contexts such as outpatient pediatric asthma management. Further studies in more diverse settings, across multiple institutions, and with larger sample sizes would help determine the generalizability of these findings and the optimal implementation strategy for noninterruptive, predictive CDS to improve outcomes.
Clinical Relevance Statement
It remains unknown how vendor predictive models affect clinical outcomes across disease processes and care settings. In this difference-in-differences analysis, patients with asthma who visited providers with access to an electronic health record vendor predictive model for pediatric asthma exacerbation had significantly fewer asthma hospitalizations or ED visits in the subsequent 90 days than patients who visited providers in the same clinic without access to the predictive model. Noninterruptive predictive models may help pediatric allergists and pulmonologists prevent asthma hospitalizations and ED visits.
Multiple-Choice Questions
- Which of the following is generally an advantage of electronic health record vendor-produced predictive models over predictive models developed within a health system?
Correct Answer: The correct answer is option a. EHR vendor-produced predictive models are generally easier to implement in production systems because vendors usually aim to maximize adoption across sites. For Epic Systems, for example, a health system with the appropriate licenses need only download the model and perform basic local configuration steps to ensure appropriate mapping. By contrast, health systems may develop models on retrospective data that are not easily implemented in the production EHR if the vendor does not accommodate the many possible choices involved, such as the software in which the model is written, the format of the data informing the model, and the model type (e.g., deep learning vs. regression-based models).
Vendor-produced models may not perform as well as locally developed models that are optimized for predictive performance on a test set drawn from the local health system.
The effect on clinical outcomes of vendor-produced models compared with locally produced models remains unknown and likely depends on more than predictive performance. Factors such as care setting, educational plan, clinical decision support representation, and other sociotechnical concerns likely play a major role in the impact of a model on clinical outcomes.
Predictive models are generally not used primarily for data collection.
- Which of the following factors likely contributes to a predictive model's impact on clinical outcomes?
Correct Answer: The correct answer is option d. Predictive models have had variable impact on clinical outcomes; in some cases, the same model implemented in different ways at two health systems has produced different outcomes. Thus, while predictive performance (sensitivity, specificity, etc.) likely has some impact on a model's effectiveness, it is not the only important factor.[16][30] Clinical decision support representation (i.e., how the predictive model is presented to the user, such as in an alert or a patient list column) likely plays a role in how clinicians perceive and use model outputs. However, although a patient list view of asthma exacerbation risk was associated with reduced exacerbations in the intervention group of this study, a similar approach for COVID-19 readmission risk did not reduce subsequent readmissions.[22] Workflow integration of artificial intelligence may be the most critical determinant of effectiveness,[31] but model performance characteristics and model representation also affect adoption. Thus, all of the above are likely contributors to a predictive model's impact on clinical outcomes.
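The explanation above separates predictive performance (sensitivity, specificity, etc.) from clinical impact. As a sketch of what independently calculating predictive performance would involve if risk scores could be linked to encounter outcomes (which, as noted in the limitations, was not possible in this study), the function below dichotomizes scores at a threshold and computes sensitivity, specificity, and positive predictive value. The function name, input format, and threshold are hypothetical assumptions; this is not Epic's validation tool.

```python
from typing import Dict, Sequence


def threshold_metrics(scores: Sequence[float], outcomes: Sequence[int], threshold: float) -> Dict[str, float]:
    """Confusion-matrix metrics for a risk score dichotomized at `threshold`.

    scores: model risk scores linked to encounters (hypothetical input).
    outcomes: 1 if the encounter was followed by the outcome of interest
    (e.g., asthma hospitalization or ED visit within 90 days), else 0.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be the same length")
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        flagged = score >= threshold
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined metrics (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }
```

Because these metrics depend on the chosen threshold and on local outcome prevalence, they illustrate why a model's performance at one site may not transfer directly to another.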