Keywords
Random Forest - COVID-19 - public health - statistical models - syndromic surveillance
Introduction
Creating long-term, multisource, national surveillance data services for emerging disease response is a complex topic to which coronavirus disease 2019 (COVID-19) has given new importance.[1][2][3][4][5] Public health emergency responses seldom leave surplus time or resources to stand up novel methods while responding, which makes disease-specific preparedness essential.[6][7][8] More often than not, epidemic response is managed using preexisting data services, often legacy data series from earlier epidemics.[9][10][11] Epidemic preparedness in the United States is generally weak, and the COVID-19 response is largely drawn from preexisting pandemic influenza emergency plans.[12][13]
During a public health emergency, the clinical knowledge needed to respond is developed through case surveillance drawn from preexisting data series. COVID-19 has presented an unusual opportunity to evaluate agreement across surveillance efforts within the United States. The ability to detect clinical findings in meaningful ways, using surveillance nets and epidemiology methods which were not necessarily designed to detect them, is a high priority for the future management of emerging infectious diseases. Strikingly, the difference in COVID-19 mortality for severe acute respiratory syndrome (SARS)-impacted countries (China, South Korea, and Australia) versus the United States comes down to which emergency response plan was last implemented (SARS vs. swine flu) and the fitness of surveillance (case specific vs. general population) rather than deeper cultural, economic, or racial differences, as have been proposed in popular media.[14][15][16][17][18][19][20]
Objectives
In this study, public health surveillance data are processed using a machine learning approach to discover the relative agreement of each surveillance event series when predicting the other surveillance event series. Specifically, this study seeks to assess the agreement between event series and to contrast the value of traditional surveillance sources (death certificates and influenza and respiratory infection claims volumes) with nontraditional sources such as national Emergency Medical Services (EMS) call volume data in the COVID-19 era in the United States.
Methods
Statistic of Interest
Variable importance is the statistic of interest in this study. Variable importance means that when predicting the dependent variable, an independent variable which is of comparatively higher predictive value (association) than another is of higher (predictive) use value. When considering high variable importance with weekly event series data, series which help the machine learning models learn, predict, or guess the correct dependent weekly event series could be cooccurring or mutually observed events. The high variable importance scores from different sources suggest that series observe the same real-world event across surveillance efforts as they support prediction better than noise and other candidate series (other independent variables).
Of special interest are independent variables with high variable importance that come from a different data source than the dependent variable. High same-source variables are most likely high in value because they are distributed across study weeks similarly to their parent or sister series and in turn are not necessarily interesting. A series of events can be said to have “agreement value” if it has high statistical agreement with other series from a different source. Low statistical agreement suggests “out of era” events or events which are not driven by the same causes as other series considered here.
Toward noise and disagreement, influenza and respiratory infection claims volumes are considered below with COVID-19 claims volumes. Claims volumes are traditionally used in influenza surveillance. As a test of the efficacy of the models described here, COVID-19 volumes should be able to “outperform” influenza volumes as the COVID-19 era is largely understood to be influenza sparse. In this way respiratory and influenza events could be understood as a control arm as well as a model output of independent interest.
Data Sources
Medicare
Medicare provided three event series to this study. Medicare encounter-level claims through July 2021 were sourced through the Chronic Conditions Warehouse (CCW). Records from 2015 through July 2021 were considered. Claims that contained an influenza, COVID-19, or respiratory infection diagnostic code were included. Each series was generated as the count of distinct individuals within that series by calendar week. The Medicare-sourced series do not describe the duration of illness but rather the frequency of billing over time for distinct individuals. The three series are “Influenza Diagnostic (DX) Codes,” “COVID-19 DX Codes,” and “(Viral) Respiratory Infection DX Codes.” The viral respiratory series includes fever, bronchitis, viral lung infection, acute respiratory distress syndrome (ARDS), and pneumonia ICD10-CM codes. Procedure, HCPCS, and CPT-4 codes were not considered.
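The weekly counting rule described above can be sketched as follows. This is a minimal illustration, not the study's extraction code: the record layout (`beneficiary_id`, `service_date`) is hypothetical, and ISO calendar weeks are assumed because the text does not state which week convention was used.

```python
from collections import defaultdict
from datetime import date


def weekly_distinct_counts(claims):
    """Count distinct individuals per ISO calendar week.

    `claims` is an iterable of (beneficiary_id, service_date) pairs.
    Both field names are hypothetical stand-ins for the CCW claim layout.
    """
    seen = defaultdict(set)
    for beneficiary_id, service_date in claims:
        iso = service_date.isocalendar()
        # Key each week by (ISO year, ISO week number).
        seen[(iso[0], iso[1])].add(beneficiary_id)
    return {week: len(ids) for week, ids in sorted(seen.items())}


# Toy records, not study data.
claims = [
    ("A", date(2021, 1, 4)),   # ISO week 1 of 2021
    ("A", date(2021, 1, 6)),   # same person, same week: counted once
    ("B", date(2021, 1, 8)),   # second person in week 1
    ("A", date(2021, 1, 12)),  # ISO week 2 of 2021
]
weekly = weekly_distinct_counts(claims)
```

Because individuals are deduplicated within a week but not across weeks, a person billed in several consecutive weeks contributes to each of those weekly counts, which matches the statement that the series measure billing frequency rather than duration of illness.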
The Centers for Disease Control and Prevention
The Centers for Disease Control and Prevention (CDC) provided five series for this study. COVID Deaths: COVID deaths are described in a weekly data set which disambiguates the primary cause of death (COD) on Multiple Cause of Death Certificates (MCDC) received by the CDC within the given week. The data set further describes secondary causes of death when COVID-19 diagnostic codes are present. The COD All Cause, COD COVID Primary, and COD COVID Secondary series in this study were learned from this data set. COVID deaths data were retrieved from: “https://data.cdc.gov/NCHS/Provisional-COVID-19-Deaths-by-HHS-Region-Race-and/tpcp-uiv5.”
Excess Mortality: CDC evaluates “excess mortality,” or death certificates above expectation, where expectation means the three smallest death rates per state within a condition and calendar week.[21][22][23][24][25][26][27] These deaths are technically preventable because they are being prevented in real time in other states. The interpretation of excess mortality is a complex topic, and individuals who die in excess are not necessarily dying significantly before they would have died barring excess. Two study series are learned from this data set, Observed Deaths and Excess Deaths. Excess deaths are produced using Farrington flexible methods.[28][29] Excess mortality data were retrieved from “https://data.cdc.gov/NCHS/Excess-Deaths-Associated-with-COVID-19/xkkf-xrst” and “https://github.com/Mortality-Surv-and-Reporting-Proj/county-level-estimates-of-excess-mortality.”
The National Emergency Medical Services Information System
The National Emergency Medical Services Information System (NEMSIS) provided five event series to this study. NEMSIS is a complex data center which collects data from state-level supervising EMS authorities.[30][31] NEMSIS is designed to support EMS outcomes research and complex, evidence-based-medicine research.[32] NEMSIS has a stable data model of EMS episode values which are collected for every emergency (911) call which is routed to an EMS in the United States. A weekly extract was created using NEMSIS OLAP cubes for 2014 to 2016 and 2017 to present. The cardiac arrest (CA) subset, which codes calls for arrests before and after EMS arrived on the scene, was also extracted. The “NEMSIS Calls,” “NEMSIS Calls CA Yes,” “NEMSIS Calls CA No,” “NEMSIS CA Prior” (to arrival of the EMS crew), and “NEMSIS CA After” (arrival of the EMS crew) series were learned from NEMSIS. NEMSIS data were retrieved from: “https://nemsis.org/view-reports/public-reports/ems-data-cube/.”
Statistical Models
The 13 series were integrated into a single “cases per week” data model and processed using machine learning methods in h2o.ai (https://www.h2o.ai). Specifically, models were generated to learn the dependent-to-independent variable relationships across the series such that each series' weekly value was learned (predicted) from all other weekly event series values. Each series took a turn being the dependent variable in a Distributed Random Forest (DRF) model.[33] R-squared (r²) values for the models as well as scaled variable importance are described below in detail. Models were cross-validated five times each. Note that each series was itself modeled (predicted) from the other series, for a total of 14 models (13 event series and the study week itself). The statistic of interest is the variable importance of an independent variable when attempting to predict the dependent variable within a DRF model.
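The round-robin design described above, in which every column takes one turn as the dependent variable, can be sketched as below. This is an illustrative outline only: the helper name `round_robin_models` is hypothetical, the column labels are taken from the Table 2 header, and model fitting itself (DRF with cross-validation) is deliberately left out.

```python
def round_robin_models(columns):
    """Return one (dependent, independents) pair per column.

    Each column is the dependent variable exactly once, and all
    remaining columns act as its independent variables.
    """
    return [
        (target, [name for name in columns if name != target])
        for target in columns
    ]


# The 13 event series plus the study week, labeled as in Table 2.
columns = [
    "Week Ending Date",
    "NEMSIS Calls",
    "NEMSIS Calls CA Yes",
    "NEMSIS Calls CA No",
    "NEMSIS CA After EMS",
    "NEMSIS CA Prior EMS",
    "Influenza DX Codes",
    "Respiratory Codes",
    "COVID 19 DX Codes",
    "COD COVID Primary",
    "COD COVID Secondary",
    "COD All Cause",
    "Excess Deaths",
    "Observed Deaths",
]
specs = round_robin_models(columns)
```

Each element of `specs` would then be handed to a random forest learner; the result is 14 models with 13 independent variables each, which is why every column of the variable importance matrix has a single NA on the diagonal.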
Models considered any volume between January 1, 2018 and July 1, 2021. Raw case count values were used; neither log/lag modeling nor relative rates were considered. Note DRF transforms numeric values to a continuous distribution in preprocessing (before processing). The fitness of “week” of event most likely obscures or confounds episode attribution of count data model events, as a case could be transported by EMS, bill Medicare, and populate a CDC death certificate within a calendar week or over several months in the case of advanced life support. The models should not be used to model the epidemic but rather to assess the agreement within the implicit (pseudo-harmonized) time scales of the series.
Results
[Table 1] describes each event series: its data source, the specific data set name, the series extracted for this study, the time range, and the total events within the series of interest. Note that NEMSIS CA status is a declaration aggregate, and a call where CA did not occur is a call with an explicit declaration. In turn, the total calls do not equal the sum of CA and non-CA calls.
Table 1
Series ranges and data sources
Source | Data set | Series | Start | Stop | Case weeks
Medicare | Patient Level Claims | Influenza Events | 01-01-2015 | 6/31/2021 | 3,837,068
Medicare | Patient Level Claims | COVID Events | 01-01-2015 | 6/31/2021 | 17,849,177
Medicare | Patient Level Claims | Respiratory Infection Events | 01-01-2015 | 6/31/2021 | 140,777,208
CDC | Excess Deaths Associated with COVID-19 | Total Weekly Deaths | 01-01-2017 | 04-12-2021 | 15,066,215
CDC | Excess Deaths Associated with COVID-19 | Weekly Excess Deaths | 01-01-2017 | 04-12-2021 | 951,680
CDC | Provisional COVID-19 Deaths by HHS Region, Race, and Age | Weekly MCDC | 01-01-2015 | 11-12-2021 | 19,569,921
CDC | Provisional COVID-19 Deaths by HHS Region, Race, and Age | Weekly COVID Primary MCDC | 01-01-2015 | 11-12-2021 | 590,090
CDC | Provisional COVID-19 Deaths by HHS Region, Race, and Age | Weekly COVID Secondary MCDC | 01-01-2015 | 11-12-2021 | 652,472
NEMSIS | OLAP Cube | EMS Calls | 01-01-2014 | 10-12-2021 | 237,908,326
NEMSIS | OLAP Cube | EMS Cardiac Arrest Calls | 01-01-2014 | 10-12-2021 | 2,178,494
NEMSIS | OLAP Cube | EMS Non-Cardiac Arrest Calls | 01-01-2014 | 10-12-2021 | 163,624,383
NEMSIS | OLAP Cube | All Cardiac Arrest Pre-EMS Arrival | 01-01-2014 | 10-12-2021 | 1,910,767
NEMSIS | OLAP Cube | All Cardiac Arrest Post-EMS Arrival | 01-01-2014 | 10-12-2021 | 267,727
Abbreviations: CDC, The Centers for Disease Control and Prevention; COVID-19, coronavirus disease 2019; EMS, Emergency Medical Services; MCDC, Multiple Cause of Death Certificates; NEMSIS, The National Emergency Medical Services Information System.
[Fig. 1] shows the weekly volume of events within the series described as totals in [Table 1]. The upper right describes Medicare weekly case events, and the bottom right describes excess mortality series. The upper left describes NEMSIS series, and the bottom left describes COVID-19 death certificates. [Fig. 1] demonstrates a collapse in influenza Medicare claims and spikes in COVID-19 and viral respiratory infection codes toward the end (right) of the series. COVID excess deaths and MCDC indicate similar peaks on the right side of the x-axis as well. All NEMSIS call volumes are elevated as time progresses.
Fig. 1 The weekly event volume by event type. The upper right line graphs describe the per member per weekly occurrence of qualifying diagnostic codes on identifiable Medicare claims. COVID-19 (red), influenza (green), and respiratory infection codes (blue) are featured. The bottom right figures show the Excess Deaths (Red) and Observed Deaths (Blue) from which excess deaths are learned in the CDC excess mortality model. The upper left region describes the NEMSIS series with cardiac arrest after EMS arrival (Red), cardiac arrest prior (Brown), total calls (Green), calls without cardiac arrest (Blue) and calls with arrests (Purple). The lower left shows the all-cause mortality multiple cause of death certificate volumes (Red) and volumes where the primary (Green) and secondary causes of death (Green) were COVID-19. The x-axis is the study week, and the y-axis is the volume for all figures.
[Table 2] presents a matrix of dependent and independent variable series relationships, where the scaled variable importance is presented. Each column is a DRF model where the column header is the dependent variable. The independent variables are listed along the left-hand side of the table. In scaled variable importance measures, “1” is the highest value an independent variable can receive, and only one “1” can be awarded within a model. For example, dependent “Influenza DX Codes” weekly values from Medicare were most strongly learned from “Respiratory Codes” (1) from Medicare followed by “All Cause COD” (0.7191) from MCDC, “Observed Deaths” from Excess Deaths (0.6552), and “COVID-19 DX Codes” from Medicare (0.4475). Alternately, “COVID 19 DX Codes” from Medicare shows “Week Ending Date” (1), followed by “COVID Primary COD” (0.4015) and “COVID Secondary COD” from MCDC (0.3455), “Excess Deaths” (0.2451), and strikingly “NEMSIS CA Prior EMS” (0.2445). Note that when predicting “COVID 19 DX Codes,” “Respiratory Codes” are of little help (0.0636), but when predicting “Respiratory Codes,” “COVID 19 DX Codes” are fairly helpful (0.8722). The r² of each model is listed above its dependent variable.
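A short sketch of the scaling step may help in reading the matrix. It assumes that scaled importance is each variable's raw importance divided by the largest raw importance in the same model, which is consistent with the statement that the top predictor receives exactly “1”; the function name and the raw numbers below are hypothetical and are not values from this study.

```python
def scale_importance(raw):
    """Divide each raw importance by the model's largest raw importance.

    The strongest independent variable therefore scores exactly 1.0 and
    every other variable scores a fraction of it.
    """
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one raw importance must be positive")
    return {name: value / top for name, value in raw.items()}


# Hypothetical raw importances for a single model; not study values.
raw = {
    "Respiratory Codes": 80.0,
    "COD All Cause": 60.0,
    "Observed Deaths": 20.0,
}
scaled = scale_importance(raw)
```

Because the scaling is done per model, values are comparable within a column of the matrix but not directly across columns: a 0.5 in one model and a 0.5 in another each mean “half as important as that model's best predictor.”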
Table 2
Variable importance matrix and original values with dependent variables (column wise)
r² | 0.5532 | 0.9951 | 0.9968 | 0.9963 | 0.9839 | 0.9961 | 0.8897 | 0.7858 | 0.9048 | 0.9844 | 0.9823 | 0.8768 | 0.9537 | 0.9758
Independent variable | Week Ending Date | NEMSIS Calls | NEMSIS Calls CA Yes | NEMSIS Calls CA No | NEMSIS CA After EMS | NEMSIS CA Prior EMS | Influenza DX Codes | Respiratory Codes | COVID 19 DX Codes | COD COVID Primary | COD COVID Secondary | COD All Cause | Excess Deaths | Observed Deaths
Week Ending Date | NA | 0.1163 | 0.0904 | 0.0969 | 0.1025 | 0.1128 | 0.1705 | 0.7063 | 1 | 0.066 | 0.0703 | 1 | 0.1654 | 1
NEMSIS Calls | 0.0115 | NA | 0.3877 | 0.8451 | 0.3234 | 0.3354 | 0.1749 | 0.0374 | 0.1052 | 0.0083 | 0.0149 | 0.0037 | 0.0097 | 0.045
NEMSIS Calls CA Yes | 0.0141 | 0.908 | NA | 0.2401 | 0.6746 | 0.6455 | 0.099 | 0.0239 | 0.0517 | 0.1791 | 0.1865 | 0.0373 | 0.1637 | 0.0037
NEMSIS Calls CA No | 0.0147 | 0.5712 | 0.1672 | NA | 0.3221 | 0.3379 | 0.2451 | 0.048 | 0.138 | 0.0249 | 0.0229 | 0.0054 | 0.0108 | 0.0143
NEMSIS CA After EMS | 0.0039 | 0.1973 | 0.3163 | 0.1875 | NA | 1 | 0.1404 | 0.0282 | 0.0356 | 0.5651 | 0.585 | 0.0173 | 0.7209 | 0.0016
NEMSIS CA Prior EMS | 0.0037 | 1 | 1 | 1 | 1 | NA | 0.1109 | 0.0482 | 0.2445 | 0.1867 | 0.1851 | 0.0236 | 0.3016 | 0.0041
Influenza DX Codes | 0.1255 | 0.0056 | 0.0009 | 0.006 | 0.0035 | 0.0013 | NA | 1 | 0.0408 | 0.0095 | 0.0088 | 0.0789 | 0.0215 | 0.0025
Respiratory Codes | 0.0834 | 0.0061 | 0.0009 | 0.006 | 0.0041 | 0.0009 | 1 | NA | 0.0636 | 0.0066 | 0.0108 | 0.1659 | 0.0575 | 0.003
COVID 19 DX Codes | 0.0618 | 0.0512 | 0.0503 | 0.0444 | 0.0599 | 0.0599 | 0.4475 | 0.8722 | NA | 0.0461 | 0.0431 | 0.0702 | 0.0396 | 0.1613
COD COVID Primary | 1 | 0.0676 | 0.0647 | 0.0615 | 0.0822 | 0.0774 | 0.2703 | 0.1078 | 0.4015 | NA | 1 | 0.4791 | 0.7849 | 0.0373
COD COVID Secondary | 0.4674 | 0.0029 | 0.002 | 0.0031 | 0.0069 | 0.0063 | 0.3611 | 0.0567 | 0.3455 | 1 | NA | 0.6306 | 1 | 0.0704
COD All Cause | 0.8665 | 0.0056 | 0.0049 | 0.0054 | 0.0088 | 0.0078 | 0.7191 | 0.2043 | 0.0726 | 0.8997 | 0.9163 | NA | 0.1388 | 0.1097
Excess Deaths | 0.0195 | 0.0293 | 0.029 | 0.0286 | 0.0409 | 0.035 | 0.4472 | 0.3178 | 0.2451 | 0.8894 | 0.8964 | 0.0759 | NA | 0.4037
Observed Deaths | 0.0073 | 0.0105 | 0.0078 | 0.0057 | 0.0162 | 0.0126 | 0.6552 | 0.0834 | 0.0516 | 0.666 | 0.6915 | 0.0871 | 0.9865 | NA
Abbreviations: COD, cause of death; COVID-19, coronavirus disease 2019; EMS, Emergency Medical Services; NEMSIS, The National Emergency Medical Services Information System.
[Table 3] replots [Table 2] values as above or below the model run's geometric mean variable importance score (column-wise geometric mean). The regions within the black outlines should be understood as variables from the same series source. While the models did know weekly features from the same data source, their importance toward the study objective is minimal. For example, the only “same source series” variable importance below average was in the Medicare “COVID 19 DX” model, where the influenza and viral respiratory variables were of low importance (as expected). This should mean that the model did not learn what the weekly “COVID 19 DX Codes” volume was from viral infection and influenza codes; their series are independent in this study. Above-average variable importance within column models from different sources should detail the interrelatedness of the multiseries weekly events. For example, the “NEMSIS CA After EMS” model shows importance above the geometric mean for the “Week Ending Date,” “COVID 19 DX Codes,” and “COD COVID Primary” series. The “Total Above” ranged from 5 to 8, indicating similar importance distributions.
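The column-wise thresholding can be illustrated with one model. The sketch below assumes the rule is simply “compare each scaled importance with the geometric mean of the non-NA values in the same column”; the function name is hypothetical, and the input is the “Respiratory Codes” column of Table 2. Under that assumption it yields the same five ABOVE entries that Table 3 reports for this model.

```python
import math


def flag_against_geometric_mean(column):
    """Label each importance ABOVE or BELOW the column's geometric mean.

    `column` maps independent-variable name to scaled importance. The
    dependent variable itself (NA in Table 2) is left out, and all
    values are assumed to be strictly positive.
    """
    logs = [math.log(value) for value in column.values()]
    gmean = math.exp(sum(logs) / len(logs))
    flags = {
        name: ("ABOVE" if value > gmean else "BELOW")
        for name, value in column.items()
    }
    return gmean, flags


# Scaled importances for the "Respiratory Codes" model (Table 2 column).
respiratory_model = {
    "Week Ending Date": 0.7063,
    "NEMSIS Calls": 0.0374,
    "NEMSIS Calls CA Yes": 0.0239,
    "NEMSIS Calls CA No": 0.048,
    "NEMSIS CA After EMS": 0.0282,
    "NEMSIS CA Prior EMS": 0.0482,
    "Influenza DX Codes": 1.0,
    "COVID 19 DX Codes": 0.8722,
    "COD COVID Primary": 0.1078,
    "COD COVID Secondary": 0.0567,
    "COD All Cause": 0.2043,
    "Excess Deaths": 0.3178,
    "Observed Deaths": 0.0834,
}
gmean, flags = flag_against_geometric_mean(respiratory_model)
total_above = sum(1 for flag in flags.values() if flag == "ABOVE")
```

A geometric rather than arithmetic mean is a natural choice here because scaled importances span several orders of magnitude within a model, so a single value of 1 would otherwise pull the threshold above almost every other variable.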
Table 3
Variable importance matrix by dependent value column wise with independent variables above and below the geometric model mean (column wise)
Independent variable | Week Ending Date | NEMSIS Calls | NEMSIS Calls CA Yes | NEMSIS Calls CA No | NEMSIS CA After EMS | NEMSIS CA Prior EMS | Influenza DX Codes | Respiratory Codes | COVID 19 DX Codes | COD COVID Primary | COD COVID Secondary | COD All Cause | Excess Deaths | Observed Deaths
Week Ending Date | NA | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | BELOW | BELOW | ABOVE | ABOVE | ABOVE
NEMSIS Calls | BELOW | NA | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE
NEMSIS Calls CA Yes | BELOW | ABOVE | NA | ABOVE | ABOVE | ABOVE | BELOW | BELOW | BELOW | ABOVE | ABOVE | BELOW | ABOVE | BELOW
NEMSIS Calls CA No | BELOW | ABOVE | ABOVE | NA | ABOVE | ABOVE | BELOW | BELOW | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW
NEMSIS CA After EMS | BELOW | ABOVE | ABOVE | ABOVE | NA | ABOVE | BELOW | BELOW | BELOW | ABOVE | ABOVE | BELOW | ABOVE | BELOW
NEMSIS CA Prior EMS | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | NA | BELOW | BELOW | ABOVE | ABOVE | ABOVE | BELOW | ABOVE | BELOW
Influenza DX Codes | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | NA | ABOVE | BELOW | BELOW | BELOW | ABOVE | BELOW | BELOW
Respiratory Codes | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | NA | BELOW | BELOW | BELOW | ABOVE | BELOW | BELOW
COVID 19 DX Codes | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | NA | BELOW | BELOW | ABOVE | BELOW | ABOVE
COD COVID Primary | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | BELOW | ABOVE | NA | ABOVE | ABOVE | ABOVE | ABOVE
COD COVID Secondary | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | BELOW | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE
COD All Cause | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | NA | ABOVE | ABOVE
Excess Deaths | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | NA | ABOVE
Observed Deaths | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | BELOW | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | NA
Total Above | 6 | 7 | 7 | 7 | 7 | 7 | 6 | 5 | 6 | 7 | 7 | 8 | 8 | 7
Abbreviations: COD, cause of death; COVID-19, coronavirus disease 2019; EMS, Emergency Medical Services; NEMSIS, The National Emergency Medical Services Information System.
In [Table 4], the geometric mean has been computed for each row, and if the raw value exceeds the geometric mean, the raw value is marked “above” as in [Table 3]. [Table 4] can be used to assess above-average variable importance across models. High variable importance across models indicates that multiple series relied on the independent variable to learn the dependent weekly value. For example, in [Table 4], the “COD All Cause” independent variable was above the average variable importance in the different-source models “Week Ending Date,” “Influenza DX Codes,” “Respiratory Codes,” “Excess Deaths,” and “Observed Deaths” (the last two from the excess deaths source). Total Above ranged from 2 to 10, suggesting that some series had acute agreement (small number) and some had generalized agreement. The Medicare-sourced series have low Total Above, indicating their value is concentrated in models such as “COD All Cause” and “Observed Deaths.” Note that “NEMSIS CA Prior EMS” is tied with “Week Ending Date” for first place (10).
Table 4
Scaled variable importance above the geometric mean row wise (independent variable) across models (column wise)
Independent variable | Week Ending Date | NEMSIS Calls | NEMSIS Calls CA Yes | NEMSIS Calls CA No | NEMSIS CA After EMS | NEMSIS CA Prior EMS | Influenza DX Codes | Respiratory Codes | COVID-19 DX Codes | COD COVID Primary | COD COVID Secondary | COD All Cause | Excess Deaths | Observed Deaths | Total Above
Week Ending Date | NA | ABOVE | BELOW | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | 10
NEMSIS Calls | BELOW | NA | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | 7
NEMSIS Calls CA Yes | BELOW | ABOVE | NA | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | BELOW | ABOVE | BELOW | ABOVE | BELOW | 8
NEMSIS Calls CA No | BELOW | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | 7
NEMSIS CA After EMS | BELOW | ABOVE | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | BELOW | ABOVE | BELOW | 9
NEMSIS CA Prior EMS | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | BELOW | ABOVE | BELOW | 10
Influenza DX Codes | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | NA | ABOVE | BELOW | BELOW | BELOW | ABOVE | BELOW | BELOW | 2
Respiratory Codes | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | NA | BELOW | BELOW | BELOW | ABOVE | BELOW | BELOW | 2
COVID-19 DX Codes | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | NA | BELOW | BELOW | BELOW | BELOW | ABOVE | 3
COD COVID Primary | ABOVE | ABOVE | BELOW | ABOVE | BELOW | BELOW | ABOVE | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE | BELOW | 9
COD COVID Secondary | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | NA | ABOVE | ABOVE | ABOVE | 8
COD All Cause | ABOVE | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | NA | ABOVE | ABOVE | 7
Excess Deaths | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | ABOVE | NA | ABOVE | 7
Observed Deaths | BELOW | BELOW | BELOW | BELOW | BELOW | BELOW | ABOVE | ABOVE | BELOW | ABOVE | ABOVE | ABOVE | ABOVE | NA | 6
Abbreviations: COD, cause of death; COVID-19, coronavirus disease 2019; EMS, Emergency Medical Services; NEMSIS, The National Emergency Medical Services Information System.
Discussion
Toward prior work, syndromic surveillance and the uses of prehospital data in understanding hospital utilization, (influenza) vaccination uptake, and community health are well described.[34][35][36] However, the potential for prehospital CA to be considered as a syndromic effect is perhaps limited to influenza and local area use cases in the United States.[37] The same cannot be said for Europe.[38][39] There is evidence that COVID-19 is associated with sudden cardiac death, some of which should be prehospital and pre-EMS arrival.[40] As influenza has inspired developments in syndromic surveillance, perhaps COVID-19 will do the same.[38]
Toward study findings, appreciating the severity of COVID-19 in the United States has been met with difficulty.[41][42][43] Preexisting surveillance methods have proven inadequate, and CDC has proposed a modernization effort to produce novel surveillance efforts within the epidemic response.[44] Ancillary events, such as EMS calls and Medicare bills, could support surveillance tasks like early detection of an outbreak, severity models, and prevention efforts. This paper demonstrates that Medicare and NEMSIS data have value when predicting traditional measures of epidemic modeling like COD and Excess Mortality.
Within Medicare sourced series, EMS call volumes were below average variable importance for Influenza and Respiratory Viral claims volumes but were above average for COVID-19 volumes when calls without CA and calls where CA occurred prior to EMS arrival are considered. NEMSIS series benefited from knowing the call volumes which were CA prior to EMS arrival, consistently ranked within NEMSIS series as 1 or the most important. COVID-19 as primary COD on a multiple COD certificate and the volume of Medicare COVID-19 claims was also above average in importance when predicting NEMSIS call volumes. This suggests that COVID-19 is driving EMS call volumes.
Within the CDC MCDC series, both primary and secondary COD models found above-average predictive value from NEMSIS call volumes which involved a CA, suggesting that patients who arrest during an EMS call may not survive the experience. There is also predictive value in the CDC excess mortality model values, but this is to be expected as the excess mortality model was designed to evaluate excess mortality from COVID-19. Within the CDC Excess Mortality series, NEMSIS call volumes for CA as well as COVID-19 being present on a multiple COD certificate were of high value when predicting the weekly Farrington Flexible mortality excess estimates.
Variable importance detailed in [Table 2] and [Table 3] demonstrates meaningful model segmentation between series and series events. Influenza and viral respiratory codes are particularly interesting as a “control” case in this COVID-19 era data set. Both influenza and viral respiratory series show interrelatedness in their variable importance and difference or segmentation from COVID-19. “CA prior to EMS” arrival was also of note because “CA prior to EMS” arrival most likely results in a decedent without a COVID-19 diagnosis, a decedent who may be ineligible for a primary COD “COVID-19” declaration. [Table 3] reinforces the point, with the “COD COVID Primary” model showing “NEMSIS Calls CA Yes,” “NEMSIS CA After EMS,” “NEMSIS CA Prior EMS,” “Observed Deaths,” and “Excess Deaths” above the geometric mean of variable importance within that model. That DRF knows neither what a cardiac arrest is nor what Farrington Flexible is, yet is still able to associate the weekly distributions with COVID-19 primary COD on MCDCs from only the weekly counts, highlights the strength of this approach.
[Table 4] demonstrates high general utility for most independent variables in the model series. It also suggests that the Medicare series were not as strongly utilized in decision-making, with Total Above values of 2 to 3. This could be due to the real-world sampling distribution of Medicare enrollment relative to the total morbidity burden in the United States. How much of the COVID-19 burden should be among Medicare beneficiaries remains unknown. All other series are national, while Medicare is enrollee specific and may not offer as much instruction to prediction. However, despite the difference in real-world lag (between claims being processed and a death certificate being populated, or a 911 call being placed), the models produced r² > 0.9 in most cases. Note that “NEMSIS CA Prior EMS” had as many “above” the geometric mean in [Table 4] as the week itself. This means it is tied for the best predictor across models. The implications of these prior arrests are profound, and they may be a sink of underrecognized COVID-19 mortality.
The length of the series, and the “isotonic” nature of the data may explain the difficulty of predicting the week of series, as the opportunity for weekly patterns to repeat most likely confused week assignments. As COVID and influenza had multiple “waves” over the observation period, a bad week guess could be a repeat start, peak, or end event. A bad week guess could also be a time point with little data being confused for another low-volume time point. The NEMSIS anomaly in 2017 (low volumes) is not well understood but is most likely due to NEMSIS transitioning OLAP series in 2017 or perhaps there was a national decrease in EMS call volumes in 2017. Most likely the models are not impacted as the models consider records from 2018 onward.
The analysis would be more robust if series completeness could be achieved, especially in early model years. [Table 1] shows that several data series are available in earlier years than others. Medicare data particularly suffer from changes in diagnostic code recall in ICD9-CM versus ICD10-CM years (only ICD10-CM years were considered here). The “stability” of a series is of high importance when evaluating future surveillance value. The model did not weigh variables by series source and did not “know” that variables were from the same data sources. Weighting series completeness may improve model results; however, r² was high across models. The Medicare series contains diagnostic and pathology codes for influenza and COVID-19. There may be noncase incidence drivers of testing, vaccination, and pathology including nosocomial infections, the “worried well,” as well as public health interventions (mass testing and roster vaccinations). Disambiguating the Medicare indexes could increase their utility even further. The viral respiratory code list includes minor codes like fever as well as ARDS and pneumonia. Their disambiguation by severity may improve model utility as well.
Conclusion
Prehospital data (EMS) are of high value in COVID-19 surveillance and should be considered as a potential data source when attempting to learn COVID-19 severity within jurisdictions. Medicare data fared worse, though individuals providing care to the Medicare population should consider the disambiguation of patients with COVID-19 from individuals seeking COVID-19 prevention services (testing and vaccination).
Human Subjects Protections
While this study contains identifiable information describing live human subjects, no National Institutes of Health Institutional Review Board (NIH IRB) review was required. Note, however, that Centers for Medicare and Medicaid Services (CMS) data access and use are approved through the CMS IRB. Data were further “cleared” for public release by CCW, and CCW evaluated our compliance with CMS nonreidentification standards for data describing beneficiary populations.