Background and Significance
The U.S. federal government's goal is to have 90% of its health care payments based
on care quality by 2018.[1 ] In addition, private payers have increasingly incorporated quality outcomes in their
contracts.[2 ] The transition from fee-for-service to value-based-payment relies on accurate and
reliable methods to measure the quality of care delivered. Many programs have advanced
this capability, all of which require objective data and measure definitions.
The longest established program for quality measurement in the United States is the
Healthcare Effectiveness Data and Information Set (HEDIS) program managed by the National
Committee for Quality Assurance (NCQA). This program began in 1991 and is currently
used by over 90% of health plans.[3 ] HEDIS has historically used longitudinal information, primarily electronic billing
data from multiple providers, to calculate care quality. This program has shown progress
in improving quality outcomes.[4][5] However, it is challenging to use measures calculated from payer administrative data for ambulatory care improvement due to reporting latency, insufficient clinical specificity, payer patient market share, and inadequate risk adjustment.[6]
A more recent national initiative directly focused on ambulatory care improvement
is the Physician Quality Reporting System. Started in 2006, this program provided
a voluntary reporting bonus. It reached over 600,000 physicians participating in Medicare
but relied on methods developed before widespread electronic health record (EHR) adoption.[7 ] To accelerate EHR adoption with a goal of improving care quality, the Meaningful
Use incentive program was launched in 2010 by the Centers for Medicare and Medicaid
Services (CMS). Only 11% of physicians had a basic EHR at that time.[8 ] The Meaningful Use program brought widespread EHR adoption with over 78% of ambulatory
clinicians using certified EHRs by the end of 2015.[9 ] Part of the Meaningful Use program required the calculation and reporting of at
least six quality measures. Incentives were paid for reporting but were not tied to
performance. Quality calculations for reporting in this program used information available
in EHRs; challenges have been noted in this approach.[10][11][12] Unlike HEDIS, EHRs often calculate measure compliance using only data documented
within that EHR, in part due to lack of health information exchange and interoperability
challenges.[13][14]
The Merit-Based Incentive Payment System, enacted as part of the Medicare Access and
CHIP Reauthorization Act (MACRA), succeeded Meaningful Use for ambulatory clinical
quality reporting. Beginning in 2017, based on quality performance, high performing
clinicians are paid more than lower performing ones.[15 ] This program also introduces an alternative method of quality reporting, qualified
clinical data registries (QCDRs). QCDRs are third-party organizations that accumulate
clinical data from various providers for quality measurement. Since QCDRs can collect
data on the same patient from different organizations, including those using different
EHRs, they can provide a longitudinal approach to performance measurement like HEDIS.
This requires the use of interoperability standards to aggregate the data from different
EHRs.
The primary standards that support clinical data exchange today from EHRs are Health
Level 7 (HL7) messaging and the Consolidated Clinical Document Architecture (C-CDA).
Previous research has demonstrated that clinical documents, such as the C-CDA, provide
many of the necessary data elements for quality measure calculation.[16][17] Research is lacking, however, on the implementation of quality measurement by QCDRs,
particularly those integrated with health information exchanges. In addition, studies
have called into question the validity and reliability of quality measures calculated
by EHR reporting systems. This is due to challenges in data completeness, accuracy,
appropriate codification, gaps between structured fields and available free-text,
as well as inconsistency of measure logic implementation.[18][19][20] Examination of clinical data from multiple EHRs provides an opportunity to explore
how data transformation may improve quality measure calculation while recognizing
these concerns. Furthermore, quality measure definitions for HEDIS and other reporting
programs are specified using the Health Quality Measure Format and Quality Data Model
(QDM). These specifications expect Quality Reporting Document Architecture (QRDA)
documents as the clinical data format while this research explores the applicability
of C-CDA documents to quality measurement.
Methods
We sampled data from the Kansas Health Information Network (KHIN) from 11 ambulatory care sites during the 1-year period from
July 1, 2016 to June 30, 2017. Sites were selected based on size (> 300 visits per
month), continuous submission of clinical documents to KHIN, and independence from
an acute care institution since all the quality measures in this study relate to ambulatory
care. Selected facilities were not contacted in advance, so the data sample represents
a sample of data regularly used in health information exchange. Patient data use in
this research was approved by the UTHealth Committee for the Protection of Human Subjects.
One hundred unique patients were randomly selected from each facility; the same patient
was never selected from more than one facility. Data from a single clinical document
during the time frame were used for quality measurement. Documents included a wide
range of clinical data, including patient diagnoses, immunizations, medications, laboratory
results, problems, procedures, and vital signs. These clinical domains are required
by Meaningful Use as part of Continuity of Care Documents. Multiple EHRs were represented,
including Allscripts (Chicago, Illinois, United States), Computer Programs and Systems,
Inc. (Mobile, Alabama, United States), eClinicalWorks (Westborough, Massachusetts,
United States), General Electric (Chicago, Illinois, United States), and Greenway
Medical (Carrollton, Georgia, United States). The data were processed by Diameter
Health's (Farmington, Connecticut, United States) Fusion and Quality modules (version
3.5.0), technology certified by NCQA for electronic clinical quality measurement.[21 ] This software includes both transformation logic associated with clinical data and
measure logic necessary to calculate and report quality performance. An example of
how quality measure compliance may be calculated in the software application for a
fictional patient not derived from any real patient information is shown in [Fig. 1 ].
Fig. 1 Quality measure presentation in software application. Quality calculation shown for
a fictional patient for calculated measures, with clinical detail shown for a specific
measure. Note 1: Tabs along the top show three eligible measures with compliance and
three eligible measures with noncompliance. Note 2: The button labeled “Smoking Gun”
provides specific clinical detail that substantiates measure eligibility and compliance
calculation. Note 3: The clinical detail of the eligible encounter, diagnosis and
laboratory result that supports compliance for the selected measure (cms122v5 Diabetic
HbA1c < 9%). Copyright and reprinted with permission of Diameter Health, Inc.
Twenty-four measures were available in the certified software, although seven were excluded
from this study. Five measures were excluded since they require data on multiple care
encounters, which may not be accurately represented in a randomly selected clinical
document (e.g., multivisit initiation and maintenance for drug dependence therapy).
One was excluded due to the lack of behavioral assessment data in the sample and one
was excluded since it had been discontinued for use by CMS. The 17 examined measures
constituted a broad range of process and outcomes measures across diseases and preventative
care as shown in [Table 1 ]. Each measure's logic was specified according to the QDM and was eligible for use
in CMS quality reporting programs.[22 ]
Table 1
Quality measures selected in this research

CMS identifier | Measure description | Measure type (reason) | Measure steward
74v6 | Primary caries prevention | Process (preventative) | CMS
82v4 | Maternal depression screening | Process (preventative) | NCQA
122v5 | Diabetes: Poor HbA1c control | Outcome (disease control) | NCQA
123v5 | Diabetes: Annual foot exam | Process (preventative) | NCQA
124v5 | Cervical cancer screening | Process (preventative) | NCQA
125v5 | Breast cancer screening | Process (preventative) | NCQA
127v5 | Pneumonia vaccination of older adults | Process (preventative) | NCQA
130v5 | Colorectal cancer screening | Process (preventative) | NCQA
131v5 | Diabetes: Annual eye exam | Process (preventative) | NCQA
134v5 | Diabetes: Attention for nephropathy | Outcome (disease control) | NCQA
146v5 | Appropriate testing for children with pharyngitis | Process (utilization) | NCQA
153v5 | Chlamydia screening for women | Process (preventative) | NCQA
154v5 | Appropriate treatment for children with upper respiratory infection | Outcome (utilization) | NCQA
155v5 | Pediatric weight assessment | Process (preventative) | NCQA
156v5 | High risk medication use in elderly | Outcome (patient safety) | NCQA
165v5 | Controlling high blood pressure | Outcome (disease control) | NCQA
166v6 | Use of imaging studies for back pain | Outcome (utilization) | NCQA

Abbreviations: CMS, Centers for Medicare and Medicaid Services; NCQA, National Committee for Quality Assurance.
The quality measures were first calculated using clinical data without any transformation
logic. Since clinical documents generally available to KHIN were used in this study,
the software aligned clinical data from these extracts to quality measure criteria
as specified in the QDM. The measures were then recalculated using an iterative approach
where techniques were added to improve adherence to national standards, such as terminology
and free-text mappings. This included techniques to deal with data heterogeneity in clinical documents, as detailed in prior research.[14] Quantitative metrics on clinical encounters, problems, medications, laboratory results,
and vital signs were analyzed for the 1,100 patients and illustrative issues affecting
quality measurement were recorded. Changes in measure calculation were then extensively
tested against test cases made available by NCQA to determine if certification was
affected by the iterative improvement. Population counts of both denominators and
numerators were captured both before and after the iterative improvement process.
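Conceptually, this before-and-after tally can be sketched as follows. The function and field names below are illustrative assumptions for exposition only, not the certified vendor implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class MeasureResult:
    denominator: int
    numerator: int

def calculate_measure(patients: Iterable[dict],
                      in_denominator: Callable[[dict], bool],
                      in_numerator: Callable[[dict], bool]) -> MeasureResult:
    """Count eligible patients (denominator) and compliant patients (numerator)."""
    eligible = [p for p in patients if in_denominator(p)]
    compliant = [p for p in eligible if in_numerator(p)]
    return MeasureResult(len(eligible), len(compliant))

def compare_before_after(raw_patients: list[dict], normalized_patients: list[dict],
                         measures: dict[str, tuple[Callable, Callable]]) -> None:
    """Tabulate denominator and compliance for each measure before and after
    normalization, mirroring the before/after columns reported in Table 2."""
    for measure_id, (in_denom, in_numer) in measures.items():
        before = calculate_measure(raw_patients, in_denom, in_numer)
        after = calculate_measure(normalized_patients, in_denom, in_numer)
        print(measure_id, "before:", before, "after:", after)
```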
Results
Depth of Clinical Data
All 1,100 selected clinical documents were loaded into the quality measurement software
without error. Of the facilities selected, 4 (36%) submitted Healthcare Information
Technology Standards Panel C-32 Continuity of Care Documents and 7 (64%) submitted
HL7 C-CDA 1.1 Continuity of Care Documents. Patient age ranged from 0 to 99 at the
beginning of the measure period. A total of 589 (53.5%) patients were female and 510 (46.4%) were male; one patient did not have gender recorded as male or female.
Content extracted from the clinical documents included 12,308 clinical encounters,
3,678 immunizations, 20,723 medications, 25,921 problems, 17,959 procedures, 45,704
diagnostic results, and 32,944 vital sign observations. All 11 sites produced clinical
documents with information in the domains of patient medications, problems, procedures,
results, and vital signs. The majority of clinical encounters represented annual wellness
visits and typical evaluation and management distributions for ambulatory encounters.
For nine of the sites, data included information from prior clinical visits dating back months to years. Historical data are important for several of the quality
measures that examine prior clinical information (e.g., past colonoscopies for colon
cancer screening). For two of the sites, the clinical documents were more limited,
sending data primarily related to the most recent clinical encounter.
Nonnormalized Measure Calculation and Focus Areas for Improvement
Using the clinical data without any transformation, quality measures were calculated
using certified technology for a 12-month period from July 2016 to June 2017. Results
for individual patients were collected using the standard reporting formats of the software and are presented in [Table 2] (the “Calculation before iterative improvement” columns).
Table 2
Quality measure calculation before and after iterative improvement
(columns 3–4: calculation before iterative improvement; columns 5–6: calculation after iterative improvement)

CMS identifier | Measure description | Denominator | Compliance | Denominator (% change) | Compliance (absolute change)
74v6 | Primary caries prevention | 107 | 4.7% | 164 (+53%) | 3.0% (–1.7%)
122v5 | Diabetes: Poor HbA1c control | 20 | 45.0% | 78 (+290%) | 37.2% (–7.8%)
123v5 | Diabetes: Annual foot exam | 20 | 0.0% | 78 (+290%) | 0.0% (NA)
124v5 | Cervical cancer screening | 88 | 0.0% | 182 (+107%) | 7.1% (+7.1%)
125v5 | Breast cancer screening | 64 | 0.0% | 120 (+88%) | 9.2% (+9.2%)
127v5 | Pneumonia vaccination of older adults | 113 | 55.8% | 204 (+81%) | 55.9% (+0.1%)
130v5 | Colorectal cancer screening | 117 | 1.7% | 237 (+103%) | 14.3% (+12.6%)
131v5 | Diabetes: Annual eye exam | 20 | 0.0% | 78 (+290%) | 0.0% (NA)
134v5 | Diabetes: Attention for nephropathy | 20 | 35.0% | 78 (+290%) | 69.2% (+34.2%)
146v5 | Appropriate testing for children with pharyngitis | 0 | NA | 50 (NA) | 9.1% (NA)
153v5 | Chlamydia screening for women | 0 | NA | 5 (NA) | 20.0% (NA)
155v5 Rate 1 | Pediatric weight assessment: BMI percentile | 81 | 0.0% | 123 (+52%) | 22.0% (+22%)
155v5 Rate 2 | Pediatric weight assessment: Nutrition counseling | | 0.0% | | 0.0% (NA)
155v5 Rate 3 | Pediatric weight assessment: Activity counseling | | 0.0% | | 0.0% (NA)
156v5 Rate 1 | High risk medication use in elderly: 1 medication | 109 | 100% | 196 (+80%) | 98.5% (–1.5%)
156v5 Rate 2 | High risk medication use in elderly: 2 or more medications | | 100% | | 100% (NA)
165v5 | Controlling high blood pressure | 44 | 34.1% | 190 (+332%) | 36.4% (+2.3%)

Measures not included in iterative improvement
82v4 | Maternal depression screening | 1 | 0.0% | Not available | Not available
154v5 | Appropriate treatment for children with upper respiratory infection | 44 | 100% (73% excluded) | Not available | Not available
166v6 | Use of imaging studies for back pain | 2 | Not available (100% excluded) | Not available | Not available

Abbreviations: BMI, body mass index; CMS, Centers for Medicare and Medicaid Services; NA, not available.
Most of the 17 measures showed unexpectedly low proportions of eligible patients (i.e., denominators) relative to both disease prevalence and patient demographics.
For example, a recent report identified 9.7% of adults in Kansas as having diabetes,
but only 1.8% of the 1,100 patients qualified for the diabetes measures examined.[23 ] Consequently, one area for examination and iterative improvement was to increase
the number of eligible patients (“Iterative Improvements for Patient Inclusion”).
Of the 15 measures with at least 1 eligible patient, 9 showed no clinical events associated
with the measure numerator, resulting in either 0% or 100% compliance. These rates
called into question the validity of the calculation. Consequently, a second area
for iterative improvement was to examine if data transformations would improve the
accuracy of compliance rates (“Iterative Improvements for Quality Measure Compliance”).
Iterative Improvements for Patient Inclusion
Eligible Population Improvement for Encounters. Each of the 17 quality measures as defined by the measure steward requires a face-to-face
encounter or office visit in the measurement period for the patient to be eligible
for quality measure calculation. Since our information was drawn directly from interoperable documents exported by EHRs, the codes used in encounter documentation often lacked the specificity required to identify these visit types.
An example is shown in [Fig. 2 ], where no specific code is shown in the yellow highlighted XML, although the human-readable
text provides the context of the visit.
Fig. 2 Illustrative example of encounter normalization. This example from a clinical document,
edited to protect patient identity, demonstrates how code omission in the XML (highlighted
in yellow) would normally exclude this patient from being included in quality measures.
Using the text of “office visit” in the reference tag, however, allows a valid code
to be selected from appropriate terminology.
Using automated mapping available in the software, the reference between the human-readable narrative and the machine-readable content was used to assign a code for this encounter based on the text “Office Visit.” The software uses a simple text-matching algorithm
using exact keywords in the text (e.g., “Office Visit,” “Hospitalization,” “ER Visit”)
to assign an appropriate code when not appropriately codified in the machine-readable
portion. The code selected was “308335008 (Patient encounter)” from the Systematized Nomenclature of Medicine (SNOMED), which qualified this patient encounter for quality
calculation. Analogous encounter normalization techniques were performed on all 1,100
patients.
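As a rough illustration of this kind of keyword-based normalization, a minimal sketch follows. The keyword table is an assumption for illustration; only SNOMED 308335008 (“Patient encounter”) is taken from the example above, and the vendor's actual mapping table is not reproduced here.

```python
# Illustrative sketch of exact-keyword encounter normalization.
ENCOUNTER_KEYWORDS = {
    "office visit": "308335008",  # SNOMED CT: Patient encounter (cited in the text above)
    # keywords such as "hospitalization" or "er visit" would map to other codes (not specified here)
}

def normalize_encounter_code(existing_code: str | None, narrative_text: str) -> str | None:
    """Keep an existing code if present; otherwise assign one by exact keyword
    match against the human-readable narrative referenced by the XML entry."""
    if existing_code:
        return existing_code
    return ENCOUNTER_KEYWORDS.get(narrative_text.strip().lower())
```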
Eligible Population Improvement for Problem Inclusion. Several of the quality measures require patients to have a specific diagnosis before
a specific date for inclusion in the quality measure. For example, for inclusion in
the diabetes measures, a patient must have an eligible SNOMED, International Classification
of Diseases (ICD)-9, or ICD-10 code on or before the measure period. Real-world documentation
of onset dates, however, is often lacking in EHRs. This may be due either to the information
not being known or to clinicians skipping over fields when documenting in the EHR.
Nine measures selected for this sample require a specific problem to be documented.
These include diabetes (measures 122v5, 123v5, 131v5, 134v5), pharyngitis (146v5),
pregnancy or sexually transmitted disease (153v5), respiratory infections (154v5),
hypertension (165v5), and back pain (166v6). We examined all 25,291 problems that
were documented on the 1,100 patients to determine the documentation of the time of
problem onset. We found that 51.7% of problems had no onset date documented. In addition
to the omission of problem onset date, we also examined other sections in the clinical
documents which may contain problems that were not on the problem list. These included
the history of past illness and the encounters sections. We found 5,483 incremental
problems or diagnoses in these sections, which represented a meaningful percentage
(21.1%) of overall problems.
To address these issues, we used all sections of clinical documents that may include
problems and changed our measure logic to address problem onset omission. Specifically,
if a problem was documented as active, we assumed the onset date must have been prior to the visit date (i.e., it is not reasonable that a clinician would document a problem as occurring in the future).
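A minimal sketch of this onset-date inference is shown below; the record structure and field names are illustrative assumptions, and the specific reference date depends on each measure's definition.

```python
from datetime import date
from typing import Optional

def effective_onset(status: str, onset: Optional[date], visit_date: date) -> Optional[date]:
    """If an active problem lacks a documented onset date, treat it as having
    begun no later than the visit at which it was documented."""
    if onset is not None:
        return onset
    return visit_date if status.lower() == "active" else None

def problem_qualifies(status: str, onset: Optional[date],
                      visit_date: date, reference_date: date) -> bool:
    """A problem qualifies the patient when its (possibly inferred) onset falls
    on or before the reference date required by the measure definition."""
    inferred = effective_onset(status, onset, visit_date)
    return inferred is not None and inferred <= reference_date
```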
Iterative Improvements for Quality Measure Compliance
Compliance Improvement through Value Set Mapping. Electronic clinical quality measures use a set of codes, often referred to as “value
sets,” to determine whether a specific activity was performed. For example, with breast
cancer screening (125v5), the measure specifies a value set of mammography studies
that would qualify a mammography as being performed. Through the examination of specific
records, we found that the specific codes used in these value sets have a material impact on quality measure calculation. With mammography, all the specified codes were from Logical Observation Identifiers Names and Codes (LOINC). As shown in [Table 2], the 0% compliance rate indicates that none of the eligible patients for this measure had one of those LOINC codes in the appropriate time period. This
electronic clinical quality value set for mammography, however, varies from the value
set for the equivalent HEDIS measure for mammography, which allows for Current Procedural
Terminology, ICD-9, and ICD-10 codes.
We contacted NCQA, which is the measure steward for 16 of the 17 measures included in
this research, to discuss this specific concern. They agreed that for the measures
where codes were included in HEDIS, equivalent concepts are acceptable through mapping
(Smith A, Archer L, at National Committee for Quality Assurance, phone call, November
2017). This significantly increased compliance for the cancer preventative screening
measures (124v5, 125v5, 130v5). This process would be expected to have had an impact
on the two diabetes measures (123v5, 131v5), although no change was observed, likely because of the small eligible populations for these measures.
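The effect of allowing equivalent concepts can be sketched as a value-set check with a cross-terminology equivalence map. The value set and equivalence map below use placeholder codes and are not the actual eCQM or HEDIS code lists.

```python
# Sketch of value-set matching with cross-terminology equivalence (placeholder codes).
MAMMOGRAPHY_VALUE_SET = {("LOINC", "loinc-mammo-001")}
EQUIVALENT_CONCEPTS = {
    ("CPT", "cpt-mammo-001"): ("LOINC", "loinc-mammo-001"),
    ("ICD-10", "icd10-mammo-001"): ("LOINC", "loinc-mammo-001"),
}

def satisfies_numerator(observations: list[tuple[str, str]]) -> bool:
    """An observation counts toward the numerator if its (system, code) pair is in
    the value set directly or maps to a value-set member via an equivalent concept."""
    for key in observations:
        if key in MAMMOGRAPHY_VALUE_SET or EQUIVALENT_CONCEPTS.get(key) in MAMMOGRAPHY_VALUE_SET:
            return True
    return False
```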
Compliance Improvement through Medication Normalization. Electronic clinical quality measures use a national standard vocabulary, RxNorm,
established by the National Library of Medicine for medication-related logic. RxNorm
is a normalized naming system that contains concepts spanning ingredient, coordinated
dose forms, generic name, and brand names. When value sets are created for medication
usage, however, they often include only generic concepts, omitting branded and ingredient
concepts. There are significant challenges in using such a limited value set. First,
we found that 3,095 (14.9%) of the medications collected in this sample were not coded in RxNorm. These likely included medications affecting measure calculation, which, without terminology mapping, would yield inaccurate results. Second, we found that the term types of RxNorm codes in real-world data often did not match the measure value sets. Specifically, only 12,146 (69.3%) of RxNorm-coded medications were coded as generic drug concepts that align with quality measure value sets. The combined effect of medications not coded in RxNorm and not coded as generic medication concepts is that only 58.6% of real-world medications from our sample appropriately functioned with quality measures
that include medication logic.
The resolution to this inability to identify medications for measure calculations
was to use terminology mapping of medications that were available in the research
software. This mapping included relationships between the RxNorm term types publicly
available as well as proprietary technology for the free-text mapping of medications
names. This successfully mapped 18,767 (90.6%) of the original medications to a usable
RxNorm concept which could then be applied to the quality measure logic. For the remaining
1,956 medications that were not mappable, manual review showed that 460 were vitamins
(e.g., multivitamins that did not specify content), 360 were medical supplies (e.g.,
lancets, test strips, nebulizers), and 191 were “unknown” or null entries. These types
of entries were not applicable to the quality measures selected. This left 945 (4.5%)
of medication entries not available to quality measure logic. Several of these were
actual medications, but others were concepts recorded in a manner which did not detail
a specific ingredient (e.g., “allergy immunotherapy” or “hormones”). The effective
yield of usable medication data was approximately 95% (18,767 mapped medication entries
vs. 945 unmapped medication entries).
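A simplified sketch of this term-type normalization follows. The lookup tables are illustrative stand-ins for RxNorm relationship data and the proprietary free-text map; the codes shown are placeholders, not real RxNorm identifiers.

```python
# Sketch of RxNorm term-type normalization toward generic concepts used by value sets.
GENERIC_CONCEPTS = {"rxcui-generic-001", "rxcui-generic-002"}
TO_GENERIC = {
    "rxcui-brand-001": "rxcui-generic-001",       # brand -> generic
    "rxcui-ingredient-002": "rxcui-generic-002",  # ingredient -> generic
}
FREE_TEXT_MAP = {"lisinopril 10 mg oral tablet": "rxcui-generic-001"}  # placeholder entry

def normalize_medication(rxcui: str | None, name: str) -> str | None:
    """Resolve a medication entry to a generic RxNorm concept usable by measure
    logic, or return None when no mapping is available."""
    if rxcui is None:
        rxcui = FREE_TEXT_MAP.get(name.strip().lower())  # free-text fallback
    if rxcui is None:
        return None
    if rxcui in GENERIC_CONCEPTS:
        return rxcui
    return TO_GENERIC.get(rxcui)
```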
Once translations were performed, it was also necessary to adjust the logic associated
with medication administration before medication quality logic would function appropriately.
Specifically, 17,505 (84.5%) of all medications were recorded in clinical documents as medication orders (i.e., HL7 moodCode of “INT”). Of those, however, 14,318 (81.8%) had an associated start date at or before the clinical encounter. For medications that had a start date in the past, we treated them as administered medication events rather than as orders. This allowed the medication duration logic of High Risk Medications
in the Elderly (156v5) to function (i.e., have at least 1 numerator event). This issue
may stem from poor implementation of the clinical document standards as detailed in
prior research.[14 ]
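The described mood adjustment can be sketched as follows; treating the adjusted entry as an event ("EVN") is an illustrative representation of "administered medication event," and the field names are assumptions.

```python
from datetime import date

def effective_mood(mood_code: str, start_date: date | None, encounter_date: date) -> str:
    """Treat an ordered medication (HL7 moodCode 'INT') whose start date is on or
    before the encounter as an event (administered/active medication) so that
    duration-based measure logic, such as CMS156v5, can evaluate it."""
    if mood_code == "INT" and start_date is not None and start_date <= encounter_date:
        return "EVN"
    return mood_code
```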
Compliance Improvement through Laboratory and Vital Sign Normalization. Laboratory information recorded in EHRs often does not match the value sets of laboratory results specified in quality measures. This impacted the diabetes control measure (122v5), which requires HbA1c results. Using all the result data in the collected information, 4.1% of HbA1c results did not have the appropriate LOINC code. In addition, 14.8% of these HbA1c results did not use the appropriate unit of measure (i.e., %) for the laboratory result. An even larger impact was seen among laboratory results related to the diabetes nephropathy measure (134v5), where 18.3% of results did not have an appropriate code. For the pediatric weight assessment measure (155v5), while vital signs used the appropriate LOINC code for body mass index (BMI), 35.1% did not use the appropriate unit (i.e., kg/m²). The solution was to normalize laboratory results and vital signs using both code mapping and unit translation, which affected measures 122v5, 134v5, and 155v5.
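An illustrative sketch of this kind of normalization, combining local-to-LOINC code mapping with unit conversion, is shown below. The mapping table and conversion rules are simplified assumptions, not the certified software's logic.

```python
# Sketch of laboratory/vital-sign normalization: code mapping plus unit conversion.
LOCAL_TO_LOINC = {
    "HBA1C-LOCAL": "4548-4",   # LOINC 4548-4: Hemoglobin A1c/Hemoglobin.total in Blood
}
UNIT_CONVERSIONS = {
    ("fraction", "%"): lambda value: value * 100.0,   # e.g., 0.072 -> 7.2%
}

def normalize_result(code: str, value: float, unit: str, expected_unit: str):
    """Map a local result code to LOINC (when a mapping exists) and convert the
    value to the unit expected by the measure value set (when a rule exists)."""
    loinc = LOCAL_TO_LOINC.get(code, code)    # pass through codes already in LOINC
    if unit != expected_unit:
        convert = UNIT_CONVERSIONS.get((unit, expected_unit))
        if convert is not None:
            value, unit = convert(value), expected_unit
    return loinc, value, unit
```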
Compliance Improvement through Logic Changes. Finally, additional logic changes were attempted for three pediatric-related measures.
For the pediatric testing of pharyngitis (146v5), the relationship between the timing of the encounter, medication start, and problem onset was simplified. For the treatment
of childhood upper respiratory infections (154v5), we found that the relationship
between encounter timing, problem onset, and medication timing could not be simplified
to make this measure include a reasonable portion of patients. Attempted resolutions
for this measure were unsuccessful. For the measure relating to pediatric weight (155v5),
we found that the requested vital sign of BMI percentile was never recorded in interoperable
clinical documents we examined. Using the recorded BMI, gender, and patient age, however, we could calculate the appropriate percentile for part of this measure (i.e., where the BMI percentile was unambiguously determinable from the information provided).
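One way such a derivation can be performed is with the CDC growth chart LMS parameters for age and sex; a minimal sketch follows, assuming the LMS lookup values are loaded from the published CDC reference tables (not reproduced here). The example values are hypothetical.

```python
import math

def bmi_percentile(bmi: float, L: float, M: float, S: float) -> float:
    """Convert a BMI value to an age- and sex-specific percentile using the CDC
    growth chart LMS method; L, M, and S come from the published CDC reference
    tables for the child's sex and age in months (lookup not shown)."""
    z = ((bmi / M) ** L - 1.0) / (L * S) if L != 0 else math.log(bmi / M) / S
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF of the z-score

# Example with hypothetical LMS values (for illustration only):
# bmi_percentile(bmi=18.2, L=-1.9, M=16.5, S=0.12)
```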
Resultant Quality Measure Calculations
Of the original 17 measures selected, we found two measures (166v6 and 82v4) where
the eligible population remained under 5 patients from the sampled population of 1,100.
In addition, none of the attempted changes to the treatment of upper respiratory infections measure (154v5) was able to reasonably reduce the exclusion rate. These three measures were considered nonfunctional despite attempts to increase their eligible populations ([Table 2], “Measures not included in iterative improvement”). For the remaining 14 measures,
we report both the original and the normalized quality measure rates in [Table 2 ] (“Calculation after Iterative Improvement”).
The overall impact of the iterative improvement on the eligible population increased
the denominator populations across these 14 measures from 803 to 1,783 (+122%). This
counts the same patient multiple times when the patient qualifies for multiple measures.
The number of unique patients included in at least one measure increased from 315
to 601 (+91%).
The overall impact of the iterative improvement on compliance was varied. Five measures increased from zero or nonapplicable compliance to a nonzero rate. One measure decreased
from 100% compliance to a lower rate. Three measures had at least one rate component
remain at zero compliance despite attempts to improve compliance. Other measures had
small or moderate changes in reported compliance.
Once these changes were made, the 14 revised measures were extensively tested to determine
if certification compliance was maintained. Appropriate Testing for Children with
Pharyngitis (146v5) was found to not maintain certification. While data are presented
for this measure, the revised logic could not be used in reporting. Certification
for the other 13 measures was unaffected, since the techniques for free-text normalization, terminology mapping, and handling of missing data applied through the iterative improvement do not affect certification test data, which include only properly structured data.
Discussion
Implications of these results can be categorized into two domains: considerations
for measure authors and stewards and considerations for organizations performing quality
calculation.
Considerations for Measure Authors and Stewards
Quality measure development is a difficult task often done in the abstract; authors
lack heterogeneous clinical data sets to validate logic and examine how real-world
documentation practices affect calculations. Our findings support the need for measure
developers to better understand how the routine collection of clinical data impacts
quality measurement, as policymakers have acknowledged.[24] This requires access to, and testing with, real-world data before a measure is released
for use. This will help measure authors evaluate the inherent limitations of terminologies,
value sets, discrete data entry, and cohort definitions in the process of measure
development. It also helps identify gaps between clinical data collection and the
available data for reporting. This study validated that interoperability standards
for clinical documents, as promoted by the Meaningful Use program, are a viable strategy for quality measurement.
In addition, the use of interoperability standards provides a clear audit trail back
to the source EHR. Auditing using interoperability standards can include both the
original source information and any data transformations performed. This becomes increasingly
important as both private and public payers use quality measure performance for provider
payment.
Another finding is the importance of measure consistency across programs. We observed
that value sets for terminologies varied substantially from HEDIS to electronic clinical
quality measures. Specifically, some terminologies included in HEDIS were excluded
in clinical quality measures. This caused several preventative measures to report
zero compliance even though the data contained clear evidence of the preventive service. We strongly believe that there should be alignment and compatibility
of value sets across measure programs, particularly since providers have been encouraged
to document in a way which supports older programs such as HEDIS. This need for consistency
also applies to how patients are qualified for measures as documented in other research.[25 ] Electronic clinical quality measures incorporate the concept of a specific type
of visit before a patient is eligible for quality measure calculation. The lack of
proper encounter coding in EHRs creates a burden in this domain. HEDIS measures apply
to broader member populations based on billing profiles, while electronic clinical
quality measures are artificially restricted. Such attribution logic also overlooks
patients who go 12 to 24 months between physician visits and emerging modalities where
virtual encounters are used for patients in good health. We believe that measure eligibility
logic should recognize these concerns to ensure greater consistency across programs.
Finally, poor documentation practices, such as free-text order entry or missing qualifiers,
should never result in better compliance. In the example of high risk medications
in the elderly, we found higher compliance when medication data were not normalized.
This rewards clinicians and technologies that do not record medications in the standard
terminology. Since we found that 41% of medications were either not coded in RxNorm or not in the expected term type, this issue of normalization for complex clinical data, such as medications,
will remain important for the near term.
Considerations for Organizations Performing Quality Calculation
This study validates that the strategy promulgated by MACRA to establish QCDRs for
quality measurement is technically feasible for at least several measures. It also
demonstrates viability of collecting clinical data from various sources using interoperability
standards that could be adopted by integrated delivery systems with multiple in-house
EHRs. While the compliance rates reported for selected measures vary from known benchmarks,
we believe that to be reasonable given the limited data examined and the fact that
selected facilities were not known to have any focus on the selected measures. Measure
selection by QCDRs will be important based on the findings of this research. Also
important will be the selection of a technology vendor to collect and normalize clinical
data. Our findings substantiate the value in transforming clinical data collected
using interoperability standards, as had been previously demonstrated for individual
EHRs.[26 ]
In addition, clinical documentation practices should always remain a priority when
working with providers who intend to use a QCDR to support electronic clinical quality
measurement. For several of the measures with low or zero compliance rates, the information
required is often not structured in the appropriate place to be available for quality
measure calculation, as documented in prior research.[27 ] For example, we never found nutritional or physical activity counseling to be documented
as a particular code for the pediatric weight assessment measure, but we fully expect
this was performed on at least some of the 123 eligible pediatric patients. Previous
research has validated that practice type, size, and experience with EHR technology
have significant impacts in data availability for quality reporting.[28 ] Further work with local practices and EHRs will be required to implement tactics
that will increase data completeness.
Since QCDRs have access to real-world data and the ability to author measures, they
are in a unique position to advance the state of quality measure development. We believe
that cross-industry collaboration between QCDRs and payers needing quality measurement
for value-based contracting will be critical. These collaborations could include
deidentified data repositories for new measures, measure validation using real-world
clinical data, and best practices in data transformation to support quality measurement.
Finally, some QCDRs are tightly integrated with a health information exchange (HIE), and we believe this research highlights an important implication. Improving clinical data will not only improve clinical quality measurement but will also benefit care transitions and other improvement objectives supported by HIEs. We believe that using interoperability
standards to empower quality measurement provides an incentive and feedback loop to
improve interoperability generally.