Keywords Data quality - data accuracy - diabetes mellitus - type 2 - medical record systems
Introduction
Computerised medical records (CMRs), whilst originally introduced to improve quality [[1 ],[2 ]], now provide large volumes of data that can be used to assess and compare the quality of healthcare on a global scale. Several frameworks for healthcare quality assessment have been developed; one of the most influential is that set out by the US Institute of Medicine (IOM) [[3 ]]. This framework defines six aims for healthcare systems ([Box 1 ]). CMRs potentially make it possible to rapidly amalgamate and analyse data on a large scale to assess quality [[4 ]].
Box 1 The six areas of healthcare quality as suggested by the IOM [[3 ]]
Safe: avoidance of patient harm
Effective: evidence-based practice
Patient centred: care tailored towards the individual
Timely: reduction of harmful delays
Efficient: avoidance of waste
Equitable: providing the same care quality to all
An aging population has an increased prevalence of chronic illness, of which type 2 diabetes mellitus (T2DM) is an important example. The global prevalence of T2DM is continuing to rise, and the condition is now considered a global pandemic [[5 ]]. Developing and disseminating evidence to support chronic disease management (CDM) is a significant challenge for several reasons, including the need to collect and integrate multiple disparate data sources while also turning findings around quickly enough to support patient care delivery [[6 ]]. Whilst high-quality data are needed to support ongoing analyses as part of improving health outcomes, in reality the collection, analysis, and re-circulation of findings are often done in a piecemeal fashion and not as part of a well-defined framework.
The IMIA Primary Health Care Informatics Working Group (PHCIWG) has a long-standing interest in data quality. Its work has included defining key concepts to assess the readiness of data for use in research [[7 ]] and systematic methods to underpin pooled use of data and interoperability [[8 ]]. A comprehensive realist review of the data quality literature helped to determine prerequisites for using routine data for integrated care [[9 ]] and to support practice-based research networks [[10 ]]. Strategies included structured data quality reports and feedback sessions [[11 ]] and working with unstructured CMR data [[12 ], [13 ]]. Research was also conducted to assess the effect of data quality on the development of clinical predictive models and the identification and development of cohorts and registries, using both structured [[14 ]] and unstructured [[15 ]] data.
This paper presents the results of an investigation into how routinely collected data could be used to compare the quality of care in T2DM across various countries.
Methods
We provide a two-component analysis of global data quality. Firstly, we undertook a literature review of data quality research with a focus on data quality in diabetes. Secondly, we undertook a survey of working group members to give a contemporary cross-section of data quality in existing databases about the quality of care of people with T2DM.
Literature Review
We carried out a literature review to identify published research on data quality in studies of patients with T2DM. We searched PubMed/MEDLINE, Scopus, Web of Science, CINAHL, and the Cochrane Database for publications related to this topic. The search terms used comprised “data quality” (or “data accuracy”) and “diabetes”.
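The search terms described above amount to a single Boolean expression: the two data quality synonyms joined with OR, combined with the diabetes term using AND. The following is a minimal sketch of that logic only. The function name `build_query` is hypothetical, and the exact syntax each bibliographic database accepts (field tags, truncation, quoting) is not specified in this paper and will differ between platforms.

```python
def build_query(quality_terms, topic_terms):
    """Join synonyms with OR inside each concept group, then AND the groups."""
    def group(terms):
        quoted = ['"' + term + '"' for term in terms]
        return "(" + " OR ".join(quoted) + ")"

    return group(quality_terms) + " AND " + group(topic_terms)


# Terms taken from the text; the output format is a generic illustration.
query = build_query(["data quality", "data accuracy"], ["diabetes"])
print(query)  # ("data quality" OR "data accuracy") AND ("diabetes")
```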
An overview of the literature review is given in the adapted PRISMA flow chart shown in [Figure 1 ]. Research papers published between November 2006 and November 2016 were included to provide an overview of the last ten years of literature. We limited the literature search to publications written in the English language. We included relevant studies from the references cited within the studies returned by the search and additional studies known to us from our experience in this domain.
Fig. 1 Adapted PRISMA flow diagram of the literature search
The initial search yielded a total of 321 publications, which was then reduced to 263 after removing duplicates. By conducting a title and abstract review, we further reduced the number of relevant studies to 25. From the analysis of references and the addition of studies we were previously aware of, three studies were added. These 28 publications were used for the final in-depth review.
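The counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures reported above; the variable names are ours, and the two derived exclusion counts are implied by the reported totals rather than stated in the text.

```python
identified = 321        # publications returned by the initial search
after_dedup = 263       # remaining after duplicates were removed
after_screening = 25    # retained after title and abstract review
added_manually = 3      # added from reference lists and prior knowledge

duplicates_removed = identified - after_dedup            # implied by the totals
excluded_at_screening = after_dedup - after_screening    # implied by the totals
final_included = after_screening + added_manually

print(duplicates_removed, excluded_at_screening, final_included)  # 58 238 28
```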
We provide a narrative summary on what is known about diabetes data quality in existing databases on the basis of the papers identified by this search strategy. We also provide comments on the limitations of the current literature and suggestions for future research in this area.
Survey for Assessing Diabetes Data Quality and Availability
We conducted a survey analysis of T2DM data quality, collected and recorded in existing large datasets from various countries. We surveyed data recording across the six areas of healthcare quality defined by the IOM ([Box 1 ]).
Using a preformatted data collection form (Appendix 1), we collected data for each database on country, information source, upload frequency, funding source, and denominator population. We also asked which method database analysts used to identify T2DM cases. For each area of healthcare quality, we selected one or two key outcome measures relevant to the care of T2DM based on national and international diabetes guideline recommendations. We chose retinopathy as a marker of intervention efficacy (rather than other diabetes outcomes) as this is the key microvascular diabetes outcome which has historically determined diabetes targets [[16 ], [17 ]]. We distributed the data collection form by email to healthcare database representatives and diabetes academics identified via various professional networks. Where recipients were not the right people to answer the survey, we asked them to forward the email to someone better placed to complete it, an approach known as snowball sampling [[18 ]]. We distributed 35 surveys in total.
We provide a summary of the survey results, including the proportion of incomplete data in responses.
Results
Literature Review
We found few studies focusing specifically on the quality of data for diabetes research. Most work related to generic data quality metrics and assessment, where diabetes was considered as a comorbidity or a risk factor. Two overarching themes emerged from the detailed analysis. The first related to the challenges and issues in collecting data (CMR extraction, data quality), while the second concerned using the data to monitor disease management.
Goudswaard et al. conducted a cross-sectional survey in general practitioner (GP) practices in the Netherlands and were unable to evidence an association between incomplete clinical records and glycaemic control (although they reported that low levels of recording affected the systematic delivery of care) [[19 ]]. Keating et al. assessed quality indicators using administrative and medical records from a cohort of diabetic patients [[20 ]]. They reported that tests such as HbA1c and LDL cholesterol have a higher recording in CMR data while retinopathy screening was more often recorded in administrative data. Richesson et al. conducted a study to compare phenotype definitions for diabetes considering seven different phenotype definitions [[21 ]].
Two studies analysed data quality within a UK-based research and surveillance primary care network [[22 ], [23 ]]. They reported a high level of data recording in people with diabetes, such as ethnicity identification (82.1%), smoking status (99.3%), alcohol use (93.3%), glycated haemoglobin (HbA1c; 97.9%), body mass index (98.0%), blood pressure (99.4%), cholesterol (87.4%), and renal function (97.8%).
Epidemiology cohorts focused on diabetes have provided better quality datasets for research by assembling data from multiple data sources (such as clinical records, prescriptions, laboratory tests, and mortality files). Linkage across multiple dimensions of care has successfully increased the quality of care [[24 ]]. Furthermore, validation studies have demonstrated that data related to processes and outcomes of diabetic care is well recorded within primary care [[25 ]]. Bailie et al. have demonstrated that the consistency of denominator data is important when comparing indicators over time and between services [[26 ]].
The governance of data networks and of health information sharing was an important factor and constrained data use.
Survey Results
We received 14 responses from eight countries (Australia, Canada, Italy, the Netherlands, Norway, Portugal, Turkey, and the United Kingdom), a response rate of 40% (14 of the 35 surveys distributed). The most common data source among the databases surveyed was primary care health records ([Figure 2 ]). All databases surveyed were actively updated, most of them on a daily basis ([Figure 3 ]).
Fig. 2 Graph – Contents of databases (sent by responders) (Survey Question 4)
Fig. 3 Graph – Frequency of update (Survey Question 5)
A wide variety of data types was used to identify people with T2DM from these databases ([Figure 4 ]). The majority of databases used multiple data types ([Table 1 ]) to identify diabetes (three databases used two data types, two databases used three data types, and five databases used four data types). By population denominator, the responding databases comprised four national databases, four practice networks, four locality-specific databases, and two disease registers.
Fig. 4 Methods used for identifying T2DM (Survey Question 10)
Table 1 Methods used for identifying T2DM by respondent (Survey Question 10)

| Respondent | Source | Process of care codes | Drug codes | Test results | Diagnosis codes | Number of data types |
|---|---|---|---|---|---|---|
| 1 | Non-primary care | 0 | 1 | 0 | 1 | 2 |
| 2 | Non-primary care | 0 | 0 | 0 | 0 | 0 |
| 3 | Non-primary care | 1 | 0 | 0 | 1 | 2 |
| 4 | Non-primary care | 0 | 1 | 0 | 0 | 1 |
| 5 | Non-primary care | 1 | 0 | 0 | 0 | 1 |
| 6 | Non-primary care | 0 | 0 | 0 | 1 | 1 |
| 7 | Primary care | 1 | 1 | 1 | 1 | 4 |
| 8 | Primary care | 1 | 0 | 1 | 1 | 3 |
| 9 | Primary care | 1 | 1 | 1 | 1 | 4 |
| 10 | Primary care | 0 | 0 | 1 | 1 | 2 |
| 11 | Primary care | 1 | 1 | 1 | 1 | 4 |
| 12 | Primary care | 1 | 1 | 1 | 1 | 4 |
| 13 | Primary care | 1 | 1 | 1 | 0 | 3 |
| 14 | Primary care | 1 | 1 | 1 | 1 | 4 |
The recording of data within the selected healthcare quality areas was highly variable across the databases surveyed. In general, data recording was better in data from primary care sources than in data from other sources ([Figure 5 ]). Recording within the areas related to patient-centred care, timeliness of care, and system efficiency was lowest. Of the 14 databases surveyed, only one had partial or complete data recording in every healthcare quality area assessed.
Fig. 5 Areas being recorded and the level of recording (Survey Question 11)
Discussion
Principal Findings
We have identified a number of large databases which could be used to compare healthcare quality in T2DM across various countries. Most of these databases reported very frequent data uploads and would be capable of real-time or near-real-time analysis of healthcare quality. The majority of databases recorded data related to safety (particularly medication adverse events) and treatment efficacy (glycaemic control, microvascular disease, and retinopathy). This may be because these data are recorded numerically within the CMRs. Data which could be used to measure equity (patient ethnicity and socioeconomic status) were less well recorded (in just over 50% of databases). The lowest levels of recording were observed in the areas of patient-centred care, timeliness of care, and system efficiency, with the majority of databases containing no data in these areas. These areas are multidimensional, as opposed to data associated with hypoglycaemia or glaucoma. Databases using primary care CMR data had higher data quality across the areas measured than those from other sources. Timeliness of care may be simpler to derive from recorded data than to record directly in CMRs. On the basis of these findings, we propose a taxonomy based on the dimensionality of the measures: single and simple measures, derived measures (such as timeliness of care), and multidimensional measures (such as an index of patient-centred care, a depression scale, or the Barthel index) [[27 ]].
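To illustrate what a derived measure might look like in practice, the sketch below computes a timeliness interval from two dates that are already recorded, rather than relying on a dedicated timeliness field. This is an assumption-laden example and not a measure used in the survey: the choice of interval (diagnosis to first retinal screening) and the function name `days_to_screening` are hypothetical.

```python
from datetime import date
from typing import Optional


def days_to_screening(diagnosis: Optional[date],
                      screening: Optional[date]) -> Optional[int]:
    """Derived timeliness measure: days from diagnosis to first screening.

    Returns None when either date is missing, so incomplete records stay
    visible as missing rather than being counted as zero delay.
    """
    if diagnosis is None or screening is None:
        return None
    return (screening - diagnosis).days


# Hypothetical record: diagnosed 10 January 2016, screened 10 March 2016.
print(days_to_screening(date(2016, 1, 10), date(2016, 3, 10)))  # 60
print(days_to_screening(date(2016, 1, 10), None))               # None
```

Returning `None` for incomplete records keeps the derived measure honest about data quality: the proportion of `None` values is itself an indicator of how well the underlying dates are recorded.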
Implications of Findings
Whilst healthcare databases provide an excellent opportunity to monitor healthcare quality due to their large size and the high frequency of data collection, they were not designed with this purpose in mind. Data recording presently appears to fit the traditional medical model [[28 ]], focusing on the measurement of disease-related parameters rather than on other important healthcare quality outcomes: clear documentation of patient-centred targets and societal considerations is limited in existing datasets. Almost universally, recent T2DM guidelines emphasize the importance of individualised care [[29 ]–[31 ]], so this is perhaps a surprising, if not disappointing, finding.
Hypoglycaemia was not recorded at all in a large proportion of databases. This is an extremely important clinical safety issue in people with diabetes which merits close clinical scrutiny [[32 ]]. The absence of data recording in this area is a concern.
These results seem to indicate that primary care sourced data has better levels of recording across all health care quality areas when compared to other data sources, and it is likely that primary care data provides better opportunities for monitoring healthcare outcomes globally. However, further studies need to be conducted in order to validate these findings.
In addition to the findings from the survey, CMRs can be designed to calculate individual risks such as multi-morbidity, polypharmacy, and other elements important in individual risk assessment (e.g. absolute cardiovascular risk calculation).
Comparison with the Literature
In our literature review, we found few specific studies focusing on quality of data for diabetes research. Most works were in relation to generic data quality metrics and assessments where diabetes was considered as a comorbidity or a risk factor. Large scale studies of diabetes related data quality are needed. To the best of our knowledge this is the first assessment of the quality of diabetes data from large datasets provided by different countries. Beyond this initial analysis, further work is needed to quantify data quality in each key area directly from these databases.
Limitations of the Method
The limitations of this analysis are primarily those of survey studies: possible bias caused by incomplete responses to questionnaires and the limitations of self-reported data quality (by those with a role as database representatives). We were also probably only aware of a small selection of all the healthcare databases which exist globally, and therefore our survey provides data only from a small selection of world-wide databases. In addition, we did not conduct a systematic assessment of whether the databases used are comparable with respect to data quality and population demographics. Nevertheless, despite these limitations, these data provide a starting point for further analyses in this area and a framework for assessing the data quality of databases for their value in assessing healthcare quality.
Conclusions
Whilst large databases have great potential for monitoring healthcare quality, the recording of healthcare outcomes data in a number of key areas requires considerable improvement internationally. The areas requiring improved data collection include patient-centred care, timeliness of care, and system efficiency. Improved data quality in these areas would facilitate local healthcare quality improvement and comparison of quality nationally and internationally. Despite these limitations, primary care derived datasets showed high levels of data recording across all domains of healthcare outcomes and therefore may provide the best data sources for healthcare quality analysis.
Appendix
IMIA Primary Health Care Informatics Working Group: Survey of Health Data Sources to Support Diabetes Studies
With the global prevalence of type 2 diabetes continuing to rise, accurately measuring the global disease burden from this condition is vital. Large datasets are the ideal way to collate this information but in order to assess their compatibility and comparability an assessment of data quality is needed across these datasets.
Using this survey, we intend to conduct an international comparison of data sources available for diabetes studies. The results of the study will be published in the Yearbook of Medical Informatics, 2017 edition.
DATABASE DESCRIPTION
1. Country: _____
2. Name of the database/register: _____
3. Database/register website (URL): _____
4. Which of the terms given below could be used to classify the data contents of the database? (select all that apply)
Primary health care
Outpatient electronic medical records
Community /ambulatory care records
Inpatient electronic medical records / hospital
Health care reimbursement claims, including date and place of service, patient, diagnoses, treatment.
Pharmacy dispensing records
Specialized care consultations
Specific registry (inc. chronic or rare disease, cancer registries)
Health surveys
Other (please specify) _____
5. Please indicate how frequently the database is updated:
Daily (ongoing data entry)
Weekly
Monthly
Three monthly
Six monthly
Annually
Not updated
6. Funding source for database: _____
POPULATION DENOMINATOR
7. Is the database population from (delete as applicable):
8. Population covered: e.g. whole country / defined locality (please specify) _____ / Other (please specify) _____
9. Age range covered: _____
CASE DEFINITION
10. Methods used to identify people with type 2 diabetes (delete as applicable):
MEASURES OF HEALTHCARE QUALITY
11. Are the following areas recorded?
ADDITIONAL INFORMATION
12. Please use the space below for any additional information _____