Methods
This section covers the design and implementation of an interoperable methodology that provides both conceptual and software frameworks for DQ assessment. We first define relevant terms, requirements, and DQ metrics used in our conceptual framework before we present the implemented software framework and validation methods in detail.
Definition of Used Terms
Since the terms used in the literature on DQ often represent different abstraction levels of data management, we propose a detailed clarification of terms and definitions used in this paper as shown in [Tables 1] and [2].
Table 1 Definition of used terms

| Term | Definition |
| --- | --- |
| DQ parameter | We use this term to denote value-free quantities of observation units, such as cases and patients. DQ parameters do not allow any evaluation of DQ on their own; however, appropriate DQ indicators are determined based on these parameters. |
| DQ indicator | DQ indicators are usually defined as dimensionless relative values or rates that are assigned to different categories of DQ, also called quality dimensions. In this paper, DQ indicators are expressed as percentage rates. A high value indicates high quality of data, while a low value indicates possible DQ deficiencies. |
| Data item | This term is often used synonymously with the terms data element or feature to specify a required atomic data property for a given use case. However, it is sometimes also used to describe a concrete value of this data property. In this paper, we use this term to denote an abstract specification of an atomic data property required for a given use case on the level of metadata (see [Table 2]). |
| Information model | We describe the smallest set of required data items for a specific use case as an information model (see [Table 2]). We would like to note that an information model is also sometimes called a data set, for example the MII core data set.[20] |
| Data value | This term represents the concrete value of a data item within a given data set. It is also sometimes called a data field (see [Table 2]). |
| Data vector | We describe the available set of data values for a specific data item as a data vector (see [Table 2]). |
| Data record | We use this term to describe a set of data values that is collected in one row to represent required information about an observation unit such as an individual patient or an individual case (see [Table 2]). |
| Subject record | A data set can also be divided into multiple records to capture information about the subjects involved in a given study. We therefore introduce the subject record as the set of data records required to capture information related to an individual subject such as an inpatient or outpatient. In this context, we would like to note that the data of an individual patient as a study subject could be recorded in multiple data records, e.g., if a patient has several cases or diagnoses and the observation unit is the case and not the patient. |
| Data set | The set of multiple data vectors available for a given use case represents an instance of the used information model, which we denote as a data set (see [Table 2]). Since the extracted data are in structured form, this concept is also called a data frame in programming environments such as R or Python. |

Abbreviation: DQ, data quality.
Table 2 Usage of terms in this paper: the terms information model and data item refer to the specification of the data to be collected and are therefore assigned to the metadata. The terms data value, data vector, and data record refer to the concrete data to be collected, while the term data set represents the entire instance of the used information model.
Data Quality Challenges and Requirements
The MII-CDS information model defines the semantics of the required data items and, as a consequence, provides the basis for developing harmonized DQ assessments across the MII, including the CORD-MI network. The common data items specified in the MII-CDS are grouped into different modules such as Person, Treatment Case, and Diagnosis and are modeled as Fast Healthcare Interoperability Resources (FHIR) profiles related to the FHIR resources patient, encounter, and condition, respectively. FHIR is a well-known standard describing data formats, resources, and the application programming interface for transferring EHR data between different software applications. This standard is developed by the Health Level 7 (HL7) international standards organization to achieve interoperability between health care systems.[21] It is increasingly used for exchanging medical data for clinical research purposes. Since a high outlier or missing rate in the required data items and values would raise concerns about the quality of scientific outcomes produced by CORD-MI use cases, the completeness and plausibility of data in the MII-CDS are important aspects of DQ to be investigated in this study.
A feature of the large and diverse classes of RDs is their overall poor diagnostic representation in hospital documentation. Less than 10% of distinct RD diagnoses can be codified using a unique code in ICD-10-GM. Often, RDs are subsumed in unspecified ICD-10 codes that encode either both common and rare diseases or several distinct RDs under a single code, consequently rendering the majority of RDs invisible or indistinguishable.[22][23] University hospitals in CORD-MI therefore advanced the application of specific RD coding with OCs. The mandatory coding system ICD-10-GM and the (so far) voluntary OC are, however, inherently different in their organization and granularity. According to the possible relationships between the number of ICD-10-GM codes and the number of referring OCs, we can classify ICD-10-GM codes into four types, represented as 1:1, n:1, 1:n, and 1:m. Codes of type 1:1 or n:1 are unambiguous, while ICD-10 codes of type 1:n or 1:m are ambiguous because they represent a group of RDs (1:n) or are mixed with common diseases (1:m). Without additional OCs, it is impossible to determine the correct semantics of RD diagnoses coded using ICD-10-GM codes of type 1:n or 1:m. Such quality issues hamper the secondary use of EHR data for clinical research purposes for many RDs. Hence, the semantic unambiguity and completeness of RD codification represent essential aspects to be covered in our DQ concept.
The Alpha-ID-SE,[24] published annually by the Federal Ministry of Health's authority BfArM (Bundesinstitut für Arzneimittel und Medizinprodukte), provides a uniform and standardized mapping between the two coding systems and therefore allows coding of RDs according to the ICD-10-GM, on the one hand, and the OC, on the other hand. While certainly not complete in covering all clinical entities and levels of the multihierarchical Orphanet nomenclature, Alpha-ID-SE provides selected OCs for more than 5,000 distinct RDs.[25] In the past, only very few German hospitals implemented Orphacoding due to a lack of incentives for clinicians to dedicate valuable time to supplemental coding, in addition to shortcomings in commercial coding software and the lack of an exhaustive standardized mapping between relevant ICD-10-GM codes and OCs.[6] In those institutions that had already introduced the coding system, Orphacoding has been characterized by tailored in-house solutions and workarounds. Therefore, while bearing the potential of a huge and necessary improvement in the visibility of RD patients, Orphacoding in recent years, if performed at all, has been highly heterogeneous in its disease scope, the quantity and plausibility of its usage, and the quality of selected codes in relation to ICD-10-GM. The legislature has responded to the necessity of OCs for all RD patients, and from 2023 onwards,[26] coding according to Alpha-ID-SE will become mandatory for the documentation of all inpatient cases with RDs. Any services or treatments that require hospitalization are considered inpatient cases. Alpha-ID-SE will therefore be the gold standard for the required plausible mappings between ICD-10-GM codes and OCs in Germany. Evaluating if and to what extent the legal requirements of appropriate and complete coding for all RD cases have been met will consequently become a challenge for local as well as regional and national DQ monitoring in hospitals. The basis for this quality control should therefore be the subset of ICD-10-GM codes within Alpha-ID-SE that exclusively code for RDs and consequently must be followed by an OC. In this work, we refer to these ICD-10-GM codes as tracer diagnoses.
Besides DQ metrics, interactive feedback loops to potential users are needed to improve the quality of collected RD data. Potential users in this context are for example medical documentation assistants, medical controlling staff, and data scientists. To establish an interactive DQ improvement process, specific DQ reports on detected DQ issues are required. The user should be able to select desired DQ indicators in order to define flexible DQ reports that focus on particular aspects of DQ. The generated reports should also provide adequate information to find the DQ violations and the causes of these violations. In this context, an interoperable and privacy-preserving solution for evaluating the DQ of RD documentation in distributed data sources is required.
Definition of Data Quality Dimensions
Effective management of DQ in CORD-MI requires appropriate metrics and tools to assess the quality of data extracted from different HISs. In the literature, there is currently no consensus or standard framework for assessing the quality of clinical data.[10][11][12][13][14] Various DQ dimensions and related indicators have been proposed in previous related works. However, these metrics do not meet the specific requirements of CORD-MI use cases. In this work, two factors are considered for the selection of DQ metrics: (1) the selected dimensions should cover independent aspects of DQ, and (2) the definitions of indicators should reflect the individual requirements of implemented use cases. Based on the requirements specified above, the following dimensions have been selected for CORD-MI: completeness, plausibility, uniqueness, and concordance. Various synonyms and definitions are already provided in the literature for these dimensions. To avoid confusion, we propose the following definitions and subcategories in order to characterize the selected dimensions. [Fig. 1] shows the ontological structure of the used DQ concepts and their semantic relationships.
Fig. 1 Ontology of used DQ concepts. The DQ dimensions defined by Kahn et al[11] are colored in gray. DQ, data quality.
We used the harmonized DQ terminology developed by Kahn et al[11] to denote the core concepts of DQ, namely plausibility and completeness. This harmonized terminology is widely used in international frameworks such as the DQA tool and OHDSI.[27][28] We further extended these core concepts to specifically address relevant aspects of DQ with the subcategories semantic plausibility, range plausibility, and item completeness, as the ontology in [Fig. 1] shows. Moreover, our DQ concept differentiates uniqueness from plausibility to avoid confusion. We focus on semantic uniqueness, which is necessary for the secondary use of clinical data as described in Section “Data Quality Challenges and Requirements”. In contrast, Kahn et al[11] define uniqueness as a subcategory of plausibility called “uniqueness plausibility,” which seeks to determine the frequency of duplicated objects in a given data set. The conformance metrics proposed by Kahn et al were not applied in our DQ concept because the FHIR standard implemented in this work already supports conformance checks; instead, we used an additional dimension called concordance, which is important for cross-site DQ assessments and is explained below.
Completeness
Completeness represents the degree to which the relevant data are available. This dimension should therefore be evaluated by means of missing-value checks. There is wide consensus on this concept in the literature.[10][11][12] However, besides the well-known value completeness, which measures the completeness of data sets for given information models, we introduce here item completeness (see [Fig. 1]), which addresses a quality issue that specifically arises from multisite and multisystem data collections: the completeness of the local information models with regard to an external reference model, in our case the FHIR profiles of the MII-CDS. While item completeness investigates DQ issues on the level of metadata, value completeness focuses on the data themselves. We distinguish these two main subcategories of the completeness dimension because the resulting DQ issues require different actions on different targets: value completeness must be accomplished by those who generate the data, while item completeness must be accomplished by configuring the EHR data entry mask or the Extract-Transform-Load (ETL) and FHIR mapping processes. Clinical data sets usually comprise multiple data vectors and data modules. Data vectors record individual data items, while data modules collect groups of data items such as the FHIR resources patient or encounter. We therefore introduce two categorial parts of value completeness, as shown in [Fig. 1]: (1) vector completeness, which focuses on the completeness of individual data vectors such as the OC, and (2) module completeness, which evaluates the completeness of specific data modules such as the case module described in Section “Completeness Indicators”. Similar to data modules, a given data set can also be divided into multiple subject records as described in [Table 1]. We therefore introduce a third categorial part of value completeness called subject completeness, which investigates the completeness of specific subject records in a given data set, such as inpatient or outpatient records.
Plausibility
Various synonyms are provided in the literature to describe this dimension, such as correctness[10] and consistency.[12] The concept always describes deviations from expected values. However, there are three reasons why we think the term “plausibility” best describes the concept. First, consistency checks in logic and set theory are usually carried out by means of inference models.[29] These consistency checks therefore require a formal representation using results from computable set theory, which is time-consuming and typically not available for RD data.[30] Second, the assessment of consistency in mathematics requires a mathematical proof, also called a consistency proof.[31] Third, we can analyze the plausibility of acquired data using computer-based assessments, but the correctness of these data can only be judged by domain experts. We have therefore avoided using the terms “consistency” and “correctness” to characterize this concept. In particular, we use this quality dimension to evaluate the plausibility of recorded data regarding the absence of outliers and semantic contradictions. The semantic and temporal dependencies of data values should therefore be evaluated using plausibility checks. In this paper, we differentiate between two subcategories of plausibility: (1) semantic plausibility and (2) range plausibility (see [Fig. 1]). Semantic plausibility represents DQ issues resulting from the violation of semantic models such as reference lists and ontologies, while range plausibility reflects the contravention of expected limits, for example, the violation of expected statistical distributions in a data vector.
Uniqueness
Uniqueness represents the degree to which the data are free from ambiguity and duplication. This dimension is very important for the reuse of collected data for new research purposes. We differentiate between two independent facets of uniqueness: (1) syntactic uniqueness, which investigates duplicated patient data with duplicated identities as well as duplicated events such as cases or laboratory values, and (2) semantic uniqueness, which focuses on the unambiguousness of the semantic interpretation. Ontologies and classification systems are usually used to semantically annotate clinical data. The accuracy of such annotations affects the quality of the data and their semantic interpretation. A detailed specification is therefore necessary to avoid ambiguity of coded RD diagnoses. Moreover, the use of OCs representing specific RDs will improve the quality of RD documentation and captured data, especially on the level of semantic uniqueness. As described in Section “Data Quality Challenges and Requirements”, diagnostic information on RDs is often ambiguous. The ICD-10-GM code Q28.21, for example, is of type 1:n and represents different RDs, including cerebral arteriovenous shunt and cranial dural arteriovenous fistula, as shown in [Table 3]. Consequently, it is impossible to determine the correct diagnosis from such ICD-10-GM codes, although we can state that the patient has an RD. Another example of an ambiguous code is the ICD-10-GM code E03.0, which is of type 1:m and therefore represents common diseases as well as RDs (see [Table 3]). This lack of semantic unambiguousness makes the reuse of RD data very difficult. In contrast, each of these RDs has a unique OC. Hence, it is necessary to use OCs in order to identify patients with RDs in EHR data.
Table 3 Exemplary RD diagnoses from the Alpha-ID-SE terminology version 2022 (first four columns), extended with two columns classifying the type of relationship between ICD-10-GM codes and OCs as well as the type of diagnosis

| Alpha-ID | ICD Primary Code | Orphacode | Label | Type of Relationship | Type of Diagnosis |
| --- | --- | --- | --- | --- | --- |
| I95787 | E84.80 | 586 | Cystic fibrosis | n:1 | UTD |
| I18534 | E84.9 | 586 | Cystic fibrosis | n:1 | UTD |
| I125102 | K62.7 | 70475 | Radiation proctitis | 1:1 | UTD |
| I98990 | Q28.21 | 46724 | Cerebral arteriovenous shunt | 1:n | ATD |
| I119801 | Q28.21 | 97339 | Cranial dural arteriovenous fistula | 1:n | ATD |
| I127608 | E03.0 | 95716 | Familial thyroid dyshormonogenesis | 1:m | AD |
| I2008 | E03.0 | | Congenital goiter | 1:m | AD |
| I95978 | E03.0 | | Congenital diffuse goiter | 1:m | AD |
| I75872 | E03.0 | | Congenital non-toxic goiter | 1:m | AD |

Abbreviations: AD, ambiguous diagnosis; ATD, ambiguous tracer diagnosis; ICD-10-GM, International Classification of Diseases and Related Health Problems, 10th revision, German Modification; OC, Orphacodes; RD, rare disease; UTD, unambiguous tracer diagnosis.
Concordance
There are various definitions of concordance reported in the literature.[32][33][34] According to Snowden et al,[35] the conceptualization and use of the term concordance differ between disciplines because the expression of this concept depends on the political, professional, and legal drivers of these disciplines. However, the agreement aspect is a common understanding across domains. In the context of databases, this concept usually describes the comparison of the data values of a given data set to a local reference source in order to assess the reliability of the analyzed data values, for example, to investigate whether there is concordance between the data values stored in the EHR and another local source. In this paper, however, we focus on the concordance of relevant DQ parameters instead of data values. External references are therefore required for investigating the level of agreement with the literature and national references on an aggregated level. The results of such a concordance analysis can also be used, as presented by Iyen-Omofoman et al,[36] to evaluate representativeness in comparison with national databases. We make use of measurements provided in the literature and explore the extent to which the resulting DQ metrics in one DIC are concordant with external results found in the literature and national references. Hence, new DQ indicators are required to compare local DQ results to those of external data sources and to determine whether they are contradictory (see Section “Concordance Indicator”).
Definition of Data Quality Indicators
CORD-MI use cases require DQ indicators to quantitatively assess the quality of data. Suitable DQ metrics are therefore derived from the dimensions introduced above. In this section, we give definitions of the used DQ parameters (P1,…,P25) and DQ indicators (I1,…,I10), which are listed in [Tables 4] and [5], respectively. We refer to [Table 4] for the mathematical definitions of the parameters and give only the equations for the indicators within the paragraphs for better readability. Regarding the tracer diagnoses specifically relevant for CORD-MI, the Alpha-ID-SE terminology contains no information about whether a given ICD-10-GM code is a tracer diagnosis or whether such a code specifies an ambiguous RD diagnosis as explained above. We therefore extended this terminology with the required classifications, as shown in [Table 3], to make it useful for assessing the completeness of Orphacoding and the unambiguity of RD cases. A formal list of tracer diagnoses[37][38] was automatically extracted from the Alpha-ID-SE[24] terminology as described under “Implementation of the Software Framework and Data Quality Assessment Methods”. This list classifies tracer diagnoses into unambiguous tracer diagnoses (of type n:1 or 1:1) and ambiguous tracer diagnoses (of type 1:n).
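As an illustration of this classification step, the following minimal R sketch derives the relationship type and the tracer property per ICD-10-GM code from an Alpha-ID-SE table. It is not the code of the released package, and the column names icd_primary, orphacode, and rd_flag (1 = entry marks a rare disease) are hypothetical and would need to be adapted to the actual file layout.

```r
# A minimal sketch (not part of the released DQ library): derive the
# relationship type and the tracer property per ICD-10-GM code from an
# Alpha-ID-SE table with hypothetical column names.
library(dplyr)

classify_icd_codes <- function(alpha_id_se) {
  alpha_id_se %>%
    group_by(icd_primary) %>%
    summarise(
      n_orpha    = n_distinct(orphacode[!is.na(orphacode) & orphacode != ""]),
      has_non_rd = any(rd_flag == 0),
      .groups    = "drop"
    ) %>%
    mutate(
      relationship = case_when(
        has_non_rd  ~ "1:m",        # mixed with common diseases
        n_orpha > 1 ~ "1:n",        # groups several distinct RDs
        TRUE        ~ "1:1 or n:1"  # unambiguous
      ),
      tracer = !has_non_rd          # tracer diagnoses exclusively code RDs
    )
}
```

Codes flagged as tracer diagnoses that map to more than one OC correspond to the ambiguous tracer diagnoses of type 1:n in [Table 3].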
Table 4 DQ parameters displayed in the generated reports

| No. | Name | Abr. | Definition | Mathematical equation |
| --- | --- | --- | --- | --- |
| P1 | Mandatory data items | im | Number of data items that are mandatory in a given information model defined as a set of data items | Given an information model M, ai = 1 if the i-th data item in M is mandatory, else 0. P1: im = Σ ai |
| P2 | Missing mandatory data items | im_misg | Number of mandatory data items that are absent in a given data set | Given a data set S following the information model M, bi = 1 if the i-th mandatory data item in M is absent in S, else 0. P2: im_misg = Σ bi |
| P3 | Mandatory data values | vm | Number of possible mandatory data values in a given data set | Given a data set S with n mandatory data items and m data records. P3: vm = n * m |
| P4 | Missing mandatory data values | vm_misg | Number of mandatory data values that are empty or NA in a given data set | Given a data set S, cij = 1 if the i-th data value of mandatory data item j in S is empty or NA, else 0. P4: vm_misg = Σ Σ cij |
| P5 | Inpatient cases | ipatCase | Number of inpatient cases in a given hospital | Given a data set S of all cases in a hospital including the data item “encounter class” that captures the type of recorded cases, di = 1 if the i-th unique case in S is of type inpatient, else 0. P5: ipatCase = Σ di |
| P6 | Inpatients | ipat | Number of inpatients in a given hospital. This number is equal to the number of subject records (s) because in our study we consider inpatients as subjects | Given a data set S as introduced in P5 that also includes the data item “patient ID”, ei = 1 if the i-th unique patient in S has a related case of type inpatient, else 0. P6: ipat = Σ ei |
| P7 | Incomplete inpatient records | ipat_inc | Number of incomplete inpatient records (cf. P6). This number is equal to the number of incomplete subject records (s_inc) | Given a data set S as introduced in P6, fi = 1 if the i-th inpatient record in S has at least one missing data value of a mandatory data item, else 0. P7: ipat_inc = Σ fi |
| P8 | Mandatory data values in the case module | vm_case | Number of data values required for recording all mandatory items of the case module, i.e., the treatment case and diagnosis profiles of the MII-CDS | Given a data set S with p data records following an information model M that requires q mandatory data items for the definition of the case module C. P8: vm_case = p * q |
| P9 | Missing data values in the case module | vm_case_misg | Number of data values that are absent for recording the mandatory data items of the case module (cf. P8) | Given a data set S with a case module C as defined in P8, hij = 1 if the i-th data value of data item j in C is empty or NA, else 0. P9: vm_case_misg = Σ Σ hij |
| P10 | Data values selected for outlier detection | v_slc | Number of data values checked for outliers in a given data set | Given a data set S including a subset S' of selected data vectors with n' items and m' records. P10: v_slc = n' * m' |
| P11 | Outliers | v_ip | Number of detected outliers (implausible data values) in a given data set | Given a data set S including a subset S' of selected data vectors, lij = 1 if the i-th data value of data item j in S' is an outlier, else 0. P11: v_ip = Σ Σ lij |
| P12 | Tracer diagnoses | icd_tracer | Number of diagnoses with ICD-10 codes that exclusively code RDs in a given data set | Given a list of tracer diagnoses L and a data set S including a data vector of ICD-10 codes v. For all data values x in v, yi = 1 if the i-th data value xi ∈ L, else 0. P12: icd_tracer = Σ yi |
| P13 | Missing Orphacodes | oc_misg | Number of diagnoses with an ICD-10 code indicating an RD where no Orphacode is present | Given a data set S including, among other data vectors, the vectors vi and vj; vi captures ICD-10 codes while vj records OCs. pk = 1 if the k-th data value in vi is a tracer diagnosis and the k-th data value in vj is missing, else 0. P13: oc_misg = Σ pk |
| P14 | Checked links | link | Number of ICD-10-GM/OC links in a given data set | Given a data set S including the data vectors vi and vj; vi captures ICD-10 codes while vj records OCs. qk = 1 if the k-th data value in vi and the k-th data value in vj are both present, else 0. P14: link = Σ qk |
| P15 | Implausible links | link_ip | Number of ICD-10-GM/OC links not present in the respective Alpha-ID-SE terminology in a given data set | Given a data set S as defined in P14, rk = 1 if the combination of the k-th data value in vi and the k-th data value in vj is implausible, else 0. P15: link_ip = Σ rk |
| P16 | RD cases | rdCase | Number of cases that are coded using OCs, ICD-10-GM/OC links, or ICD-10 codes from the list of tracer diagnoses | Given a data set S with n1 cases coded using individual OCs, n2 cases coded using ICD-10-GM/OC links, and n3 cases coded using individual ICD-10 tracer diagnoses (see P12). P16: rdCase = n1 + n2 + n3 |
| P17 | Ambiguous RD cases | rdCase_amb | Number of RD cases coded using ambiguous ICD-10-GM/OC links or tracer diagnoses | Given a data set S with RD cases as defined in P16, si = 1 if the i-th RD case in S is ambiguous, else 0. P17: rdCase_amb = Σ si |
| P18 | Duplicated RD cases | rdCase_dup | Number of duplicated RD cases in a given data set | Given a data set S including RD cases as defined in P16, ti = 1 if the i-th RD case is duplicated, else 0. P18: rdCase_dup = Σ ti |
| P19 | Tracer cases | tracerCase | Number of RD cases coded using at least one ICD-10 code from the list of tracer diagnoses | Given a list L and a data set S as defined in P12, ui = 1 if the i-th case in S is coded using at least one ICD-10 code contained in L, else 0. P19: tracerCase = Σ ui |
| P20 | Orpha cases | orphaCase | Number of RD cases coded using at least one OC | Given a data set S, zi = 1 if the i-th case in S is coded using at least one OC, else 0. P20: orphaCase = Σ zi |
| P21 | RD cases relative frequency | rdCase_rel | Relative frequency of RD cases normalized to 100,000 inpatient cases | Given a data set S with the DQ parameters ipat as defined in P6 and rdCase as defined in P16. P21: rdCase_rel = (rdCase * 100,000)/ipat |
| P22 | Tracer cases relative frequency | tracerCase_rel | Relative frequency of tracer cases normalized to 100,000 inpatient cases | Given a data set S with the DQ parameters ipat as defined in P6 and tracerCase as defined in P19. P22: tracerCase_rel = (tracerCase * 100,000)/ipat |
| P23 | Orpha cases relative frequency | orphaCase_rel | Relative frequency of Orpha cases normalized to 100,000 inpatient cases | Given a data set S with the DQ parameters ipat as defined in P6 and orphaCase as defined in P20. P23: orphaCase_rel = (orphaCase * 100,000)/ipat |
| P24 | Minimal tracer cases in reference values | tracerCase_rel_min | Minimal relative frequency of tracer cases normalized to 100,000 inpatient cases found in the literature | Given a set T of relative tracer case frequencies reported in the literature. P24: tracerCase_rel_min = min(T) |
| P25 | Maximal tracer cases in reference values | tracerCase_rel_max | Maximal relative frequency of tracer cases normalized to 100,000 inpatient cases found in the literature | Given a set T of relative tracer case frequencies reported in the literature. P25: tracerCase_rel_max = max(T) |

Abbreviations: DQ, data quality; ICD-10-GM, International Classification of Diseases and Related Health Problems, 10th revision, German Modification; OC, Orphacodes; RD, rare disease.
Table 5 Data quality indicators (DQIs) displayed in the generated reports

| No. | DQI | Abr. | DQ Category | Mathematical equation |
| --- | --- | --- | --- | --- |
| I1 | Item Completeness Rate | dqi_co_icr | Item Completeness | I1: dqi_co_icr = (im - im_misg)/im |
| I2 | Value Completeness Rate | dqi_co_vcr | Value Completeness | I2: dqi_co_vcr = (vm - vm_misg)/vm |
| I3 | Subject Completeness Rate | dqi_co_scr | Subject Completeness | I3: dqi_co_scr = (s - s_inc)/s |
| I4 | Case Completeness Rate | dqi_co_ccr | Module Completeness | I4: dqi_co_ccr = (vm_case - vm_case_misg)/vm_case |
| I5 | Orphacoding Completeness Rate | dqi_co_ocr | Vector Completeness | I5: dqi_co_ocr = (icd_tracer - oc_misg)/icd_tracer |
| I6 | Orphacoding Plausibility Rate | dqi_pl_opr | Semantic Plausibility | I6: dqi_pl_opr = (link - link_ip)/link |
| I7 | Range Plausibility Rate | dqi_pl_rpr | Range Plausibility | I7: dqi_pl_rpr = (v_slc - v_ip)/v_slc |
| I8 | RD Case Unambiguity Rate | dqi_un_cur | Semantic Uniqueness | I8: dqi_un_cur = (rdCase - rdCase_amb)/rdCase |
| I9 | RD Case Dissimilarity Rate | dqi_un_cdr | Syntactic Uniqueness | I9: dqi_un_cdr = (rdCase - rdCase_dup)/rdCase |
| I10 | Concordance with Reference Values from Literature | dqi_cc_rvl | Concordance | I10: dqi_cc_rvl = 1 if tracerCase_rel ∈ [tracerCase_rel_min, tracerCase_rel_max], else 0 |

Abbreviation: RD, rare disease.
Completeness Indicators
Item Completeness Rate (dqi_co_icr)
This indicator assesses the metadata completeness of a given data set and evaluates whether mandatory data items (im), for example “ICD_Code”, of the information model were collected. The mandatory data items are specified using the FHIR profiles[39] of the MII-CDS. The absence of a mandatory data item in the metadata of a given data set is considered as a missing mandatory data item (im_misg): dqi_co_icr = (im-im_misg)/im (I1).
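As a simple illustration, the following R sketch (not the API of the released DQ library; item names are hypothetical) computes I1 for a tabular data set:

```r
# A minimal sketch of the item completeness rate (I1); the data set is a data
# frame and mandatory_items holds the mandatory MII-CDS items (names hypothetical).
mandatory_items <- c("patient_id", "encounter_id", "icd_code", "admission_date")

item_completeness_rate <- function(dataset, mandatory_items) {
  im      <- length(mandatory_items)                       # P1: mandatory data items
  im_misg <- sum(!mandatory_items %in% colnames(dataset))  # P2: missing mandatory items
  (im - im_misg) / im                                      # I1: dqi_co_icr
}
```

For example, the UKA figures reported in [Tables 7] and [8] (im_misg = 4, dqi_co_icr = 71.43%) are consistent with im = 14 mandatory data items.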
Value Completeness Rate (dqi_co_vcr)
While dqi_co_icr evaluates the completeness of the mandatory data items, the value completeness rate focuses on the completeness of recorded data itself. This indicator therefore shows whether all data values, for example “E75.2”, of existing mandatory data items such as “ICD_Code” are collected. We describe such data as mandatory data values (vm). The absence of an individual value detected in a given data vector of a mandatory data item is considered as a missing mandatory data value (vm_misg). Missing data values due to missing data items are not considered. Hence, this indicator represents the proportion of uncertainty due to missing data values detected by existing data vectors of mandatory items: dqi_co_vcr = (vm-vm_misg)/vm (I2).
We would like to emphasize that this indicator cannot detect missing FHIR resources, such as a second diagnosis that would be captured as a second FHIR resource (condition), as these are optional in the information model. Furthermore, coded missing values are not considered in this indicator, as they could be arbitrary nonplausible values due to the heterogeneous primary systems and code systems allowed in FHIR. Such coded missing values are to be detected in the range plausibility rate (dqi_pl_rpr), as these are defined for the individual items.
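A minimal sketch of this calculation is shown below; it only checks mandatory items that exist as columns, so missing data items do not contribute missing values, and it is not the code of the released package (column handling is simplified by coercing all values to character).

```r
# A minimal sketch of the value completeness rate (I2); column names are hypothetical.
value_completeness_rate <- function(dataset, mandatory_items) {
  present <- intersect(mandatory_items, colnames(dataset))
  values  <- as.matrix(dataset[, present, drop = FALSE])
  vm      <- length(values)                      # P3: possible mandatory data values
  vm_misg <- sum(is.na(values) | values == "")   # P4: empty or NA data values
  (vm - vm_misg) / vm                            # I2: dqi_co_vcr
}
```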
Subject Completeness Rate (dqi_co_scr)
We introduce this indicator to evaluate the completeness of subject records as defined in [Table 1]. This indicator therefore shows whether all mandatory data values in existing subject records (s) are collected. Subject records with at least one missing data value detected by an existing data vector of a mandatory data item are considered as incomplete subject records (s_inc): dqi_co_scr = (s-s_inc)/s (I3).
We would like to note that duplicated subject records and missing data values due to missing data items are not considered in this indicator. In our case, we consider inpatients as subjects. The number of subject records is therefore equal to the number of inpatients (ipat), and as consequence, the number of incomplete subjects is also equal to the number of incomplete inpatient records (ipat_inc) as shown in [Table 4].
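The following R sketch illustrates the calculation under these assumptions; the patient identifier column name is hypothetical, and the sketch is not the released implementation.

```r
# A minimal sketch of the subject completeness rate (I3), assuming inpatients
# are the subjects and each data record carries a patient identifier.
subject_completeness_rate <- function(dataset, mandatory_items, id = "patient_id") {
  present <- intersect(mandatory_items, colnames(dataset))
  values  <- as.matrix(dataset[, present, drop = FALSE])
  record_incomplete     <- apply(is.na(values) | values == "", 1, any)
  incomplete_by_subject <- tapply(record_incomplete, dataset[[id]], any)
  s     <- length(incomplete_by_subject)   # number of subject records
  s_inc <- sum(incomplete_by_subject)      # P7: incomplete subject records
  (s - s_inc) / s                          # I3: dqi_co_scr
}
```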
Case Completeness Rate (dqi_co_ccr)
This indicator assesses the completeness of data values required for recording the case module in a given data set. Case refers here to mandatory data items for a case in CORD-MI and encompasses both MII-CDS modules treatment case and diagnosis. Case completeness is therefore an instance of module completeness. The case module includes a set of data vectors related to the following data items: patient ID, encounter ID, encounter status, encounter class, admission date, discharge date, diagnosis code, diagnosis role, and diagnosis date. dqi_co_ccr evaluates whether all required data values for mandatory data items (vm_case) are present. In contrast to dqi_co_vcr, dqi_co_ccr also considers missing values of mandatory data items (vm_case_misg) even if not available in the local information model: dqi_co_ccr = (vm_case-vm_case_misg)/vm_case (I4).
Completeness Rate of Orphacoding (dqi_co_ocr)
This indicator evaluates whether all cases with tracer diagnoses (icd_tracer) are coded using OCs. We used the formal list of tracer diagnoses as a reference for detecting available tracer diagnoses and missing OCs (oc_misg) in a given data set: dqi_co_ocr = (icd_tracer-oc_misg)/icd_tracer (I5).
We would like to emphasize that we cannot detect missing OCs in ICD-10-GM codes of type 1:m, as it would require further clinical evaluation to determine if the code was used to code an RD or a common disease.
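A minimal R sketch of this check is shown below; tracer_list denotes the extracted reference list of ICD-10-GM tracer diagnoses, and the column names are hypothetical.

```r
# A minimal sketch of the Orphacoding completeness rate (I5).
orphacoding_completeness_rate <- function(dataset, tracer_list) {
  is_tracer  <- dataset$icd_code %in% tracer_list
  icd_tracer <- sum(is_tracer)                                               # P12
  oc_misg    <- sum(is_tracer &
                    (is.na(dataset$orpha_code) | dataset$orpha_code == ""))  # P13
  (icd_tracer - oc_misg) / icd_tracer                                        # I5: dqi_co_ocr
}
```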
Plausibility Indicators
Plausibility Rate of Orphacoding (dqi_pl_opr)
This indicator assesses the semantic plausibility of links between ICD-10-GM and OC, that is, concurrent codes in the Diagnosis module. All semantic links available in a given data set (link) are evaluated using the standard Alpha-ID-SE terminology. An implausible ICD-10-GM/OC link (link_ip) is defined as a combination of ICD-10-GM and OC that is absent in the Alpha-ID-SE terminology valid at the time of coding. The valid version of the used terminology depends on the data set itself. For example, the Alpha-ID-SE version published in 2022 should be used for analyzing data collected in 2022: dqi_pl_opr = (link-link_ip)/link (I6).
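The following R sketch illustrates the link check against a given Alpha-ID-SE version; it is a simplified illustration rather than the released implementation, and all column names are hypothetical.

```r
# A minimal sketch of the Orphacoding plausibility rate (I6); documented
# ICD-10-GM/OC links are looked up in the Alpha-ID-SE version valid at the
# time of coding.
orphacoding_plausibility_rate <- function(dataset, alpha_id_se) {
  coded   <- !is.na(dataset$icd_code)   & dataset$icd_code   != "" &
             !is.na(dataset$orpha_code) & dataset$orpha_code != ""
  links   <- paste(dataset$icd_code[coded], dataset$orpha_code[coded])
  valid   <- paste(alpha_id_se$icd_primary, alpha_id_se$orphacode)
  link    <- length(links)             # P14: checked links
  link_ip <- sum(!links %in% valid)    # P15: implausible links
  (link - link_ip) / link              # I6: dqi_pl_opr
}
```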
Range Plausibility Rate (dqi_pl_rpr)
This indicator evaluates the plausibility of data values in selected data vectors (v_slc). In this context, outliers, that is, implausible values (v_ip), are defined as data values within the selected values (v_slc) that do not meet the user expectations (such as an age value over 115):
dqi_pl_rpr = (v_slc-v_ip)/v_slc (I7).
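As an illustration, range rules can be expressed as simple predicate functions per selected data vector; the sketch below is not the released implementation, the age limit of 115 follows the example above, and the column names are hypothetical.

```r
# A minimal sketch of the range plausibility rate (I7); each rule returns TRUE
# for an implausible value.
range_plausibility_rate <- function(dataset, rules) {
  checks <- lapply(names(rules), function(item) rules[[item]](dataset[[item]]))
  v_slc  <- sum(lengths(checks))                  # P10: values checked for outliers
  v_ip   <- sum(unlist(checks), na.rm = TRUE)     # P11: detected outliers
  (v_slc - v_ip) / v_slc                          # I7: dqi_pl_rpr
}

rules <- list(age = function(x) x < 0 | x > 115)
```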
Uniqueness Indicators
Unambiguity Rate of RD Cases (dqi_un_cur)
This indicator assesses the semantic uniqueness of coded RD cases (rdCase) in a given data set. All cases with documented ICD-10-GM/OC links or individual OCs or individual ICD-10-GM codes from the list of tracer diagnoses are considered as RD cases. The unambiguousness of RD cases is evaluated using an appropriate algorithm, which uses the Alpha-ID-SE terminology and the list of tracer diagnoses as references for the classifications of RD diagnoses. Ambiguous RD cases (rdCase_amb) are cases coded using ambiguous ICD-10-GM/OC links or tracer diagnoses of type 1:n:
dqi_un_cur = (rdCase-rdCase_amb)/rdCase (I8).
We would like to note that cases with documented common diseases in the primary diagnosis and RD in the secondary diagnosis are also considered as RD cases. The primary diagnosis is the one responsible for causing the patient's hospitalization, while secondary diagnoses are complications that already coexist with a primary diagnosis or are developed during the inpatient hospitalization.
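A deliberately simplified R sketch of this indicator is shown below: it treats a case as ambiguous when it carries a tracer diagnosis of type 1:n without a resolving OC, whereas the released implementation also evaluates ambiguous ICD-10-GM/OC links; the data frame rd_cases and its column names are hypothetical.

```r
# A simplified sketch of the RD case unambiguity rate (I8); ambiguous_tracers
# holds the tracer diagnoses of type 1:n from the extended Alpha-ID-SE list.
rd_case_unambiguity_rate <- function(rd_cases, ambiguous_tracers) {
  rdCase     <- nrow(rd_cases)                                          # P16
  no_oc      <- is.na(rd_cases$orpha_code) | rd_cases$orpha_code == ""
  rdCase_amb <- sum(no_oc & rd_cases$icd_code %in% ambiguous_tracers)   # P17
  (rdCase - rdCase_amb) / rdCase                                        # I8: dqi_un_cur
}
```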
Dissimilarity Rate of RD Cases (dqi_un_cdr)
This indicator evaluates the syntactic uniqueness of recorded RD cases. A high proportion of duplicate cases (rdCase_dup) in the data set may be due to systematic double documentation or to a systematic error in the used information system:
dqi_un_cdr = (rdCase-rdCase_dup)/rdCase (I9).
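As an illustration, duplicated RD cases can be detected over a set of key columns; the sketch below is not the released implementation, and the key column names are hypothetical.

```r
# A minimal sketch of the RD case dissimilarity rate (I9).
rd_case_dissimilarity_rate <- function(rd_cases,
                                       keys = c("patient_id", "encounter_id", "icd_code")) {
  rdCase     <- nrow(rd_cases)                      # P16: RD cases
  rdCase_dup <- sum(duplicated(rd_cases[, keys]))   # P18: duplicated RD cases
  (rdCase - rdCase_dup) / rdCase                    # I9: dqi_un_cdr
}
```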
Concordance Indicator
Concordance with Reference Values from Literature (dqi_cc_rvl)
RD cases that are coded with at least one tracer diagnosis are called tracer cases. This indicator measures whether the relative frequency of reported cases that include a tracer diagnosis, here called tracer cases (tracerCase), lies in the range found in the literature. The relative frequency of tracer cases (tracerCase_rel) is the ratio of coded tracer cases to the total number of inpatient cases in the report year at a given hospital, normalized to 100,000 inpatient cases. The indicator dqi_cc_rvl therefore evaluates whether there is concordance between the relative frequency of tracer cases measured locally in a given DIC and the relative frequency of tracer cases provided by literature references. If there is concordance with the literature, the indicator output is 1, else 0. We define as concordance limits the minimal (tracerCase_rel_min) and maximal (tracerCase_rel_max) values found in the literature:
dqi_cc_rvl = 1 if tracerCase_rel ∈ [tracerCase_rel_min, tracerCase_rel_max], else 0 (I10).
We acknowledge that the choice of the limits is disputable and that statistical measures such as the standard deviation or the quartiles might have been expected here. But in the current situation where only one reference is available, it is our opinion that concordance should be defined generously until more reference data are available that allows quantitative statistics.
Lehne et al[40] investigated 143 tracer diagnoses required for CORD-MI and found that the relative frequency of tracer cases measured using the German National Case Statistics (NCS) was tracerCase_rel_mean = 294.8 per 100,000 cases, while this frequency rate is 3.14 times higher in university hospitals. We therefore cannot use tracerCase_rel_mean directly as a reference for our concordance analysis; instead, we take the reported ratios between the relative frequencies of tracer cases in university hospitals and in the NCS across the different ICD-10-GM chapters, ranging from a minimum (min) of 2.01 for diseases of the nervous system to a maximum (max) of 6.28 for diseases of the skin and subcutaneous tissue, as aggregated reference levels. We use these ratios to define a tolerance interval I = [min*294.8, max*294.8] for assessing the concordance of tracer cases at each hospital; relative frequencies of tracer cases that lie in the interval I = [593, 1851] therefore fulfill our concordance criterion.
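The resulting check can be written out directly, as in the following short R sketch based on the reference values above:

```r
# Sketch of the concordance check (I10): NCS mean of 294.8 tracer cases per
# 100,000 inpatient cases, scaled by the chapter ratios 2.01 and 6.28.
tracerCase_rel_min <- 2.01 * 294.8   # P24, about 593
tracerCase_rel_max <- 6.28 * 294.8   # P25, about 1,851

dqi_cc_rvl <- function(tracerCase_rel) {
  as.integer(tracerCase_rel >= tracerCase_rel_min &
             tracerCase_rel <= tracerCase_rel_max)   # I10
}

dqi_cc_rvl(1100)  # e.g., the UKA value in Table 7 yields 1 (concordant)
```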
Implementation of the Software Framework and Data Quality Assessment Methods
As proof of concept, we provide an open-source implementation of our software framework that can be executed locally or in distributed environments. Our tools include (1) an R package for DQ assessment and reporting, (2) R scripts as an exemplary implementation specific to CORD-MI, (3) a tracer diagnoses list, (4) Personal Health Train (PHT) for distributed DQ assessments, (5) FHIR tools, and (6) a Docker file for local execution. All developed tools and generated DQ reports are available on GitHub.[37][41][42] In the following, we introduce these implemented tools.
Our DQ concept introduced above is implemented as an R package that provides reusable methods for calculating DQ metrics and generating user-defined DQ reports. The developed package is used as a software framework called DQ library[41] to develop reporting scripts for DQ assessment.[38] Essentially, this software library has a modular design that allows the user to select desired parameters and indicators as well as to generate specific reports that include the selected metrics and detected DQ issues. Using this framework, we developed tools for local and cross-institutional analysis of DQ in CORD-MI.[37]
To make the Alpha-ID-SE terminology useful for assessing the OC indicators presented above, we extended this terminology with the required classifications as shown in [Table 3]. A formal list of tracer diagnoses was therefore automatically generated using a computer-based classification approach that identifies tracer diagnoses listed in the Alpha-ID-SE terminology and classifies them into unambiguous and ambiguous tracer diagnoses. We used this list as a reference for detecting tracer diagnoses available in the analyzed data and evaluating the quality of RD documentation. The generated reference list can be downloaded from the GitHub repository.[37]
To enable cross-site reporting on DQ, a distributed DQ analysis was implemented using PHT.[43][44] PHT is an infrastructure that supports distributed analytics of medical data while the data remain under the control of the data holders. There are two main concepts in PHT: stations and trains. Stations are the data-holding nodes that expose data in a discoverable format, define data source interfaces to execute queries, and execute analytical tasks in a secure environment. Trains are the encapsulated analytical tasks (including algorithms, queries, and intermediate results) based on containerization technologies; they travel from one station to the next to update the results. Between stations, the results inside the containers are encrypted to prevent manipulation or disclosure. PHT provides a core component in its architecture for researchers, the so-called Central Services, which allows researchers to define and send train job requests, to monitor the execution process (as shown in [Fig. 2]), and to view the results. For the distributed DQ analysis, we implemented the algorithms as a Docker image and ran it on the PHT platform in a distributed way. The developed PHT image is available in the GitHub repository.[37]
Fig. 2 Train route for distributed DQ assessments over the three German hospitals (UKA, UKK, and UMG). DQ, data quality; UKA, University Hospital RWTH Aachen; UKK, University Hospital Cologne; UMG, University Medical Center Göttingen.
To enable interoperable DQ assessments, we developed a FHIR interface using the fhircrackr package[45] and applied it to different FHIR data sets distributed across multiple hospitals. The used data sets follow the MII-CDS and contain randomly introduced DQ issues as described under “Experiment Settings for Distributed Data Quality Assessments”. The so-called HAPI FHIR server[46] was installed for storing these synthetic FHIR data sets at each hospital.[42] The developed interface ensures an interoperable execution of our DQ methodology that does not depend on local configuration or HIS architectures. In Section “Experiment Settings for Distributed Data Quality Assessments”, we present the standardized data provision using FHIR and the used distribution for evaluating our methodology across multiple hospitals.
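A minimal sketch of such a retrieval with fhircrackr is shown below; the endpoint URL and the selected element paths are examples and may need to be adapted to the CORD-MI FHIR profiles.

```r
# A minimal sketch of the FHIR interface based on fhircrackr (illustrative only).
library(fhircrackr)

endpoint <- "http://localhost:8080/fhir"  # hypothetical HAPI FHIR server address

bundles <- fhir_search(
  request     = fhir_url(url = endpoint, resource = "Condition"),
  max_bundles = Inf)

conditions <- fhir_crack(
  bundles = bundles,
  design  = fhir_table_description(
    resource = "Condition",
    cols     = c(encounter_id = "encounter/reference",
                 icd_code     = "code/coding/code",
                 recorded     = "recordedDate")))
```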
Finally, we would like to emphasize that our solution also provides an Excel and CSV interface for importing tabular data as well as a Docker file for local execution (see the GitHub repository[37]). This enables an easy execution of local DQ assessments, for example, to evaluate the quality of data directly extracted from HIS data sources using exports in Excel or CSV format (before the transformation to FHIR format). In Section “Experiment Settings for Local Data Quality Assessment”, we present the experimental validation for local DQ assessments.
Experiment Design and Validation Methods
We used precision and recall as metrics to validate the developed methodology and software solutions for DQ assessment. Precision describes the proportion of detected DQ issues that were correct, while recall represents how many of the existing DQ issues were detected. We measured precision and recall by comparing the obtained results with the distribution of DQ issues shown in [Table 6]. In the following, we present the experimental settings and data sets used for validating the implemented DQ assessments.
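For illustration, both metrics can be computed from the counts of correctly detected, falsely reported, and missed DQ issues; the counts in the sketch below are hypothetical.

```r
# A minimal sketch of the validation metrics (precision and recall).
validate_dq <- function(true_positives, false_positives, false_negatives) {
  precision <- true_positives / (true_positives + false_positives)
  recall    <- true_positives / (true_positives + false_negatives)
  c(precision = precision, recall = recall)
}

validate_dq(true_positives = 22, false_positives = 0, false_negatives = 0)
# e.g., all 22 implausible ICD-10-GM/OC links introduced at UMG (Table 6) detected
```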
Table 6 Distribution of DQ issues in synthetic data over the three German hospitals (UKA, UKK, and UMG)

| DQ issue | UKA | UKK | UMG |
| --- | --- | --- | --- |
| Missing mandatory data items | 4 | 2 | 3 |
| Missing mandatory data values | 8 | 1,748 | 518 |
| Incomplete inpatient records | 1 | 819 | 237 |
| Missing OCs | 4 | 2 | 11 |
| Implausible ICD-10-GM/OC links | 10 | 3 | 22 |
| Outlier issues | 3 | 2 | 8 |
| Ambiguous RD cases | 11 | 3 | 25 |
| Duplicated RD cases | 1 | 1 | 3 |

Abbreviations: DQ, data quality; ICD-10-GM, International Classification of Diseases and Related Health Problems, 10th revision, German Modification; OC, Orphacodes; RD, rare disease; UKA, University Hospital RWTH Aachen; UKK, University Hospital Cologne; UMG, University Medical Center Göttingen.
Experiment Settings for Distributed Data Quality Assessments
In order to test and validate our implementation, we first invited three hospitals from the CORD-MI consortium to join the distributed DQ assessments. The participating institutions are University Hospital RWTH Aachen (UKA), University Hospital Cologne (UKK), and University Medical Center Göttingen (UMG). Each hospital set up a single PHT station as well as a FHIR server for synthetic RD patients. We used the station software and On-Boarding workflow[44] to set up the required IT infrastructure for PHT. We also implemented the algorithms as a Docker image to run the DQ analysis in a distributed way as explained above. Next, we configured the train route for participating in the distributed DQ assessments as shown in [Fig. 2].
In addition, we prepared and transformed the synthetic RD data into FHIR bundles including four types of FHIR resources: organization, patient, encounter, and condition. We first developed FHIR tools for extracting the original data from the MII FHIR server[47] and creating FHIR bundles of around 1,000 patients for each participating hospital. The resulting FHIR collection bundles were stored in JSON files that represent three data sets of different organizations, namely UKA (Cynthia), UKK (Bapu), and UMG (Airolo). Each data set includes common data items that capture information about the basic modules of the MII-CDS as specified in the FHIR implementation guide of CORD-MI.[39] In this context, we would like to emphasize that when applying our methodology to real-world data, ETL processes have to be implemented that extract the clinical data sets from the different data sources of the local HIS and transform them into FHIR resources.
Next, we randomly added DQ issues to these data sets, such as duplications, outliers, and implausible RD codification. [Table 6] displays the distribution of DQ issues over the three hospitals. For example, the UMG synthetic data set contains 997 cases, in which three duplicated RD cases and eight outliers (e.g., age above 115) were randomly introduced. Furthermore, we transformed the FHIR collection bundles into executable transaction bundles and distributed them to all participating hospitals. We also developed a Python script to enable an easy upload of the created transactions to the FHIR server of each location. The modified data sets were then stored on the local FHIR servers; in the following, we denote the different FHIR servers by their data set, for example, the Airolo FHIR server at UMG. The tools and data sets used for data curation are available on GitHub.[42]
Finally, we started the distributed DQ assessments using PHT. The developed train travels from one station to another to execute the included algorithms for evaluating the quality of data stored in the local FHIR servers. If the execution at one station was successful, the train could visit the next station. The stations are used for executing the DQ assessment in a distributed way. The installed FHIR servers are linked to the PHT stations as shown in [Fig. 3]. In Section “Distributed Data Quality Assessments”, we present the results of the distributed DQ assessments carried out using these experimental settings.
Fig. 3 Setting the address of the target FHIR server using the station software of PHT. FHIR, Fast Healthcare Interoperability Resources; PHT, Personal Health Train.
Experiment Settings for Local Data Quality Assessment
We used the Airolo data set in CSV and Excel formats as well as the Airolo FHIR server installed at UMG for testing the local DQ assessment and validating the obtained DQ results. We would like to note that the number of inpatient cases available, for example, in the Airolo FHIR server for 2020 represents all inpatient cases captured at UMG in that year. In contrast to the distributed DQ assessment, the local DQ assessment generates a report comprising two Excel spreadsheets. The first sheet, called “report on DQ metrics,” illustrates the same DQ metrics as the distributed DQ assessment, while the second sheet, called “report on DQ violations,” reports the detected DQ issues, an additional function provided only for local execution. To enable users to find the DQ violations and the causes of these violations, the second report provides sensitive information such as the patient identifier (ID) or case ID (see [Fig. 4]). This reporting function, however, is only available for local execution in order to meet data privacy requirements. To validate our implementation, we investigate whether there is a discrepancy between the first and second spreadsheets. In addition, we analyze the concordance between the resulting DQ metrics obtained using distributed DQ assessments and those obtained using local executions on different data formats. In Section “Local Data Quality Assessment”, we present the results of DQ assessments carried out locally using these experimental settings and validation methods.
Fig. 4 Report on DQ violations detected by DQ assessment of synthetic data stored in the Airolo FHIR server. Abbreviations: DQ, data quality; FHIR, Fast Healthcare Interoperability Resources.
Results
Our methodology provides both conceptual and software frameworks that enable a harmonized DQ assessment in a single-site and cross-site fashion. Four independent quality dimensions have been proposed in the conceptual framework, namely completeness, plausibility, uniqueness, and concordance. Based on these top dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were defined, as shown in [Tables 4] and [5]. The implemented software framework provides interoperable tools for calculating the required quality metrics and generating local as well as cross-institutional reports on DQ. In this section, we first present the resulting DQ reports as a proof of concept before we demonstrate that our methodology is capable of detecting DQ issues such as outliers or implausibly coded diagnoses. The results of the distributed and local DQ assessments are presented below.
Distributed Data Quality Assessments
The train first visited UKA, UKK, and finally stopped in UMG with no errors. As a result, distributed analysis was computed over three university hospitals for assessing the quality of FHIR data stored in different HISs as described in Section “Experiment Settings for Distributed Data Quality Assessments”. The generated reports and used tools can be downloaded from GitHub repository.[37]
[Table 7] illustrates the most important DQ parameters displayed in the generated DQ reports. For example, [Table 7] shows that the UKA data set includes the highest relative frequency of RD cases as well as of Orpha cases, while the UMG data set has the highest relative frequency of tracer cases. In this context, we would like to note that the relative case frequencies presented in [Table 7] are normalized to 100,000 inpatient cases. The total number of inpatient cases at UKA was 1,000 in 2020. Of those inpatient cases, 132 were RD cases, 11 were coded with tracer diagnoses, and 128 were coded with OCs; that is, 7 tracer cases were also coded using ICD-10-GM/OC links.
Table 7 Distributed DQ assessments over the three German hospitals (UKA, UKK, and UMG): DQ parameters for the report year 2020 on synthetic RD data

| DQ parameter | UKA | UKK | UMG |
| --- | --- | --- | --- |
| ipatCase | 1,000 | 1,000 | 997 |
| ipat | 949 | 946 | 950 |
| rdCase_rel | 13,200 | 1,700 | 10,030 |
| orphaCase_rel | 12,800 | 1,600 | 9,027 |
| tracerCase_rel | 1,100 | 400 | 1,906 |
| im_misg | 4 | 2 | 3 |
| vm_misg | 8 | 1,748 | 518 |
| ipat_inc | 1 | 819 | 237 |
| oc_misg | 4 | 2 | 11 |
| v_ip | 3 | 2 | 8 |
| link_ip | 10 | 3 | 22 |
| rdCase_amb | 11 | 3 | 25 |
| rdCase_dup | 1 | 1 | 3 |

Abbreviations: DQ, data quality; RD, rare disease; UKA, University Hospital RWTH Aachen; UKK, University Hospital Cologne; UMG, University Medical Center Göttingen.
The generated reports also include DQ assessments for the top dimensions introduced in the conceptual framework as described in the “Methods” Section. [Table 8] gives the resulting DQ indicators obtained for evaluating the completeness, plausibility, uniqueness, and concordance dimensions in each location. We would like to mention that individual DQ indicators are never absolute but should always be seen in the related context and dimensions. [Table 8] shows, for example, that the UKA data set is in full agreement with the reference values from the literature and yields the best results on most indicators, such as the Orphacoding Completeness Rate and the Orphacoding Plausibility Rate. However, the Item Completeness Rate achieved its worst result with this data set. Moreover, [Table 8] also illustrates the independence of the used DQ metrics: for example, although the indicator for item completeness achieved its best result with the UKK data, we obtained the worst results when assessing subject or case completeness using the same data set. We even found that while the value completeness rate of the UKK data set exceeds 85%, the subject completeness rate of the same data set drops to around 13%.
Table 8 Distributed DQ assessments over the three German hospitals (UKA, UKK, and UMG) using synthetic RD data of the year 2020: report on data quality indicators (DQIs) for the top dimensions completeness, plausibility, uniqueness, and concordance

| Top dimension | DQI | UKA | UKK | UMG |
| --- | --- | --- | --- | --- |
| Completeness (co) | dqi_co_icr | 71.43% | 85.71% | 78.57% |
| | dqi_co_vcr | 99.92% | 85.45% | 96.22% |
| | dqi_co_ccr | 66.62% | 58.40% | 62.14% |
| | dqi_co_scr | 99.89% | 13.42% | 75.05% |
| | dqi_co_ocr | 63.64% | 60% | 45% |
| Plausibility (pl) | dqi_pl_opr | 92.19% | 81.25% | 76.34% |
| | dqi_pl_rpr | 99.92% | 99.94% | 99.83% |
| Uniqueness (un) | dqi_un_cur | 91.67% | 82.35% | 75% |
| | dqi_un_cdr | 99.25% | 94.44% | 97.09% |
| Concordance (cc) | dqi_cc_rvl | 1 | 0 | 0 |

Abbreviations: RD, rare disease; UKA, University Hospital RWTH Aachen; UKK, University Hospital Cologne; UMG, University Medical Center Göttingen.
We tested all types of DQ issues required for CORD-MI, as shown in [Table 6], and measured the precision and recall values. The developed methods were able to detect all types of randomly introduced DQ issues distributed as in [Table 6]. Our methodology therefore yields high precision and recall values of up to 100%. We also repeated the execution of our algorithm several times with different distributions of random DQ issues at UMG, UKK, and UKA and obtained the same validation results. Hence, the resulting DQ parameters and indicators validated the correctness and accuracy of the performed DQ assessments.
Local Data Quality Assessment
We applied the developed methods to the Airolo FHIR server located at UMG in order to validate the local and distributed DQ assessments as explained under “Experiment Settings for Local Data Quality Assessment”. As a result, two DQ reports were generated automatically (see the GitHub repository[37]). The first report illustrates the calculated DQ metrics, while the second one reports on DQ violations. [Fig. 4] shows the DQ issues detected in the second report. The comparison with the report on DQ metrics did not show any discrepancy between the displayed DQ issues and the calculated DQ metrics. The generated report on DQ violations is used to establish iterative feedback to potential users and to improve the quality of RD data. Potential users are, for example, medical documentation assistants or data scientists. We would like to mention that, if applied to real-world data, the spreadsheet on DQ violations cannot be shared, as it contains sensitive information that may be traced back to individual patients.
Besides the spreadsheet on DQ violations, the generated reports also provide adequate information about the quality metrics calculated for the top dimensions: completeness, plausibility, uniqueness, and concordance. The obtained results are in full agreement with the distributed DQ assessments shown in [Tables 7] and [8] (UMG). We also repeated the execution of the local DQ assessments several times with other formats such as CSV or Excel and obtained the same validation results. Our methodology therefore yields high precision and recall values of up to 100% in both experiments for local and distributed DQ assessments. Furthermore, we demonstrated that distributed DQ analysis using PHT can achieve assessment results as good as local execution.