CC BY-NC-ND 4.0 · Methods Inf Med 2023; 62(S 01): e47-e56
DOI: 10.1055/a-2006-1086
Original Article

Consistency as a Data Quality Measure for German Corona Consensus Items Mapped from National Pandemic Cohort Network Data Collections

Khalid O. Yusuf*
1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Olga Miljukov*
2 Institute for Clinical Epidemiology and Biometry (ICE-B), University of Würzburg, Würzburg, Germany
Anne Schoneberg
1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Sabine Hanß
1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Martin Wiesenfeldt
1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Melanie Stecher
3 Department I for Internal Medicine, University Hospital Cologne, Cologne, Germany
4 German Centre for Infection Research, Partner Site Bonn-Cologne, Cologne, Germany
Lazar Mitrov
5 Department I of Internal Medicine, Faculty of Medicine and University Hospital Cologne, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, University of Cologne, Cologne, Germany
Sina Marie Hopff
5 Department I of Internal Medicine, Faculty of Medicine and University Hospital Cologne, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, University of Cologne, Cologne, Germany
Sarah Steinbrecher
6 Department of Infectious Diseases and Respiratory Medicine, Charité-Universitaetsmedizin Berlin, Berlin, Germany
Florian Kurth
6 Department of Infectious Diseases and Respiratory Medicine, Charité-Universitaetsmedizin Berlin, Berlin, Germany
Thomas Bahmer
7 Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel, Kiel, Germany
8 Airway Research Center North (ARCN), German Center for Lung Research (DZL), Wöhrendamm Großhansdorf, Germany
Stefan Schreiber
7 Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel, Kiel, Germany
Daniel Pape
7 Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel, Kiel, Germany
Anna-Lena Hofmann
2 Institute for Clinical Epidemiology and Biometry (ICE-B), University of Würzburg, Würzburg, Germany
Mirjam Kohls
2 Institute for Clinical Epidemiology and Biometry (ICE-B), University of Würzburg, Würzburg, Germany
Stefan Störk
9 Department Clinical Research & Epidemiology, University Hospital Würzburg, Comprehensive Heart Failure Center, and Department Internal Medicine I, Würzburg, Germany
Hans Christian Stubbe
10 Department of Medicine II, University Hospital, LMU Munich, Munich, Germany
Johannes J. Tebbe
11 Department of Gastroenterology and Infectious Diseases, University Medical Center East Westphalia-Lippe, Klinikum Lippe, Lemgo, Germany
Johannes C. Hellmuth
12 Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
13 COVID-19 Registry of the LMU Munich (CORKUM), University Hospital, LMU Munich, Munich, Germany
Johanna Erber
14 Department II of Internal Medicine, Technical University of Munich, School of Medicine, Germany
Lilian Krist
15 Institute of Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin, Berlin, Germany
Siegbert Rieg
16 Department of Medicine II, Division of Infectious Diseases, Medical Centre – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Lisa Pilgram
17 Department II of Internal Medicine, Hematology/Oncology, Goethe University, Frankfurt, Frankfurt am Main, Germany
18 Department of Nephrology and Medical Intensive Care, Charité - Universitaetsmedizin Berlin, Berlin, Germany
Jörg J. Vehreschild
3 Department I for Internal Medicine, University Hospital Cologne, Cologne, Germany
4 German Centre for Infection Research, Partner Site Bonn-Cologne, Cologne, Germany
18 Department of Nephrology and Medical Intensive Care, Charité - Universitaetsmedizin Berlin, Berlin, Germany
Jens-Peter Reese
2 Institute for Clinical Epidemiology and Biometry (ICE-B), University of Würzburg, Würzburg, Germany
Dagmar Krefting
1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
19 Campus Institute Data Science (CIDAS), Georg-August-University, Göttingen, Germany
Funding The study was carried out using the clinical-scientific infrastructure and data of NUKLEUS, NAPKON, and CODEX of the Network University Medicine (NUM, grant number 01KX2121), with support from the German Center for Cardiovascular Research (DZHK, grant number 81Z0300108), both funded by the Federal Ministry of Education and Research (BMBF).

Abstract

Background As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.

Objectives The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims to identify contradictions within interdependent GECCO data items of the German national COVID-19 cohorts so that possible reasons for these contradictions can be investigated. We furthermore aim to enable other researchers to easily perform data quality evaluation on GECCO-based datasets and to adapt the approach to similar data models.

Methods All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented following the design of an existing quality assessment framework and retaining the consistency taxonomy defined by that framework, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.
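To make the idea of a contradiction check on interdependent items concrete, the following is a minimal sketch, not the dqGecco implementation. The item names (`any_symptom`, `fever`, `body_temp_c`), both rules, and the 36.0 °C threshold are hypothetical and chosen only for illustration. A logical rule flags a value combination that cannot be true at the same time; an empirical rule flags a combination that is possible but implausible. Records with a missing item are skipped, because a contradiction can only be judged when all interdependent items are present.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str                      # "logical" or "empirical"
    items: Tuple[str, ...]         # interdependent items the rule needs
    contradicts: Callable[[Record], bool]


# Hypothetical rules: item names and the threshold are illustrative only.
RULES: List[Rule] = [
    Rule("no_symptoms_but_fever", "logical", ("any_symptom", "fever"),
         lambda r: r["any_symptom"] == "no" and r["fever"] == "yes"),
    Rule("fever_but_low_temperature", "empirical", ("fever", "body_temp_c"),
         lambda r: r["fever"] == "yes" and r["body_temp_c"] < 36.0),
]


def find_contradictions(records: List[Record],
                        rules: List[Rule] = RULES) -> List[Tuple[int, str, str]]:
    """Return (record index, rule kind, rule name) for every violated rule."""
    hits = []
    for i, rec in enumerate(records):
        for rule in rules:
            # Skip the rule if any item it depends on is missing.
            if any(rec.get(item) is None for item in rule.items):
                continue
            if rule.contradicts(rec):
                hits.append((i, rule.kind, rule.name))
    return hits


# Made-up example records, not study data.
records = [
    {"any_symptom": "yes", "fever": "yes", "body_temp_c": 38.6},
    {"any_symptom": "no", "fever": "yes", "body_temp_c": None},
    {"any_symptom": "yes", "fever": "yes", "body_temp_c": 35.2},
]
print(find_contradictions(records))
```

Keeping each rule as data (name, kind, required items, predicate) rather than hard-coding checks is one way such a tool could be adapted to a similar information model: only the rule list changes.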

Results Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between indicators and cohorts, ranging from 95.84% to 100%.
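As a reading aid for the reported percentages, the sketch below assumes that a consistency rate is the share of evaluated records without a contradiction for a given indicator. This definition and the function name `consistency_rate` are assumptions made for illustration, and the counts in the example are invented, not taken from the study.

```python
def consistency_rate(n_evaluated: int, n_contradictions: int) -> float:
    """Percentage of evaluated records that show no contradiction."""
    if n_evaluated <= 0:
        raise ValueError("n_evaluated must be positive")
    if not 0 <= n_contradictions <= n_evaluated:
        raise ValueError("n_contradictions must lie between 0 and n_evaluated")
    return round(100.0 * (1 - n_contradictions / n_evaluated), 2)


# Invented counts: 3 contradictory records among 200 evaluated ones.
print(consistency_rate(200, 3))
```

Under this assumed definition, records in which one of the interdependent items is missing would be excluded from `n_evaluated` rather than counted as consistent.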

Conclusion An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.

* These authors contributed equally.


Availability of Materials and Data

The GECCO83 dataset used for the study can be accessed through the regular use and access procedure of NAPKON.¹ The source code of the implementation is available in the GitLab repository of the project.²

1 NAPKON-Proskive: https://proskive.napkon.de/

2 dqGecco Project: https://gitlab.gwdg.de/medinfpub/dqgecco.git




Publication History

Received: 18 July 2022

Accepted: 31 October 2022

Accepted Manuscript online: 03 January 2023

Article published online: 30 January 2023

© 2023. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 