DOI: 10.1055/s-0040-1713684
From Raw Data to FAIR Data: The FAIRification Workflow for Health Research
Funding This work was performed in the scope of the FAIR4Health project [31]. FAIR4Health has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement number 824666.
Publication History
31 July 2019
06 May 2020
Publication Date:
03 July 2020 (online)
Abstract
Background The FAIR (findability, accessibility, interoperability, and reusability) guiding principles aim to promote the reuse of data and other digital research inputs, outputs, and objects (algorithms, tools, and workflows that led to that data) by making them findable, accessible, interoperable, and reusable. GO FAIR, a bottom-up, stakeholder-driven, and self-governed initiative, defined a seven-step FAIRification process that focuses on data but also indicates the work required for metadata. This process addresses the translation of raw datasets into FAIR datasets in a general way, without considering the specific requirements and challenges that may arise with particular types of data.
Objectives This contribution presents the architecture design of an open technological solution built upon the FAIRification process proposed by GO FAIR, addressing the gaps identified in that process when it is applied to health datasets.
Methods A common FAIRification workflow was developed by applying restrictions on existing steps and introducing new steps for specific requirements of health data. These requirements have been elicited after analyzing the FAIRification workflow from different perspectives: technical barriers, ethical implications, and legal framework. This analysis identified gaps when applying the FAIRification process proposed by GO FAIR to health research data management in terms of data curation, validation, deidentification, versioning, and indexing.
Results A technological architecture based on Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR) is proposed to support the revised FAIRification workflow.
Discussion Research funding agencies all over the world increasingly demand the application of the FAIR guiding principles to health research output. Existing tools do not fully address the identified needs for health data management. Therefore, researchers may benefit in the coming years from a common framework that supports the proposed FAIRification workflow applied to health datasets.
Conclusion Routine health care datasets or data resulting from health research can be FAIRified, shared, and reused within the health research community by following the proposed FAIRification workflow and implementing the proposed technical architecture.
References
- 1 Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016; 3: 160018
- 2 Mons B, Neylon C, Velterop J, Dumontier M, da Silva Santos LOB, Wilkinson MD. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Inf Serv Use 2017; 37: 49-56
- 3 European Commission, Directorate-General for Research & Innovation. H2020 Programme: Guidelines on FAIR Data Management in Horizon 2020. Available at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf . Accessed April 7, 2020
- 4 Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information. Available at: https://eur-lex.europa.eu/eli/dir/2019/1024/oj . Accessed April 7, 2020
- 5 Office of Strategic Coordination—The Common Fund, National Institutes of Health. New Models of Data Stewardship. Program snapshot. Available at: https://commonfund.nih.gov/data . Accessed April 7, 2020
- 6 GO FAIR Initiative. Available at: https://www.go-fair.org/go-fair-initiative/ . Accessed April 7, 2020
- 7 FAIRification process. Available at: https://www.go-fair.org/fair-principles/fairification-process/ . Accessed April 7, 2020
- 8 Skovgaard LL, Wadmann S, Hoeyer K. A review of attitudes towards the reuse of health data among people in the European Union: the primacy of purpose and the common good. Health Policy 2019; 123 (06) 564-571
- 9 Federer LM, Lu YL, Joubert DJ, Welsh J, Brandys B. Biomedical data sharing and reuse: attitudes and practices of clinical and scientific research staff. PLoS One 2015; 10 (06) e0129506
- 10 Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical data reuse or secondary use: current status and potential future progress. Yearb Med Inform 2017; 26 (01) 38-52
- 11 World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA 2013; 310 (20) 2191-2194
- 12 WMA Declaration of Taipei on ethical considerations regarding health databases and biobanks. Available at: https://www.wma.net/policies-post/wma-declaration-of-taipei-on-ethical-considerations-regarding-health-databases-and-biobanks/ . Accessed April 7, 2020
- 13 The Health Information Technology for Economic and Clinical Health (HITECH) Act Enforcement Interim Final Rule. U.S. Department of Health & Human Services, Health Information Privacy. Available at: https://www.hhs.gov/hipaa/for-professionals/special-topics/hitech-act-enforcement-interim-final-rule/index.html . Accessed April 7, 2020
- 14 Cohen IG, Mello MM. HIPAA and protecting health information in the 21st century. JAMA 2018; 320 (03) 231-232
- 15 Recommendations on de-identification of protected health information under HIPAA. U.S. Department of Health & Human Services, National Committee on Vital and Health Statistics. Available at: https://www.ncvhs.hhs.gov/wp-content/uploads/2013/12/2017-Ltr-Privacy-DeIdentification-Feb-23-Final-w-sig.pdf . Accessed April 7, 2020
- 16 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Available at: https://eur-lex.europa.eu/eli/reg/2016/679/oj . Accessed April 7, 2020
- 17 Carrell DS, Schoen RE, Leffler DA, et al. Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings. J Am Med Inform Assoc 2017; 24 (05) 986-991
- 18 Pons E, Braun LMM, Hunink MGM, Kors JA. Natural Language Processing in Radiology: A Systematic Review. Radiology 2016; 279 (02) 329-343
- 19 Liao KP, Cai T, Savova GK, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015; 350: h1885
- 20 Chen L, Song L, Shao Y, Li D, Ding K. Using natural language processing to extract clinically useful information from Chinese electronic medical records. Int J Med Inform 2019; 124: 6-12
- 21 Ong T, Pradhananga R, Holve E, Kahn MG. A framework for classification of electronic health data extraction-transformation-loading challenges in data network participation. EGEMS (Wash DC) 2017; 5 (01) 10
- 22 Hamrouni H, Brahmia Z, Bouaziz R. A systematic approach to efficiently managing the effects of retroactive updates of time-varying data in multiversion XML databases. International Journal of Intelligent Information and Database Systems 2018; 11: 1-26
- 23 Yenni GM, Christensen EM, Bledsoe EK, et al. Developing a modern data workflow for regularly updated data. PLoS Biol 2019; 17 (01) e3000125
- 24 Wilkinson MD, Verborgh R, Bonino da Silva Santos LO, et al. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Comput Sci 2017; 3: e110
- 25 Collins S, Genova F, Harrower N, et al. Turning FAIR into reality. European Commission Directorate General for Research and Innovation 2018; 1: 1-76
- 26 European Commission. Ethics and data protection. Available at: https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/ethics/h2020_hi_ethics-data-protection_en.pdf . Accessed April 7, 2020
- 27 Canham S, Ohmann C, Matei M, et al. White paper 4: ethics, supporting document to D3.3 draft policy recommendations. Available at: https://eoscpilot.eu/sites/default/files/eoscpilot_d3.3_whitepaper_4_ethics.pdf . Accessed April 7, 2020
- 28 Ienca M, Ferretti A, Hurst S, Puhan M, Lovis C, Vayena E. Considerations for ethics review of big data health research: A scoping review. PLoS One 2018; 13 (10) e0204937
- 29 Council of the European Union Outcome of proceedings. Available at: http://data.consilium.europa.eu/doc/document/ST-14853-2015-INIT/en/pdf . Accessed April 7, 2020
- 30 Floridi L, Taddeo M. What is data ethics? Philos Trans A Math Phys Eng Sci 2016; 374 (2083): 20160360
- 31 FAIR4Health Project. FAIR4Health Project Website. Available at: https://www.fair4health.eu/ . Accessed April 7, 2020
- 32 HL7 Clinical Document Architecture (CDA). Health Level Seven International (HL7). Available at: http://www.hl7.org/implement/standards/product_brief.cfm?product_id=7 . Accessed April 7, 2020
- 33 HL7 FHIR. Available at: http://hl7.org/fhir/ . Accessed April 7, 2020
- 34 Open industry specifications, models and software for e-health (OpenEHR). Available at: https://www.openehr.org/ . Accessed April 7, 2020
- 35 Observational Health Data Sciences and Informatics. OMOP common data model. Available at: https://www.ohdsi.org/data-standardization/the-common-data-model/ . Accessed April 7, 2020
- 36 U.S. Department of Health & Human Services. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Available at: https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf . Accessed April 7, 2020
- 37 Burrows JH. Secure hash standard. In: Federal Information Processing Standards Publication. National Institute of Standards and Technology; 1994
- 38 Lakshmanan T, Madheswaran M. A novel secure hash algorithm for public key digital signature schemes. Int Arab J Inf Technol 2012; 9: 262-267
- 39 Dalenius T. Finding a needle in a haystack or identifying anonymous census records. J Off Stat 1986; 2: 329-336
- 40 World Health Organization. Classification of Diseases (ICD)-11. Available at: https://www.who.int/classifications/icd/en/ . Accessed April 7, 2020
- 41 SNOMED International. Available at: http://www.snomed.org/ . Accessed April 7, 2020
- 42 LOINC. The international standard for identifying health measurements, observations, and documents. Available at: https://loinc.org/ . Accessed April 7, 2020
- 43 Rauber A, Asmi A, van Uytvanck D, Proell S. Data citation of evolving data: Recommendations of the Working Group on Data Citation (WGDC). Result of the RDA Data Citation WG 2015; 20: 1-2
- 44 ISO 14721:2003 Space data and information transfer systems—open archival information system—Reference model. Available at: https://www.iso.org/standard/24683.html . Accessed April 7, 2020
- 45 Deserno TM, Welter P, Horsch A. Towards a repository for standardized medical image and signal case data annotated with ground truth. J Digit Imaging 2012; 25 (02) 213-226
- 46 Canham S, Ohmann C. A metadata schema for data objects in clinical research. Trials 2016; 17 (01) 557
- 47 Cross-Enterprise Document Sharing. Available at: https://wiki.ihe.net/index.php/Cross-Enterprise_Document_Sharing . Accessed April 7, 2020
- 48 Digital Imaging and Communications in Medicine (DICOM). Available at: https://www.dicomstandard.org/ . Accessed April 7, 2020
- 49 C-CDA (HL7 CDA R2 Implementation Guide: Consolidated CDA Templates for Clinical Notes—US Realm). Available at: http://www.hl7.org/implement/standards/product_brief.cfm?product_id=492 . Accessed April 7, 2020
- 50 FHIR HL7. Resource ConceptMap—Content. Available at: https://www.hl7.org/fhir/conceptmap.html . Accessed April 7, 2020
- 51 OECD Principles and Guidelines for Access to Research Data from Public Funding. Available at: http://www.oecd.org/sti/inno/38500813.pdf . Accessed April 7, 2020
- 52 The Royal Society. Science as an open enterprise: open data for open science. Available at: https://royalsociety.org/-/media/policy/projects/sape/2012-06-20-saoe.pdf . Accessed April 7, 2020
- 53 European Commission. Horizon 2020, Work Programme 2018–2020, Health, demographic change and wellbeing. Available at: https://ec.europa.eu/programmes/horizon2020/sites/horizon2020/files/health_h2020_draft_sc1_wp_18-20_0.pdf . Accessed April 7, 2020
- 54 Notice Announcing Funding Opportunity Issued for the NIH Data Commons Pilot Phase. Available at: https://grants.nih.gov/grants/guide/notice-files/NOT-RM-17-031.html . Accessed April 7, 2020
- 55 2016 National Research Infrastructure Roadmap. Available at: https://docs.education.gov.au/system/files/doc/other/ed16-0269_national_research_infrastructure_roadmap_report_internals_acc.pdf . Accessed April 7, 2020
- 56 University of Nebraska–Lincoln. The African Open Science Platform: the future of science and the science of the future. Available at: https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1092&context=scholcom . Accessed April 7, 2020
- 57 Musen MA, Sansone S-A, Cheung K-H, et al. CEDAR: Semantic Web Technology to Support Open Science. In WWW '18 Companion: The 2018 Web Conference Companion, April 23–27, 2018, Lyon, France. 2018; 2: 427-428
- 58 Musen MA, Bean CA, Cheung KH, et al; CEDAR team. The center for expanded data annotation and retrieval. J Am Med Inform Assoc 2015; 22 (06) 1148-1152
- 59 Thompson M, Bonino L, Wilkinson MD, et al. Overview of a suite of middle-ware services for implementing FAIR data principles. CEUR Workshop Proc 2017