Methods Inf Med 2015; 54(03): 276-282
DOI: 10.3414/ME13-01-0133
Original Articles
Schattauer GmbH

Secure Secondary Use of Clinical Data with Cloud-based NLP Services

Towards a Highly Scalable Research Infrastructure
J. Christoph (1), L. Griebel (1), I. Leb (1), I. Engel (1), F. Köpcke (1), D. Toddenroth (1), H.-U. Prokosch (1), J. Laufer (2), K. Marquardt (2), M. Sedlmayr (1)

1   Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2   Rhön-Klinikum AG, Bad Neustadt/Saale, Germany

Publication History

received: 01 December 2013

accepted: 08 October 2014

Publication Date:
22 January 2018 (online)

Summary

Objectives: The secondary use of clinical data offers substantial opportunities for clinical and translational research as well as for quality assurance projects. Such use requires a flexible and scalable infrastructure that complies with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements, and to evaluate it in three use cases.

Methods: The architecture provides components that allow multiple data provider sites, such as hospitals, to extract free text as well as structured data from local sources and to de-identify these data for further anonymous or pseudonymous processing. Free-text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. The newly gained annotations can then be integrated with the already available structured data items, and the resulting data sets can be uploaded to a central study portal for further analysis.

Results: Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients, provided by a university hospital and a private hospital chain, have already been processed.

Conclusions: The cloud4health project has shown how existing components for the secondary use of structured data can be complemented with text mining in a privacy-compliant manner. The cloud-computing paradigm allows flexible and dynamically adaptable service provision, which makes it easier for data providers to adopt the services without investing in their own hardware resources and software tools.
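The data flow outlined under Methods (local extraction, de-identification, text mining, integration with structured items) can be illustrated with a minimal Python sketch. It is not the cloud4health implementation: every name in it (`SITE_SECRET`, `pseudonymize`, `deidentify`, `text_mining_service`, `build_study_record`) is hypothetical, the two regular expressions stand in for a real de-identification component, and a local keyword lookup stands in for the cloud-hosted text-mining service.

```python
import hashlib
import hmac
import re

# Hypothetical site secret; assumed to stay with the data provider.
SITE_SECRET = b"example-secret-kept-at-the-hospital"

# Illustrative patterns only; real de-identification needs far more than this.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")
NAME_PATTERN = re.compile(r"\b(?:Mr|Mrs|Dr)\.\s+[A-Z][a-z]+")


def pseudonymize(patient_id: str) -> str:
    """Derive a stable pseudonym from a local patient ID with a keyed hash."""
    digest = hmac.new(SITE_SECRET, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(text: str) -> str:
    """Mask dates and titled names before the text leaves the site."""
    text = DATE_PATTERN.sub("[DATE]", text)
    return NAME_PATTERN.sub("[NAME]", text)


def text_mining_service(text: str) -> dict:
    """Local stand-in for the remote NLP service: a simple keyword lookup."""
    keywords = {"diabetes": "diagnosis", "metformin": "medication"}
    lowered = text.lower()
    return {label: term for term, label in keywords.items() if term in lowered}


def build_study_record(patient_id: str, structured: dict, free_text: str) -> dict:
    """Combine structured items with annotations mined from de-identified text."""
    annotations = text_mining_service(deidentify(free_text))
    return {"pseudonym": pseudonymize(patient_id), **structured, **annotations}


note = "Mr. Smith seen on 03.02.2013; diabetes, started metformin."
record = build_study_record("12345", {"hba1c": 7.9}, note)
```

In this sketch only the de-identified text is passed to the annotation step, and the resulting record carries a pseudonym rather than the local patient ID, which mirrors the order of steps described in the summary.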

 