Yearb Med Inform 2014; 23(01): 42-47
DOI: 10.15265/IY-2014-0018
Original Article
Georg Thieme Verlag KG Stuttgart

Technical Challenges for Big Data in Biomedicine and Health: Data Sources, Infrastructure, and Analytics

N. Peek
1   Dept. of Medical Informatics, Academic Medical Center, University of Amsterdam, The Netherlands
2   Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
J. H. Holmes
3   Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
J. Sun
4   College of Computing, Georgia Institute of Technology, Atlanta, GA, USA

Publication History

29 January 2015

Publication Date: 05 March 2018 (online)


Summary

Objectives: To review technical and methodological challenges for big data research in biomedicine and health.

Methods: We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data.

Results: The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data, such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets.

Conclusions: The major challenge for the near future is to transform the analytical methods used in the biomedical and health domain to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.