Yearb Med Inform 2014; 23(01): 14-20
DOI: 10.15265/IY-2014-0020
Original Article
Georg Thieme Verlag KG Stuttgart

Big Data in Medicine Is Driving Big Changes

F. Martin-Sanchez 1, 2
K. Verspoor 2, 1

1   Health and Biomedical Informatics Centre, The University of Melbourne, Parkville VIC 3010 Australia
2   Department of Computing and Information Systems, The University of Melbourne, Parkville VIC 3010 Australia

Publication History: 15 August 2014
Publication Date: 05 March 2018 (online)

Summary

Objectives: To summarise current research that takes advantage of “Big Data” in health and biomedical informatics applications.

Methods: Survey of trends in this work, and exploration of the literature describing how large-scale structured and unstructured data sources are being used to support applications ranging from clinical decision making and health policy, to drug design and pharmacovigilance, and further to systems biology and genetics.

Results: The survey highlights the ongoing development of powerful new methods for turning large-scale, and often complex, data into information that provides new insights into human health across a range of areas. Consideration of this body of work identifies important paradigm shifts facilitated by Big Data resources and methods: in clinical and translational research, from hypothesis-driven to data-driven research; and in medicine, from evidence-based practice to practice-based evidence.

Conclusions: The increasing scale and availability of health data require strategies for data management, data linkage, and data integration that go beyond the limits of many existing information systems, and substantial effort is underway to meet those needs. As our ability to make sense of the data improves, its value will continue to increase. Health systems, genetics and genomics, population and public health: all areas of biomedicine stand to benefit from Big Data and the associated technologies.

 