Yearb Med Inform 2011; 20(01): 146-155
DOI: 10.1055/s-0038-1638754
Survey
Georg Thieme Verlag KG Stuttgart

Trends and Developments in Bioinformatics in 2010: Prospects and Perspectives

C. F. Aliferis, A. V. Alekseyenko, Y. Aphinyanaphongs, S. Brown, D. Fenyo, L. Fu, S. Shen, A. Statnikov, J. Wang
Center for Health Informatics and Bioinformatics, New York University
Supported in part by grant 1 UL1 RR029893 from the National Center for Research Resources, National Institutes of Health.
Publication Date: 06 March 2018 (online)

Summary

Objectives

To survey major developments and trends in the field of Bioinformatics in 2010 and their relationships to those of previous years, with emphasis on long-term trends, on best practices, on quality of the science of informatics, and on quality of science as a function of informatics.

Methods

A critical review of articles in the literature of Bioinformatics over the past year.

Results

Our main results suggest that Bioinformatics continues to be a major catalyst for progress in Biology and Translational Medicine, as a consequence of new assaying technologies, most prominently Next Generation Sequencing, which are changing the landscape of modern biological and medical research. These assays depend critically on bioinformatics and have led to rapid growth in the development of corresponding informatics methods. Clinical-grade molecular signatures are proliferating rapidly. However, a highly publicized incident at a prominent university showed that deficiencies in informatics methods can lead to catastrophic consequences for important scientific projects. Evidence-driven protocols and best practices are greatly needed, given how serious the implications are for the quality of translational and basic science.

Conclusions

Several exciting new methods have appeared over the past 18 months that open new roads for progress in bioinformatics methods and their impact on biomedicine. At the same time, the range of open problems of great significance is extensive, ensuring the vitality of the field for many years to come.

 