DOI: 10.1055/s-0038-1625129
Biomedical Data Mining
Publication History
Publication Date: 22 January 2018 (online)
Summary
Objective: To introduce the special topic of Methods of Information in Medicine on data mining in biomedicine, with selected papers from two workshops on Intelligent Data Analysis in bioMedicine (IDAMAP) held in Verona (2006) and Amsterdam (2007).
Methods: Defining the field of biomedical data mining. Characterizing current developments and challenges for researchers in the field. Reporting on current and future activities of IMIA’s working group on Intelligent Data Analysis and Data Mining. Describing the content of the selected papers in this special topic.
Results and Conclusions: In the biomedical field, data mining methods are used to develop clinical diagnostic and prognostic systems, to interpret biomedical signal and image data, to discover knowledge from biological and clinical databases, and to support biosurveillance and anomaly detection applications. The main challenges for the field are i) dealing with very large search spaces in a manner that is both computationally efficient and statistically valid, ii) incorporating and utilizing medical and biological background knowledge in the data analysis process, iii) reasoning with time-oriented data and temporal abstraction, and iv) developing end-user tools for interactive presentation, interpretation, and analysis of large datasets.