DOI: 10.1055/s-0038-1638651
Clinical Data Mining: a Review
Summary
Objective: Clinical data mining is the application of data mining techniques to clinical data. We review the literature to provide a general overview, identifying the state of practice and the challenges ahead.
Methods: The nine data mining steps proposed by Fayyad et al. in 1996 [4] were used as the main themes of the review. MEDLINE was used as the primary source, and 84 papers were retained based on our inclusion criteria.
Results: Clinical data mining has three objectives: understanding the clinical data, assisting healthcare professionals, and developing a data analysis methodology suited to medical data. Classification is the most frequently used data mining function, implemented predominantly with Bayesian classifiers, neural networks, and support vector machines (SVMs). A myriad of quantitative performance measures were proposed, predominantly accuracy, sensitivity, specificity, and ROC curves. These quantitative measures are usually accompanied by a qualitative evaluation.
Conclusion: Clinical data mining fulfills its commitment to extracting new and previously unknown knowledge from clinical databases. More effort is still needed to gain wider acceptance among healthcare professionals and to improve the generalizability of the extracted knowledge and the reproducibility of the extraction process. This calls for better description of variables, systematic reporting of algorithm parameters (including the method used to obtain them), use of easy-to-understand models, and comparison of the efficiency of clinical data mining with traditional statistical analyses. As more and more data become available, data miners will have to develop new methodologies and infrastructures to analyze increasingly complex medical data.
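The Results name accuracy, sensitivity, specificity and ROC curves as the dominant quantitative performance measures. The following is a minimal Python sketch of how these are commonly computed for a binary classifier; it is not taken from any of the reviewed studies. It assumes 0/1 labels with 1 as the positive class, real-valued scores where higher means "more likely positive", and a hypothetical decision threshold of 0.5; the function names and the toy data are illustrative only. The ROC area is computed through its rank interpretation rather than by tracing the curve point by point.

```python
# Minimal sketch (illustrative only): accuracy, sensitivity, specificity and ROC AUC
# for a binary classifier. Assumes 0/1 labels (1 = positive, e.g. disease present),
# scores where higher means "more likely positive", and that both classes occur.
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for 0/1 true labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity at one (hypothetical) decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability that a
    random positive case scores higher than a random negative one (ties count 1/2).
    Quadratic in the number of cases, so only suitable for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    print(threshold_metrics(y_true, scores))  # {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
    print(roc_auc(y_true, scores))            # 0.9375
```

Sensitivity and specificity depend on the chosen threshold, whereas the ROC area summarizes performance over all thresholds; reporting the threshold and how it was selected is arguably one instance of the systematic reporting of algorithm parameters called for in the Conclusion.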
References
- 1 Chen R, Mongkolwat P, Channin DS. RadMonitor: radiology operations data mining in real time. J Digit Imaging 2008; 21: 257-68.
- 2 Obenshain MK. Application of data mining techniques to healthcare data. Infect Control Hosp Epidemiol 2004; 25: 690-5.
- 3 Zhu X. Semi-Supervised Learning Literature Survey. University of Wisconsin-Madison. 2007
- 4 Fayyad U, Piatetsky-Shapiro G, Smyth P. Knowledge Discovery and Data Mining: Towards a Unifying Framework. In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. AAAI Press; 1996: 82-8.
- 5 Huang Y, McCullagh P, Black N, Harper R. Feature selection and classification model construction on type 2 diabetic patients’ data. Artif Intell Med 2007; 41: 251-62.
- 6 Olson DL, Delen D. Advanced data mining techniques. Springer; 2008
- 7 Holena M, Sochorova A, Zvarova J. Increasing the diversity of medical data mining through distributed object technology. Stud Health Technol Inform 1999; 68: 442-7.
- 8 Smyth P. Data mining: data analysis on a grand scale. In: Statistical Methods in Medical Research. 2000; 309-27.
- 9 Patel JL, Goyal RK. Applications of artificial neural networks in medical science. Curr Clin Pharmacol 2007; 02: 217-26.
- 10 Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform 2008; 128-44.
- 11 Zhou L, Hripcsak G. Temporal reasoning with medical data—a review with emphasis on medical natural language processing. J Biomed Inform 2007; 40: 183-202.
- 12 Stacey M, McGregor C. Temporal abstraction in intelligent clinical data analysis: A survey. Artif Intell Med 2007; 39 (01) 1-24.
- 13 Hennessy S. Use of health care databases in pharmacoepidemiology. Basic Clin Pharmacol Toxicol 2006; 98: 311-3.
- 14 Zupan B, Demsar J. Open-Source Tools for Data Mining. Clinics in Laboratory Medicine 2008; 28 (01) 37-54.
- 15 Bellazzi R, Zupan B. Predictive data mining in clinical medicine: Current issues and guidelines. Int J Med Inform 2008; 77 (02) 81-97.
- 16 Bayat S, Cuggia M, Kessler M, Briançon S, Le Beux P, Frimat L. Modelling access to renal transplantation waiting list in a French healthcare network using a Bayesian method. Stud Health Technol Inform 2008; 136: 605-10.
- 17 Garcia-Gomez JM, Vidal C, Marti-Bonmati L, Galant J, Sans N, Robles M. et al. Benign/malignant classifier of soft tissue tumors using MR imaging. MAGMA 2004; 16: 194-201.
- 18 Juhola M, Laurikkala J. On distance computation in space of mixed-type variables in medical data mining. Stud Health Technol Inform 2002; 90: 425-30.
- 19 Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia. Comput Biol Med 2007; 37: 296-304.
- 20 Grant A, Moshyk A, Diab H, Caron P, de Lorenzi F, Bisson G. et al. Integrating feedback from a clinical data warehouse into practice organisation. Int J Med Inform 2006; 75: 232-9.
- 21 Klimov D, Shahar Y. A framework for intelligent visualization of multiple time-oriented medical records. AMIA Annu Symp Proc 2005; 405-9.
- 22 Atzmueller M. Exploiting Background Knowledge for Knowledge-Intensive Subgroup Discovery. In: Proc. 19th International Joint Conference on Artificial Intelligence (IJCAI-05). 2005; 647-52.
- 23 Kwasnicka H, Katejan S. Discovery of association rules from medical data: classical and evolutionary approaches. In: XXI Autumn Meeting of Polish Information Processing Society. 2005; 163-77.
- 24 Li J, Fu AW, Fahey P. Efficient discovery of risk patterns in medical data. Artif Intell Med 2009; 45 (01) 77-89.
- 25 Mullins IM, Siadaty MS, Lyman J, Scully K, Garrett CT, Miller WG. et al. Data mining and clinical data repositories: Insights from a 667,000 patient data set. Comput Biol Med 2006; 36: 1351-77.
- 26 Richards G, Rayward-Smith V, Sonksen P. Data mining for indicators of early mortality in a database of clinical records. Artif Intell Med 2001; 22: 215-31.
- 27 Lavrac N. Selected techniques for data mining in medicine. Artif Intell Med 1999; 16: 3-23.
- 28 Juhola M. On machine learning classification of otoneurological data. Stud Health Technol Inform 2008; 136: 211-6.
- 29 Ramoni M, Sebastiani P. Robust Bayes classifiers. Artificial Intelligence 2001; 125 (1-2) 209-26.
- 30 Goodwin LK, Prather JC. Protecting patient privacy in clinical data mining. J Healthc Inf Manag 2002; 16: 62-7.
- 31 Jannin P, Morandi X. Surgical models for computer-assisted neurosurgery. Neuroimage 2007; 37: 783-91.
- 32 Rao BR, Sandilya S, Niculescu R, Germond C, Goel A. Mining time-dependent patient outcomes from hospital patient records. Proc AMIA Symp 2002; 632-6.
- 33 Rost TB, Edsberg O, Grimsmo A, Nytro O. Comparing medical code usage with the compression-based dissimilarity measure. Stud Health Technol Inform 2007; 129: 684-8.
- 34 Spangler WE, May JH, Strum DP, Vargas LG. A data mining approach to characterizing medical code usage patterns. J Med Syst 2002; 26: 255-75.
- 35 Chapman WW, Dowling JN, Wagner MM. Fever detection from free-text clinical records for biosurveillance. J Biomed Inform 2004; 37: 120-7.
- 36 Goldstein I, Arzrumtsyan A, Uzuner O. Three approaches to automatic assignment of ICD-9-CM codes to radiology reports. In: AMIA Annu Symp Proc 2007; 279-83.
- 37 Shortliffe EH, Davis R, Axline SG, Buchanan BG, Green CC, Cohen SN. Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Comput Biomed Res 1975; 08 (04) 303-20.
- 38 Miller RA, Pople HE, Myers JD. INTERNIST-1: An experimental computer-based diagnostic consultant for general internal medicine. N Engl J Med 1982; 307: 468-76.
- 39 Antonie M, Zaïane O, Coman A. Application of data mining techniques for medical image classification. In: Proceedings of Second International Workshop on Multimedia Data Mining (MDM/ KDD’2001); 2001; 94-101.
- 40 Bohm N, Wales L, Dunckley M, Morgan R, Loftus I, Thompson M. Objective risk-scoring systems for repair of abdominal aortic aneurysms: applicability in endovascular repair? Eur J Vasc Endovasc Surg 2008; 36: 172-7.
- 41 Daemen A, Gevaert O, De Moor B. Integration of clinical and microarray data with kernel methods. Conf Proc IEEE Eng Med Biol Soc 2007; 5411-5.
- 42 Dahlstrom O, Timpka T, Hass U, Skogh T, Thyberg I. A simple method for heuristic modeling of expert knowledge in chronic disease: identification of prognostic subgroups in rheumatology. Stud Health Technol Inform 2008; 136: 157-62.
- 43 Gellerstedt M, Glymour C, Madigan D, Pregibon D, Smyth P. Statistical inference and data mining. Communications of the ACM 1996; 39 (11) 35-41.
- 44 Goletsis Y, Papaloukas C, Fotiadis DI, Likas A, Michalis LK. Automated ischemic beat classification using genetic algorithms and multicriteria decision analysis. IEEE Trans Biomed Eng 2004; 51: 1717-25.
- 45 Jesneck JL, Nolte LW, Baker JA, Floyd CE, Lo JY. Optimized approach to decision fusion of heterogeneous data for breast cancer diagnosis. Med Phys 2006; 33: 2945-54.
- 46 Pakhomov SV, Buntrock J, Chute CG. Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier. J Biomed Inform 2005; 38: 145-53.
- 47 Varpa K, Iltanen K, Juhola M. Machine learning method for knowledge discovery experimented with otoneurological data. Comput Methods Programs Biomed 2008; 91: 154-64.
- 48 Lin JH, Haug PJ. Data preparation framework for preprocessing clinical data in data mining. AMIA Annu Symp Proc 2006; 489-93.
- 49 Alvarez SM, Poelstra BA, Burd RS. Evaluation of a Bayesian decision network for diagnosing pyloric stenosis. J Pediatr Surg 2000; 41: 155-61.
- 50 Cohen G, Hilario M, Sax H, Hugo S, Geissbuhler A. Learning from imbalanced data in surveillance of nosocomial infection. Artif Intell Med 2006; 37: 7-18.
- 51 Bellazzi R, Larizza C, Magni P, Bellazzi R. Temporal data mining for the quality assessment of hemodialysis services. Artif Intell Med 2005; 34 (01) 25-39.
- 52 Nannings B, Bosman RJ, Abu-Hanna A. A subgroup discovery approach for scrutinizing blood glucose management guidelines by the identification of hyperglycemia determinants in ICU patients. Methods Inf Med 2008; 47 (06) 480-8.
- 53 Jalloh OB, Waitman LR. Improving Computerized Provider Order Entry (CPOE) usability by data mining users’ queries from access logs. AMIA Annu Symp Proc 2006; 379-83.
- 54 Korhonen M, Salo S, Suni J, Larmas M. Computed online determination of life-long mean index values for carious, extracted, and/or filled permanent teeth. Acta Odontol Scand 2007; 65: 214-8.
- 55 Nguyen A, Moore D, McCowan I, Courage MJ. Multi-class classification of cancer stages from free-text histology reports using support vector machines. Conf Proc IEEE Eng Med Biol Soc 2007; 5140-3.
- 56 Spat S, Cadonna B, Rakovac I, Gütl C, Leitner H, Stark G. et al. Enhanced information retrieval from narrative German-language clinical text documents using automated document classification. Stud Health Technol Inform 2008; 136: 473-8.
- 57 Szarvas G, Farkas R, Busa-Fekete R. State-of-the-art anonymization of medical records using an iterative machine learning framework. J Am Med Inform Assoc 2007; 14: 574-80.
- 58 Harper PR. A review and comparison of classification algorithms for medical decision making. Health Policy 2005; 71: 315-31.
- 59 Tusch G, Bretl CE, Connor M, Das A. SPOT Towards Temporal Data Mining in Medicine and Bioinformatics. In: AMIA Annu Symp Proc 2008; 1157.
- 60 Raj R, O’Connor MJ, Das AK. An ontology-driven method for hierarchical mining of temporal patterns: application to HIV drug resistance research. AMIA Annu Symp Proc 2007; 614-9.
- 61 Lin JH, Haug PJ. Exploiting missing clinical data in Bayesian network modeling for predicting medical problems. J Biomed Inform 2008; 41: 1-14.
- 62 Huang K, Yang H, King I, Lyu MR. Maximizing sensitivity in medical diagnosis using biased minimax probability machine. IEEE Trans Biomed Eng 2006; 53: 821-31.
- 63 Barnes J, Chambers I, Piper I, Citerio G, Contant C, Enblad P. et al. Accurate data collection for head injury monitoring studies: a data validation methodology. Acta Neurochir Suppl 2005; 95: 39-41.
- 64 Le Duff F, Happe A, Burgun A, Levionnois S, Bremond M, Le Beux P. Sharing medical data for patient path analysis with data mining method. Stud Health Technol Inform 2001; 84: 1364-8.
- 65 Sittig DF, Wright A, Osheroff JA, Middleton B, Teich JM, Ash JS. et al. Grand challenges in clinical decision support. J Biomed Inform 2008; 41: 387-92.
- 66 Pakhomov SV, Hanson PL, Bjornsen SS, Smith SA. Automatic classification of foot examination findings using clinical notes and machine learning. J Am Med Inform Assoc 2008; 15: 198-202.
- 67 Cios KJ, Moore GW. Uniqueness of medical data mining. Artif Intell Med 2002; 26: 1-24.
- 68 Depeursinge A, Iavindrasana J, Hidki A, Cohen G, Geissbuhler A, Platon A. et al. Comparative Performance Analysis of State-of-the-Art Classification Algorithms Applied to Lung Tissue Categorization. J Digit Imaging 2008.
- 69 Iavindrasana J, Cohen G, Depeursinge A, Meyer R, Geissbuhler A. Minimal Set of Attributes Required to Report Hospital-Acquired Infection Cases. In: IDAMAP. 2008
- 70 Savova GK, Ogren PV, Duffy PH, Buntrock JD, Chute CG. Mayo clinic NLP system for patient smoking status identification. J Am Med Inform Assoc 2008; 15: 25-8.
- 71 Suckling J, Parker J, Dance DR, Astley S, Hutt I, Boggis CR. et al. The Mammographic Image Analysis Society digital mammogram database. In: Excerpta Medica. International Congress; 1994; 375-8.
- 72 Bath PA. Data mining in health and medical information. Annual Review of Information Science and Technology 2004; 38 (01) 331-69.
- 73 Autio L, Juhola M, Laurikkala J. On the neural network classification of medical data and an endeavour to balance non-uniform data sets with artificial data extension. Comput Biol Med 2007; 37: 388-97.
- 74 Berman JJ. Confidentiality issues for medical data miners. Artif Intell Med 2002; 26: 25-36.
- 75 Awaya T, Ohtaki K, Yamada T, Yamamoto K, Miyoshi T, Itagaki Y. et al. Automation in drug inventory management saves personnel time and budget. Yakugaku Zasshi 2005; 125: 427-32.
- 76 Bernstein SL, Whitaker D, Winograd J, Brennan JA. An electronic chart prompt to decrease proprietary antibiotic prescription to self-pay patients. Acad Emerg Med 2005; 12: 225-31.
- 77 Brenneman SK, Lacroix AZ, Buist DS, Chen YT, Abbott TA. Evaluation of decision rules to identify postmenopausal women for intervention related to osteoporosis. Dis Manag 2003; 06: 159-68.
- 78 Bilska-Wolak AO, Floyd CE. Tolerance to missing data using a likelihood ratio based classifier for computer-aided classification of breast cancer. Phys Med Biol 2004; 49: 4219-37.
- 79 Berner ES, Moss J. Informatics challenges for the impending patient information explosion. J Am Med Inform Assoc 2005; 12: 614-7.
- 80 Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. AI Magazine 1996; 17: 37-54.
- 81 Bilska-Wolak AO, Floyd CE, Lo JY, Baker JA. Computer aid for decision to biopsy breast masses on mammography: validation on new cases. Acad Radiol 2005; 12: 671-80.
- 82 Joshi M, Pakhomov S, Pedersen T, Chute CG. A comparative study of supervised learning as applied to acronym expansion in clinical reports. AMIA Annu Symp Proc 2006; 399-403.
- 83 Aronsky D, Kasworm E, Jacobson JA, Haug PJ, Dean NC. Electronic screening of dictated reports to identify patients with do-not-resuscitate status. J Am Med Inform Assoc 2004; 11: 403-9.
- 84 Richardson M, Domingos P. Learning with knowledge from multiple experts. In: ICML 20. 2003; 624-31.
- 85 Cohen G, Sax H, Geissbuhler A. Novelty detection using one-class Parzen density estimator. An application to surveillance of nosocomial infections. Stud Health Technol Inform 2008; 136: 21-6.
- 86 Jakkula V, Cook DJ. Anomaly detection using temporal data mining in a smart home environment. Methods Inf Med 2008; 47 (01) 70-5.
- 87 Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann. 2005
- 88 Bennett KP, Blue JA. A Support Vector Machine Approach to Decision Trees. In: Department of Mathematical Sciences Math Report No. 97-100, Rensselaer Polytechnic Institute. 1997; 2396-401.
- 89 Berner ES, Maisiak RS, Heudebert GR, Young KR. Clinician performance and prominence of diagnoses displayed by a clinical diagnostic decision support system. AMIA Annu Symp Proc 2003; 76-80.
- 90 Hyun S, Bakken S, Johnson SB. Markup of temporal information in electronic health records. Stud Health Technol Inform 2006; 122: 907-8.
- 91 Dietterich TG. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 1998; 10 (07) 1895-923.
- 92 Demsar J. Statistical Comparisons of Classifiers over Multiple Data Sets. J Mach Learn Res 2006; 07: 1-30.
- 93 Jin HW, Chen J, He H, Williams GJ, Kelman C, O’Keefe CM. Mining unexpected temporal associations: applications in detecting adverse drug reactions. IEEE Trans Inf Technol Biomed 2008; 12: 488-500.
- 94 Mitchell DR, Mitchell JA. Status of clinical gene sequencing data reporting and associated risks for information loss. J Biomed Inform 2007; 40 (01) 47-54.
- 95 Pierson JM, Gossa J, Wehrle P, Cardenas Y, Cahon S, El Samad M. et al. GGM: efficient navigation and mining in distributed genomedical data. IEEE Trans Nanobioscience 2007; 06: 110-6.
- 96 McSherry D. Dynamic and static approaches to clinical data mining. Artif Intell Med 1999; 16: 97-115.
- 97 Lee IN, Liao SC, Embrechts M. Data mining techniques applied to medical information. Med Inform Internet Med 2000; 25: 81-102.
- 98 Imai T, Aramaki E, Kajino M, Miyo K, Onogi Y, Ohe K. Finding malignant findings from radiological reports using medical attributes and syntactic information. Stud Health Technol Inform 2007; 129: 540-4.
- 99 Sauleau EA, Paumier JP, Buemi A. Medical record linkage in health information systems by approximate string matching and clustering. BMC Med Inform Decis Mak 2005; 05: 32.
- 100 Ordonez C. Association rule discovery with the train and test approach for heart disease prediction. IEEE Trans Inf Technol Biomed 2006; 10: 334-43.
- 101 Roddick JF, Fule P, Graco WJ. Exploratory medical knowledge discovery: experiences and issues. SIGKDD Explorations 2003; 05 (01) 94-9.