Yearbook of Medical Informatics
Yearb Med Inform 2009; 18(01): 121-133
DOI: 10.1055/s-0038-1638651
Original Article
Georg Thieme Verlag KG Stuttgart

Clinical Data Mining: a Review

J. Iavindrasana, G. Cohen, A. Depeursinge, H. Müller, R. Meyer, A. Geissbuhler
University and Hospitals of Geneva, Switzerland

Keywords: Medical records systems - computerized; data mining