DOI: 10.3414/ME10-02-0019
Lessons Learned from Data Mining of WHO Mortality Database
Publication History
Received: 02 March 2010
Accepted: 17 June 2010
Publication Date: 18 January 2018 (online)
Summary
Objectives: The objectives of this research were to test the ability of classification algorithms to predict the cause of death in mortality records with unknown causes, to find associations among common causes of death, to identify groups of countries that share common causes of death, and to extract knowledge from data mining of the World Health Organization mortality database.
Methods: WEKA software version 3.5.3 was used for classification, clustering and association analysis of the World Health Organization mortality database, which contained 1,109,537 records. Three major steps were performed: Step 1 – preprocessing of data to convert all records into suitable formats for each type of analysis algorithm; Step 2 – analyzing data using the C4.5 decision tree and Naïve Bayes classification algorithms, the K-means clustering algorithm and the Apriori association analysis algorithm; Step 3 – interpretation of results and hypothesis testing after clustering analysis.
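The clustering part of Step 2 can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not the WEKA implementation used in the study. The function name `k_means`, the two-feature representation of a country-year and all data values are hypothetical and chosen only for illustration.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical country-years: share of deaths from
# (cardiovascular causes, infectious causes). Values are made up.
country_years = [
    (0.45, 0.05), (0.50, 0.04), (0.48, 0.06),
    (0.15, 0.40), (0.12, 0.45), (0.18, 0.38),
]
initial = [country_years[0], country_years[3]]  # assumed starting centroids
labels, centroids = k_means(country_years, initial)
print(labels)
```

In this toy setting the first three country-years end up in one cluster and the last three in another. The study itself clustered country-years into four groups over a much larger set of cause-of-death variables.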
Results: Using a C4.5 decision tree classifier to predict cause of death, we obtained a tree with 440 leaf nodes that correctly classified death instances with an accuracy of 40.06%. The Naïve Bayes classification algorithm, which calculated the probability of death from each disease, correctly classified death instances with an accuracy of 28.13%. K-means clustering divided the data into four clusters containing 189, 59, 65 and 144 country-years, respectively. A chi-square test was used to test for differences in disease distribution among the clusters, each of which had different diseases as predominant causes of death. Apriori association analysis produced association rules linking cancer of the lung, hypertension and cerebrovascular diseases. These were found among the top five leading causes of death with a confidence of 99–100%.
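Two of the quantities reported above, the confidence of an association rule and the chi-square statistic for a contingency table, can be sketched in a few lines of plain Python. This is an illustrative sketch, not the study's WEKA workflow. The function names, the toy transactions and the 2x2 table are all hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(A and C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical transactions: diseases among the leading causes of death
# in one country-year. These are made-up values, not WHO data.
transactions = [
    {"lung cancer", "hypertension", "cerebrovascular"},
    {"lung cancer", "hypertension", "cerebrovascular"},
    {"lung cancer", "cerebrovascular"},
    {"hypertension", "cerebrovascular"},
    {"diabetes"},
]
rule_conf = confidence(
    transactions, {"lung cancer", "hypertension"}, {"cerebrovascular"}
)

# Hypothetical 2x2 table: rows are clusters, columns are whether a given
# disease is a predominant cause of death in a country-year.
chi2 = chi_square_statistic([[30, 10], [10, 30]])
print(rule_conf, chi2)
```

A confidence of 1.0 here means every toy transaction containing both lung cancer and hypertension also contains cerebrovascular disease. For the 2x2 table there is one degree of freedom, so a statistic above 3.841 would indicate a difference between clusters at the 5% significance level.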
Conclusion: Classification tools produced the poorest results in predicting cause of death. Given the inadequacy of variables in the WHO database, creating a classification model to predict specific causes of death was impossible. Clustering and association tools yielded interesting results that could be used to identify new areas of interest in mortality data analysis. Such data mining analysis can help solve some quality problems in mortality data.