Lessons Learned from Data Mining of WHO Mortality Database

W. Paoin

doi:10.3414/ME10-02-0019

Methods Inf Med 2011; 50(04): 380-385
DOI: 10.3414/ME10-02-0019

Special Topic – Original Articles

Schattauer GmbH

¹Faculty of Medicine, Thammasat University, Pathumthani, Thailand

Publication History

received: 02 March 2010

accepted: 17 June 2010

Publication Date: 18 January 2018 (online)

Summary

Objectives: This research aimed to test the ability of classification algorithms to predict the cause of death in mortality records with unknown causes, to find associations among common causes of death, to identify groups of countries based on their common causes of death, and to extract knowledge from data mining of the World Health Organization (WHO) mortality database.

Methods: WEKA software version 3.5.3 was used for classification, clustering and association analysis of the World Health Organization mortality database, which contained 1,109,537 records. Three major steps were performed: Step 1 – preprocessing of data to convert all records into formats suitable for each type of analysis algorithm; Step 2 – analyzing data using the C4.5 decision tree and Naïve Bayes classification algorithms, the K-means clustering algorithm and the Apriori association analysis algorithm; Step 3 – interpretation of results and hypothesis testing after clustering analysis.
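The clustering part of Step 2 can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not WEKA's implementation and does not reproduce the study's preprocessing: the function name `kmeans`, the choice to seed centroids with the first k points, and the toy `country_years` vectors (shares of deaths by cause group) are all assumptions made for illustration, not WHO data.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical country-years: share of deaths from
# (infectious, cardiovascular, cancer) causes. Illustrative values only.
country_years = [
    (0.60, 0.20, 0.20),
    (0.10, 0.55, 0.35),
    (0.55, 0.25, 0.20),
    (0.15, 0.50, 0.35),
]
labels, centroids = kmeans(country_years, k=2)
```

In this toy input, the two country-years dominated by infectious causes fall into one cluster and the two dominated by cardiovascular causes into the other. A real analysis would need a principled choice of k and of the initial centroids.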

Results: A C4.5 decision tree classifier for predicting cause of death produced 440 leaf nodes and correctly classified death instances with an accuracy of 40.06%. The Naïve Bayes classification algorithm, which calculated the probability of death from each disease, correctly classified death instances with an accuracy of 28.13%. K-means clustering divided the data into four clusters with 189, 59, 65 and 144 country-years, respectively. A chi-square test was used to test for differences in disease distribution among the clusters, each of which had different diseases as predominant causes of death. Apriori association analysis produced association rules linking cancer of the lung, hypertension and cerebrovascular diseases. These were found among the top five leading causes of death with a 99–100% confidence level.
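Assuming the 99–100% figure refers to rule confidence in the Apriori sense, the sketch below shows how support and confidence are computed for a single association rule. The helper `rule_metrics` and the `top_causes` transactions are hypothetical and illustrative only. They are not drawn from the WHO database, and the sketch omits Apriori's candidate-generation step.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions containing every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    # Transactions containing both the antecedent and the consequent.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical leading causes of death per country-year (not real data).
top_causes = [
    {"lung cancer", "hypertension", "cerebrovascular"},
    {"lung cancer", "hypertension", "cerebrovascular"},
    {"hypertension", "cerebrovascular"},
    {"lung cancer", "diabetes"},
]
support, confidence = rule_metrics(
    top_causes, {"lung cancer", "hypertension"}, {"cerebrovascular"}
)
```

Here the rule holds in every country-year where its antecedent appears, so its confidence is 1.0, while its support is 0.5 because only half of the transactions contain all three causes.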

Conclusion: Classification tools produced the poorest results in predicting cause of death. Given the inadequacy of variables in the WHO database, creating a classification model to predict specific causes of death was impossible. Clustering and association tools yielded interesting results that could be used to identify new areas of interest in mortality data analysis. Data mining analysis of this kind may help solve some quality problems in mortality data.

Keywords

Mortality statistics - data mining - classification - clustering - association analysis
