Generating Reference Models for Structurally Complex Data

F. Alonso; J. A. Lara; L. Martinez; A. Pérez; J. P. Valente

doi:10.3414/ME12-01-0106


Methods Inf Med 2013; 52(05): 441-453

Original Articles

Schattauer GmbH

Generating Reference Models for Structurally Complex Data

Application to the Stabilometry Medical Domain

Authors

F. Alonso¹, J. A. Lara², L. Martinez¹, A. Pérez¹, J. P. Valente¹

¹CETTICO Research Group, Departamento de Lenguajes y Sistemas Informáticos e Ingeniería del Software, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
²Facultad de Enseñanzas Técnicas, Universidad a Distancia de Madrid, Madrid, Spain


Publication History

Received: 20 November 2012

Accepted: 16 April 2013

Published online: 20 January 2018


Summary

Objectives: We present a framework designed specifically for structurally complex data in which all individuals share the same structure, as is the case in many medical domains. A structurally complex individual may be composed of single-valued or multivalued attributes of any type, including, for example, time series. These attributes are organized according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and are very useful for supporting important tasks such as diagnosis, fraud detection, analysis of patient evolution, and identification of control groups.
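
The hierarchical data described above can be pictured as a small tree type. This is a minimal sketch, assuming that each attribute is a node in a domain hierarchy and that leaf nodes hold a single value, a tuple of values (multivalued attribute) or a list of samples (time series). The names `Node`, `shape` and `same_structure` are hypothetical and are not part of the authors' conceptual model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical leaf value types: a single number, a multivalued attribute
# (tuple of numbers) or a time series (list of samples).
Value = Union[float, tuple[float, ...], list[float]]


@dataclass
class Node:
    """One attribute in the domain hierarchy; leaves carry values."""
    name: str
    value: Value | None = None
    children: list[Node] = field(default_factory=list)

    def shape(self) -> tuple:
        """Structure of the subtree (attribute names only, values ignored)."""
        return (self.name, tuple(child.shape() for child in self.children))


def same_structure(individuals: list[Node]) -> bool:
    """True if every individual follows the same attribute hierarchy."""
    return len({ind.shape() for ind in individuals}) <= 1
```

For example, two hypothetical stabilometry records `Node("patient", children=[Node("age", 30.0), Node("sway_x", value=[0.1, 0.3, 0.2])])` and `Node("patient", children=[Node("age", 41.0), Node("sway_x", value=[0.0, 0.2])])` share one structure even though their time series differ in length, which is the kind of homogeneity the framework assumes.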

Methods: We have developed a conceptual model for representing structurally complex data hierarchically. We have also devised a method that uses the similarity tree concept to measure how similar two structurally complex individuals are, together with a method for detecting and filtering outliers. These methods provide the groundwork for our method for generating reference models of a set of structurally complex individuals. A key idea of this method is the use of event-based analysis to model time series.
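
To make the idea of comparing two hierarchically structured individuals concrete, the following sketch aggregates leaf similarities bottom-up through the hierarchy and then flags individuals whose mean similarity to the rest of the group falls below a threshold. It is an illustrative stand-in, not the authors' similarity tree or outlier detection method: the nested-dict representation, the leaf measure 1/(1 + difference), the equal default weights and the threshold are all assumptions.

```python
from __future__ import annotations

from statistics import mean
from typing import Any

# Hypothetical representation: an individual is a nested dict whose inner
# dicts are hierarchy levels and whose leaves are numbers or time series
# (lists of floats). All individuals are assumed to share the same keys.
Individual = dict[str, Any]


def leaf_similarity(a: Any, b: Any) -> float:
    """Similarity in [0, 1] between two leaf values (assumed measure)."""
    if isinstance(a, list):
        # Time series: mean absolute difference over the overlapping samples.
        if not a or not b:
            return 1.0 if len(a) == len(b) else 0.0
        diff = mean(abs(x - y) for x, y in zip(a, b))
    else:
        # Single numeric value.
        diff = abs(a - b)
    return 1.0 / (1.0 + diff)


def tree_similarity(a: Individual, b: Individual,
                    weights: dict[str, float] | None = None) -> float:
    """Weighted mean of child similarities, computed bottom-up."""
    weights = weights or {}
    total = weight_sum = 0.0
    for key, value in a.items():
        w = weights.get(key, 1.0)
        if isinstance(value, dict):
            s = tree_similarity(value, b[key], weights)
        else:
            s = leaf_similarity(value, b[key])
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum else 1.0


def filter_outliers(group: list[Individual], threshold: float
                    ) -> tuple[list[Individual], list[Individual]]:
    """Split a group into (kept, outliers) by mean similarity to the others."""
    if len(group) < 2:
        return list(group), []
    kept, outliers = [], []
    for i, ind in enumerate(group):
        score = mean(tree_similarity(ind, other)
                     for j, other in enumerate(group) if j != i)
        (kept if score >= threshold else outliers).append(ind)
    return kept, outliers
```

Weighting lets a domain expert make some branches of the hierarchy count more than others; in this sketch every attribute defaults to weight 1, and the filtered group would be the input to any subsequent reference model generation step.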

Results: The proposed framework has been applied to the medical field of stabilometry. To validate the outlier detection method, we used 142 individuals; the outlier ratings given by the experts and by the system matched for 139 of them (97.8%). To validate the reference model generation method, we applied 5-fold cross-validation to 60 athletes (basketball players and ice skaters), and the system correctly classified 55 of them (91.7%). We then added 30 non-athletes as a control group, and the method output the correct result in 96.6% of cases.
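
The 5-fold cross-validation protocol can be outlined generically as follows. This is a sketch under stated assumptions: the shuffling seed, the unstratified fold assignment and the `fit` callback (which trains a classifier and returns a prediction function) are hypothetical, since the summary does not say how the folds were built or how the reference models were used to classify.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")  # an individual (any representation)
Y = TypeVar("Y")  # a class label, e.g. a sport


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(samples: Sequence[X], labels: Sequence[Y], k: int,
                   fit: Callable[[list[X], list[Y]], Callable[[X], Y]],
                   seed: int = 0) -> int:
    """Count samples classified correctly while held out of training."""
    correct = 0
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(samples)) if i not in held_out]
        # Build the model (e.g. one reference model per class) on k-1 folds.
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(samples[i]) == labels[i] for i in fold)
    return correct
```

With 60 athletes and k = 5, each fold holds 12 individuals and every athlete is held out exactly once, so 55 correct classifications correspond to 55/60 ≈ 91.7%.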

Conclusions: We achieved very satisfactory results both in the tests on data from a domain as complex as stabilometry and in the comparison of the reference model generation method with other methods. These results support the validity of the framework.

Keywords

Data mining - time series - reference models - structurally complex data - outlier detection
