Methods Inf Med 2013; 52(05): 441-453
DOI: 10.3414/ME12-01-0106
Original Articles
Schattauer GmbH

Generating Reference Models for Structurally Complex Data

Application to the Stabilometry Medical Domain
F. Alonso (1), J. A. Lara (2), L. Martinez (1), A. Pérez (1), J. P. Valente (1)

1   CETTICO Research Group, Departamento de Lenguajes y Sistemas Informáticos e Ingeniería del Software, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
2   Facultad de Enseñanzas Técnicas, Universidad a Distancia de Madrid, Madrid, Spain

Publication History

received: 20 November 2012

accepted: 16 April 2013

Publication Date:
20 January 2018 (online)

Summary

Objectives: We present a framework specially designed to deal with structurally complex data, where all individuals have the same structure, as is the case in many medical domains. A structurally complex individual may be composed of any type of single-valued or multivalued attributes, including time series. These attributes are structured according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and support important tasks such as diagnosis, fraud detection, analysis of patient evolution, and identification of control groups.

Methods: We have developed a conceptual model to represent structurally complex data hierarchically. Additionally, we have devised a method that uses the similarity tree concept to measure how similar two structurally complex individuals are, plus an outlier detection and filtering method. These methods provide the groundwork for the method that we have designed for generating reference models of a set of structurally complex individuals. A key idea of this method is to use event-based analysis for modeling time series.
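As a rough illustration of the ideas in the Methods paragraph, the following minimal Python sketch compares two individuals that share the same hierarchical structure and then filters out individuals that are, on average, dissimilar to the rest of the group. It is not the authors' similarity tree or outlier detection method: the nested-dictionary representation, the functions `leaf_similarity`, `similarity` and `filter_outliers`, the leaf formula 1 / (1 + |a - b|), and the averaging rule are all assumptions made only for this example. In particular, time series are compared here point by point, whereas the summary states that the actual method relies on event-based analysis.

```python
from statistics import mean


def leaf_similarity(a, b):
    """Similarity in (0, 1] for two numeric values; 1.0 means identical.

    Assumed formula, chosen only for illustration.
    """
    return 1.0 / (1.0 + abs(a - b))


def similarity(x, y):
    """Recursively compare two individuals that share the same structure.

    Dictionaries are inner nodes of the hierarchy, lists or tuples stand in
    for multivalued attributes (for example a sampled time series), and
    plain numbers are single-valued attributes.
    """
    if isinstance(x, dict):
        # Inner node: average the similarity of the children.
        return mean(similarity(x[key], y[key]) for key in x)
    if isinstance(x, (list, tuple)):
        # Multivalued attribute: naive point-by-point comparison.
        return mean(leaf_similarity(a, b) for a, b in zip(x, y))
    return leaf_similarity(x, y)


def filter_outliers(individuals, threshold):
    """Keep individuals whose mean similarity to the others reaches threshold."""
    kept = []
    for i, individual in enumerate(individuals):
        others = [o for j, o in enumerate(individuals) if j != i]
        if mean(similarity(individual, o) for o in others) >= threshold:
            kept.append(individual)
    return kept


if __name__ == "__main__":
    # Hypothetical individuals; attribute names and values are made up.
    a = {"age": 30, "test": {"sway": [1.0, 2.0]}}
    b = {"age": 31, "test": {"sway": [1.0, 2.0]}}
    c = {"age": 90, "test": {"sway": [50.0, 60.0]}}
    print(similarity(a, b))
    print(len(filter_outliers([a, b, c], threshold=0.3)))
```

A reference model for the group could then be built from the individuals that survive the filter. How the attributes are actually aggregated is specific to the paper and is not reproduced here.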

Results: The proposed framework has been applied to the medical field of stabilometry. To validate the outlier detection method we used 142 individuals, and there was a match between the outlier ratings by the experts and by the system for 139 individuals (97.8%). To validate the reference model generation method, we applied k-fold cross validation (k = 5) with 60 athletes (basketball players and ice skaters), and the system correctly classified 55 (91.7%). We then added 30 non-athletes as a control group, and the method output the correct result in a very high percentage of cases (96.6%).
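The validation scheme mentioned in the Results paragraph can be pictured with a minimal Python sketch of k-fold cross validation. The helper names `k_fold_indices` and `cross_validate`, and the `classify_fold` callback, are hypothetical and exist only for this illustration. The sketch uses contiguous folds without shuffling or stratification, which may differ from what the authors did.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n, k, classify_fold):
    """Return the fraction of individuals classified correctly over k folds.

    classify_fold(train_idx, test_idx) is a hypothetical callback that builds
    a model from the training indices and returns how many of the test
    individuals it classifies correctly.
    """
    folds = k_fold_indices(n, k)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        correct += classify_fold(train_idx, test_idx)
    return correct / n


if __name__ == "__main__":
    # With 60 individuals and k = 5, each fold holds 12 individuals.
    print([len(fold) for fold in k_fold_indices(60, 5)])
    # 55 correct classifications out of 60 is the 91.7% quoted in the summary.
    print(round(55 / 60 * 100, 1))
```

In each round one fold is held out for testing and the remaining four are used to build the model, so every individual is classified exactly once.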

Conclusions: We have achieved very satisfactory results for the tests on data from such a complex domain as stabilometry and for the comparison of the reference model generation method with other methods. This supports the validity of this framework.

 