Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Lin Lawrence Guo; Stephen R. Pfohl; Jason Fries; Jose Posada; Scott Lanyon Fleming; Catherine Aftandilian; Nigam Shah; Lillian Sung

doi:10.1055/s-0041-1735184


Appl Clin Inform 2021; 12(04): 808-815
DOI: 10.1055/s-0041-1735184

Review Article

Authors

Lin Lawrence Guo¹
Stephen R. Pfohl²
Jason Fries²
Jose Posada²
Scott Lanyon Fleming²
Catherine Aftandilian⁴
Nigam Shah²
Lillian Sung¹,³

¹Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada
²Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
³Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Canada
⁴Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, United States

Funding None.

Abstract

Objective Temporal dataset shift causes the performance of machine learning models to change over time, which is a barrier to using such models to support decision-making in clinical practice. Our aim was to describe the technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shift.

Methods Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.

Results Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.
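The distinction between calibration and discrimination drawn in these results can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the simulated data, the `SHIFT` value, and the helper names (`auroc`, `calibration_gap`, `fit_intercept_shift`) are all assumptions made for illustration, and intercept-only recalibration is only one simple form of probability calibration. The sketch simulates a later time period in which the event rate has dropped, measures calibration-in-the-large and AUROC for a frozen model, and then re-estimates the intercept on a recent window.

```python
import math
import random

def logit(p):
    p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def auroc(y_true, y_prob):
    """Discrimination: chance that a random positive is ranked above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_gap(y_true, y_prob):
    """Calibration-in-the-large: mean predicted risk minus observed event rate."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)

def fit_intercept_shift(y_true, y_prob, iters=60):
    """Intercept-only logistic recalibration.

    Finds delta such that the mean of sigmoid(logit(p) + delta) equals the
    observed event rate, using bisection (the mean is increasing in delta).
    """
    target = sum(y_true) / len(y_true)
    lo, hi = -10.0, 10.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_pred = sum(sigmoid(logit(p) + mid) for p in y_prob) / len(y_prob)
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical later period: the frozen model outputs `probs`, but the true
# log-odds have shifted by SHIFT, so the model now over-predicts risk.
random.seed(0)
SHIFT = -1.0  # assumed size of the temporal shift, for illustration only
probs = [random.uniform(0.05, 0.95) for _ in range(2000)]
labels = [1 if random.random() < sigmoid(logit(p) + SHIFT) else 0 for p in probs]

# Recalibrate on a recent window, then evaluate on the data that follow it.
recent_p, recent_y = probs[:1000], labels[:1000]
eval_p, eval_y = probs[1000:], labels[1000:]

delta = fit_intercept_shift(recent_y, recent_p)
recal_p = [sigmoid(logit(p) + delta) for p in eval_p]

gap_before = calibration_gap(eval_y, eval_p)
gap_after = calibration_gap(eval_y, recal_p)
auroc_before = auroc(eval_y, eval_p)
auroc_after = auroc(eval_y, recal_p)

print(f"estimated intercept shift: {delta:.3f}")
print(f"calibration gap before/after: {gap_before:.3f} / {gap_after:.3f}")
print(f"AUROC before/after: {auroc_before:.3f} / {auroc_after:.3f}")
```

Because adding a constant on the logit scale is a monotone transformation, it can correct the average predicted risk but leaves the ranking of patients, and therefore the AUROC, unchanged. Under these assumptions, that is one way to see why a calibration-focused fix may restore calibration without improving discrimination.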

Conclusion There was limited research on preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision-making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.

Keywords

dataset shift - machine learning - clinical data - systematic review

Note

L.S. is the Canada Research Chair in Pediatric Oncology Supportive Care.

Author Contributions

L.L.G. and L.S. performed data acquisition and data analysis. All authors contributed to the study concept and design and to data interpretation; were involved in drafting the manuscript or revising it critically for important intellectual content; gave final approval of the version to be published; and agreed to be accountable for all aspects of the work.

Protection of Human and Animal Subjects

As this study is a systematic review of primary studies, human and/or animal subjects were not included in the project.

Publication History

Received: 28 April 2021

Accepted: 12 July 2021

Article published online: 01 September 2021

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

