Appl Clin Inform 2022; 13(02): 431-438
DOI: 10.1055/s-0042-1746168
Research Article

Monitoring Approaches for a Pediatric Chronic Kidney Disease Machine Learning Model

Keith E. Morse
1   Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
,
Conner Brown
2   Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States
,
Scott Fleming
3   Department of Biomedical Data Science, Stanford University, Palo Alto, California, United States
,
Irene Todd
2   Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States
,
Austin Powell
2   Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States
,
Alton Russell
4   Harvard Medical School, Boston, Massachusetts, United States
,
David Scheinker
2   Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States
,
Scott M. Sutherland
5   Division of Nephrology, Department of Pediatrics, Stanford University, Stanford, California, United States
,
Jonathan Lu
3   Department of Biomedical Data Science, Stanford University, Palo Alto, California, United States
,
Brendan Watkins
2   Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States
,
Nigam H. Shah
3   Department of Biomedical Data Science, Stanford University, Palo Alto, California, United States
,
Natalie M. Pageler
6   Division of Pediatric Critical Care Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
7   Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
,
Jonathan P. Palma
8   Division of Neonatology, Department of Pediatrics, Orlando Health, Orlando, Florida, United States
Funding: None.

Abstract

Objective The purpose of this study was to evaluate the ability of three metrics to monitor for a reduction in the performance of a chronic kidney disease (CKD) model deployed at a pediatric hospital.

Methods The CKD risk model estimates a patient's risk of developing CKD 3 to 12 months following an inpatient admission. The model was developed on a retrospective dataset of 4,879 admissions from 2014 to 2018 and then run silently on 1,270 admissions from April to October 2019. Three metrics were used to monitor its performance during the silent phase: (1) standardized mean differences (SMDs); (2) performance of a “membership model”; and (3) response distribution analysis. Observed patient outcomes for the 1,270 admissions were used to calculate prospective model performance and to assess the ability of the three metrics to detect performance changes.
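The three monitoring metrics can be computed without access to observed outcomes. The following Python sketch is illustrative only, not the study's implementation: it uses synthetic stand-in data sized to the study's cohorts (75 input variables; 4,879 retrospective and 1,270 deployment admissions) and a two-sample Kolmogorov-Smirnov test as one plausible choice for the response distribution comparison, which the abstract does not specify.

import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X_retro = rng.normal(size=(4879, 75))             # synthetic retrospective cohort, 75 input variables
X_deploy = rng.normal(0.1, 1.0, size=(1270, 75))  # synthetic silent-deployment cohort
score_retro = rng.beta(2, 8, size=4879)           # synthetic model risk estimates, retrospective
score_deploy = rng.beta(2, 6, size=1270)          # synthetic model risk estimates, deployment

# (1) Standardized mean difference for each input variable
def smd(a, b):
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled_sd

smds = [smd(X_retro[:, j], X_deploy[:, j]) for j in range(X_retro.shape[1])]

# (2) "Membership model": a classifier that tries to tell the two cohorts apart;
# an AUROC near 0.5 suggests similar data, higher values suggest dataset shift.
X_all = np.vstack([X_retro, X_deploy])
y_all = np.r_[np.zeros(len(X_retro)), np.ones(len(X_deploy))]
membership_prob = cross_val_predict(
    LogisticRegression(max_iter=1000), X_all, y_all, cv=5,
    method="predict_proba")[:, 1]
print("Membership model AUROC:", roc_auc_score(y_all, membership_prob))

# (3) Response distribution analysis: compare the model's output distributions
# (a two-sample Kolmogorov-Smirnov test is one option; the abstract does not name the test used).
print(stats.ks_2samp(score_retro, score_deploy))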

Results The deployed model had an area under the receiver operating characteristic curve (AUROC) of 0.63 in the prospective evaluation, a significant decrease from an AUROC of 0.76 on retrospective data (p = 0.033). Among the three metrics, SMDs differed significantly between the retrospective and deployment data for 66 of 75 (88%) of the model's input variables (p < 0.05). The membership model was able to discriminate between the two settings (AUROC = 0.71, p < 0.0001), and the response distributions were significantly different between the two settings (p < 0.0001).
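The abstract does not state which statistical test produced the p = 0.033 for the decline in AUROC between the retrospective and prospective evaluations; a bootstrap confidence interval for the difference between AUROCs computed on two independent cohorts is one common approach and is sketched below, again with synthetic stand-in labels and scores rather than the study's data.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Synthetic stand-ins for outcome labels and model scores in the two evaluation cohorts
y_retro, p_retro = rng.integers(0, 2, 4879), rng.random(4879)
y_prosp, p_prosp = rng.integers(0, 2, 1270), rng.random(1270)

def auroc_diff_ci(y1, p1, y2, p2, n_boot=2000):
    """Bootstrap 95% CI for AUROC(cohort 1) - AUROC(cohort 2) on independent cohorts."""
    diffs = []
    for _ in range(n_boot):
        i1 = rng.integers(0, len(y1), len(y1))
        i2 = rng.integers(0, len(y2), len(y2))
        if y1[i1].min() == y1[i1].max() or y2[i2].min() == y2[i2].max():
            continue  # skip resamples containing a single outcome class
        diffs.append(roc_auc_score(y1[i1], p1[i1]) - roc_auc_score(y2[i2], p2[i2]))
    return np.percentile(diffs, [2.5, 97.5])

observed = roc_auc_score(y_retro, p_retro) - roc_auc_score(y_prosp, p_prosp)
ci_low, ci_high = auroc_diff_ci(y_retro, p_retro, y_prosp, p_prosp)
print(f"AUROC difference {observed:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
# A confidence interval that excludes zero indicates a statistically meaningful drop.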

Conclusion This study suggests that the three metrics examined could provide early indication of performance deterioration in deployed models.

Protection of Human and Animal Subjects

This project was reviewed and approved by the Stanford University Institutional Review Board.



Publication History

Received: 13 September 2021

Accepted: 01 March 2022

Article published online: 04 May 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 