Appl Clin Inform 2023; 14(03): 400-407
DOI: 10.1055/a-2051-9764
Adolescent Privacy and the Electronic Health Record

A Natural Language Processing Model to Identify Confidential Content in Adolescent Clinical Notes

Naveed Rabbani (1), Michael Bedgood (2), Conner Brown (3), Ethan Steinberg (4, 5), Rachel L. Goldstein (6), Jennifer L. Carlson (6), Natalie Pageler (1), Keith E. Morse (1)

1   Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
2   California Department of Public Health, Richmond, California, United States
3   Information Services Department, Lucile Packard Children's Hospital, Palo Alto, California, United States
4   Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, United States
5   Department of Computer Science, Stanford University, Stanford, California, United States
6   Division of Adolescent Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
Funding None.

Abstract

Background The 21st Century Cures Act mandates the immediate, electronic release of health information to patients. However, in the case of adolescents, special consideration is required to ensure that confidentiality is maintained. The detection of confidential content in clinical notes may support operational efforts to preserve adolescent confidentiality while implementing information sharing.

Objectives This study aimed to determine if a natural language processing (NLP) algorithm can identify confidential content in adolescent clinical progress notes.

Methods A total of 1,200 outpatient adolescent progress notes written between 2016 and 2019 were manually annotated to identify confidential content. Labeled sentences from this corpus were featurized and used to train a two-part logistic regression model, which provides both sentence-level and note-level probability estimates that a given text contains confidential content. This model was prospectively validated on a set of 240 progress notes written in May 2022. It was subsequently deployed in a pilot intervention to augment an ongoing operational effort to identify confidential content in progress notes. Note-level probability estimates were used to triage notes for review and sentence-level probability estimates were used to highlight high-risk portions of those notes to aid the manual reviewer.
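The two-part structure described in the Methods can be illustrated with a small sketch. The abstract does not specify the featurization, how sentence-level estimates feed the note-level estimate, or any thresholds, so everything below is an assumption made for illustration: bag-of-words sentence features, a note-level model fit on the maximum and mean sentence probability, a 0.5 highlight threshold, and invented toy sentences that are not study data. All function and variable names are hypothetical.

```python
import math
import re

def tokenize(text):
    # Lowercase word tokens; a stand-in for the unspecified featurization.
    return re.findall(r"[a-z']+", text.lower())

def split_sentences(note):
    return [s for s in re.split(r"(?<=[.!?])\s+", note.strip()) if s]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(model, x):
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items()))

def fit_logistic(rows, labels, epochs=300, lr=0.3, l2=0.01):
    # Batch gradient descent on L2-regularized log loss; rows are feature dicts.
    weights, bias, n = {}, 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(rows, labels):
            err = predict((weights, bias), x) - y
            grad_b += err
            for f, v in x.items():
                grad_w[f] = grad_w.get(f, 0.0) + err * v
        for f, g in grad_w.items():
            w = weights.get(f, 0.0)
            weights[f] = w - lr * (g / n + l2 * w)
        bias -= lr * grad_b / n
    return weights, bias

def sentence_features(sentence):
    # Binary bag-of-words (assumption).
    return {tok: 1.0 for tok in set(tokenize(sentence))}

def note_features(sentence_model, note):
    # Assumed aggregation: summarize sentence probabilities by max and mean.
    probs = [predict(sentence_model, sentence_features(s))
             for s in split_sentences(note)] or [0.0]
    return {"max_p": max(probs), "mean_p": sum(probs) / len(probs)}

def triage(sentence_model, note_model, notes, highlight_at=0.5):
    # Rank notes by note-level probability; flag high-risk sentences for the reviewer.
    ranked = []
    for note in notes:
        p_note = predict(note_model, note_features(sentence_model, note))
        flagged = [s for s in split_sentences(note)
                   if predict(sentence_model, sentence_features(s)) >= highlight_at]
        ranked.append((p_note, note, flagged))
    return sorted(ranked, key=lambda r: r[0], reverse=True)

# Invented toy sentences (NOT study data): 1 = confidential, 0 = not confidential.
SENTENCES = [
    ("Patient reports being sexually active and requests contraception.", 1),
    ("Discussed confidential STI testing with the patient.", 1),
    ("Patient discloses marijuana use at parties.", 1),
    ("Requests contraception without parental involvement.", 1),
    ("Patient presents for a sports physical.", 0),
    ("Immunizations are up to date.", 0),
    ("Mother reports seasonal allergies are well controlled.", 0),
    ("Follow up in one year for routine care.", 0),
]
# Invented toy notes built from the sentences above.
NOTES = [
    ("Patient presents for a sports physical. Patient reports being sexually "
     "active and requests contraception.", 1),
    ("Immunizations are up to date. Follow up in one year for routine care.", 0),
    ("Discussed confidential STI testing with the patient. "
     "Immunizations are up to date.", 1),
    ("Mother reports seasonal allergies are well controlled. "
     "Patient presents for a sports physical.", 0),
]

# Part 1: sentence-level model. Part 2: note-level model on sentence summaries.
sentence_model = fit_logistic([sentence_features(s) for s, _ in SENTENCES],
                              [y for _, y in SENTENCES])
note_model = fit_logistic([note_features(sentence_model, n) for n, _ in NOTES],
                          [y for _, y in NOTES])

if __name__ == "__main__":
    new_notes = ["Follow up in one year for routine care.",
                 "Patient requests contraception. Immunizations are up to date."]
    for p_note, note, flagged in triage(sentence_model, note_model, new_notes):
        print(f"{p_note:.2f}  flagged={flagged}")
```

In this sketch the sorted output plays the role of the review queue (note-level triage) and the `flagged` list plays the role of the highlighted passages (sentence-level aid). A real deployment would likely use a library implementation such as scikit-learn rather than the hand-written gradient descent shown here.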

Results The prevalence of notes containing confidential content was 21% (255/1,200) and 22% (53/240) in the train/test and validation cohorts, respectively. The ensemble logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 90% and 88% in the test and validation cohorts, respectively. Its use in a pilot intervention identified outlier documentation practices and demonstrated efficiency gains over completely manual note review.
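For readers less familiar with the reported metrics, the following minimal sketch shows how prevalence and the area under the receiver operating characteristic curve can be computed. It uses the pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive note scores higher than a randomly chosen negative one, with ties counted as one half. The function names and the four example scores are illustrative assumptions; only the counts 255/1,200 and 53/240 come from the abstract.

```python
def prevalence(positives, total):
    # Fraction of notes labeled as containing confidential content.
    return positives / total

def auroc(labels, scores):
    # Pairwise definition: share of (positive, negative) pairs ranked correctly.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    print(round(100 * prevalence(255, 1200)))  # train/test cohort, percent
    print(round(100 * prevalence(53, 240)))    # validation cohort, percent
    # Illustrative scores only; not model output from the study.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of notes, which is acceptable for cohorts of this size; a rank-based implementation would be preferable for much larger sets.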

Conclusion An NLP algorithm can identify confidential content in progress notes with high discriminative performance. Its human-in-the-loop deployment in clinical operations augmented an ongoing effort to identify confidential content in adolescent progress notes. These findings suggest NLP may be used to support efforts to preserve adolescent confidentiality in the wake of the information blocking mandate.

Protection of Human and Animal Subjects

The presented work was performed as part of a quality improvement effort at our institution and does not qualify as human subjects research.

Publication History

Received: 12 October 2022

Accepted: 01 March 2023

Accepted Manuscript online: 10 March 2023

Article published online: 24 May 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany