Appl Clin Inform 2023; 14(03): 400-407
DOI: 10.1055/a-2051-9764
Adolescent Privacy and the Electronic Health Record

A Natural Language Processing Model to Identify Confidential Content in Adolescent Clinical Notes

Naveed Rabbani (1), Michael Bedgood (2), Conner Brown (3), Ethan Steinberg (4, 5), Rachel L. Goldstein (6), Jennifer L. Carlson (6), Natalie Pageler (1), Keith E. Morse (1)

1   Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
2   California Department of Public Health, Richmond, California, United States
3   Information Services Department, Lucile Packard Children's Hospital, Palo Alto, California, United States
4   Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, United States
5   Department of Computer Science, Stanford University, Stanford, California, United States
6   Division of Adolescent Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States
Funding None.
 

Abstract

Background The 21st Century Cures Act mandates the immediate, electronic release of health information to patients. However, in the case of adolescents, special consideration is required to ensure that confidentiality is maintained. The detection of confidential content in clinical notes may support operational efforts to preserve adolescent confidentiality while implementing information sharing.

Objectives This study aimed to determine if a natural language processing (NLP) algorithm can identify confidential content in adolescent clinical progress notes.

Methods A total of 1,200 outpatient adolescent progress notes written between 2016 and 2019 were manually annotated to identify confidential content. Labeled sentences from this corpus were featurized and used to train a two-part logistic regression model, which provides both sentence-level and note-level probability estimates that a given text contains confidential content. This model was prospectively validated on a set of 240 progress notes written in May 2022. It was subsequently deployed in a pilot intervention to augment an ongoing operational effort to identify confidential content in progress notes. Note-level probability estimates were used to triage notes for review and sentence-level probability estimates were used to highlight high-risk portions of those notes to aid the manual reviewer.

Results The prevalence of notes containing confidential content was 21% (255/1,200) and 22% (53/240) in the train/test and validation cohorts, respectively. The two-part logistic regression model achieved an area under the receiver operating characteristic curve of 0.90 and 0.88 in the test and validation cohorts, respectively. Its use in a pilot intervention identified outlier documentation practices and demonstrated efficiency gains over completely manual note review.

Conclusion An NLP algorithm can identify confidential content in progress notes with high accuracy. Its human-in-the-loop deployment in clinical operations augmented an ongoing operational effort to identify confidential content in adolescent progress notes. These findings suggest NLP may be used to support efforts to preserve adolescent confidentiality in the wake of the information blocking mandate.



Background and Significance

The Information Blocking Final Rule of the 21st Century Cures Act mandates the timely, electronic release of a wide variety of health care data to patients.[1] This legislation represents a vital step in promoting health care information technology interoperability.[2] Furthermore, there is growing evidence that sharing health data with patients has several benefits including increased engagement, increased care plan adherence, and an improved patient experience.[3] [4] [5] [6] [7]

While there are significant benefits to information sharing with patients, careful consideration is required to protect privacy in the case of adolescents.[8] [9] [10] Providing confidential care for adolescents around sensitive topics such as substance use, sexual health, and mental health is an important part of providing high-quality health care for this population and in many cases is mandated by state law.[11] [12] [13] [14] [15] Maintaining this confidentiality is critical to promoting an environment in which adolescent patients will communicate openly with providers and access essential care.[16] [17] [18]

As a result, information sharing as mandated by the 21st Century Cures Act must be implemented in a way that does not unintentionally disclose confidential information to adolescent patients' proxies without consent. This requires confidential information to be documented in the electronic health record (EHR) in a way that it can be segmented and withheld from release to parents or guardians.[19] In the case of clinical progress notes, confidential information may be documented in a separate note type.[20] Such an approach would yield two types of notes: a regular progress note that is shared with patient and proxy and an adolescent confidential note that is either not shared or shared only with the patient.[21] As with any change to workflow and documentation practices, provider education is an important part of the process. Additionally, health systems currently lack a scalable method for identifying and correcting inappropriate inclusion of confidential information in routine progress notes.

To address this issue, we sought to determine whether a natural language processing (NLP) algorithm can be developed to identify confidential content in adolescent progress notes in a way that is clinically relevant and useful for health care operations. In this manuscript, we demonstrate the development of that algorithm, its implementation into clinical operations, and results from a pilot intervention.



Methods

Dataset and Model Development

This study was performed at a predominantly subspecialty outpatient pediatric network affiliated with a tertiary care academic children's hospital. To inform documentation changes in anticipation of the 21st Century Cures Act, a sample of outpatient progress notes from visits with adolescent patients was reviewed for confidential information. To perform this audit, 1,200 outpatient progress notes written between January 1, 2016 and December 31, 2019 for visits with patients aged 12 to 17 years were randomly sampled and then equally divided among a team of five physician reviewers (N.R., M.B., R.L.G., J.L.C., and K.E.M.). Physician reviewers were trained in the California adolescent confidentiality laws and annotated portions of the assigned clinical progress notes that contained confidential information. Both positive and negative references to confidential topics were treated as confidential. The proportion of clinical progress notes containing confidential information was calculated. Methodological details of this annotation process, including the labeling rubric, interrater reliability, and a summary of the confidential content identified, are described in a related manuscript.[22]

Since portions of the note were manually annotated, this process yielded two types of ground-truth labels for training and evaluation: a note-level label of whether the progress note as a whole contained any confidential information and sentence-level labels for the sentences that comprise the notes. Using an 80–20 training–test split, we evaluated two types of models: a note-based model and a sentence-based model.
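As a minimal sketch of how the two label types and the 80–20 split relate, the snippet below derives a note-level label from sentence-level annotations and splits note IDs into training and test sets. The function names, the toy annotations, the fixed seed, and the choice to split at the note level (so that sentences from one note never land in both sets) are illustrative assumptions, not details reported in the study.

```python
import random


def note_level_label(sentence_labels):
    """Return 1 if any annotated sentence in the note is confidential, else 0."""
    return int(any(sentence_labels))


def split_notes(note_ids, test_fraction=0.2, seed=0):
    """Shuffle note IDs and split them into (train, test) lists at the note level."""
    ids = list(note_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical annotations: note ID -> per-sentence labels (1 = confidential).
annotations = {
    "note_a": [0, 0, 1, 0],
    "note_b": [0, 0, 0],
}
note_labels = {nid: note_level_label(labels) for nid, labels in annotations.items()}

# With 1,200 notes, an 80-20 split gives 960 training and 240 test notes.
train_ids, test_ids = split_notes(range(1200))
```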

The note-based model consists of a single logistic regression model that takes the note text and returns the probability that the note contains confidential information. For the note-based model, we featurized each note using unigram counts followed by a term frequency-inverse document frequency (TF-IDF) transformation.[23] We then fed those features into an L2 regularized logistic regression model to predict whether the note contains any confidential information.[24] Regularization strength was selected via cross validation on the training set.
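To make the featurization step concrete, here is a small standard-library sketch of unigram counts followed by a TF-IDF transformation. The whitespace tokenizer, the smoothed IDF formula, and the L2 normalization are assumptions chosen for illustration (they mirror one common convention); the study does not report these details, and the synthetic notes are invented. The resulting vectors would then be passed to an L2 regularized logistic regression, which is not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the study's tokenizer is not specified)."""
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every unigram in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a text to a sparse, L2-normalized TF-IDF vector (term -> weight)."""
    counts = Counter(term for term in tokenize(text) if term in idf)
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


# Synthetic example notes (not real patient data).
notes = [
    "patient denies tobacco use",
    "patient reports knee pain",
    "knee pain improved with rest",
]
idf = fit_idf(notes)
vector = tfidf_vector(notes[0], idf)
```

Terms that appear in fewer notes (here "tobacco") receive a larger IDF weight than terms shared across notes (here "patient"), which is the property that lets a linear model focus on distinctive vocabulary.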

The sentence-based model consists of two logistic regression models: one that takes a sentence and returns the probability that the sentence contains confidential information, and another that takes all of the sentence-level probabilities from a note and generates a note-level probability estimate. To calculate sentence-level probabilities, we divided each note into “sentence” chunks of 10 tokens each. Each chunk was then featurized using unigram counts followed by a TF-IDF transformation. As before, those features were used in an L2 regularized logistic regression model to predict whether that 10-token chunk contains any confidential information, with regularization strength again selected using cross validation. We then trained a second logistic regression model that transforms these sentence-level probabilities into an overall note-level probability estimate. To do this, we generated 11 features for each note by taking every 10th percentile of the distribution of its sentence-level probability estimates (0th, 10th, 20th, …, 80th, 90th, 100th). Those percentiles were fed into a logistic regression model trained to predict the note-level label of whether the note as a whole contained any confidential content. By using a second model, as opposed to simply taking the maximum sentence-level probability from the first step, the system can take advantage of high-signal cases where a note has multiple high-probability sentences and thus a higher chance of containing confidential content.
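The chunking and percentile steps can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, percentiles are computed with linear interpolation (the study does not state which percentile method was used), and the probabilities are made up. The sentence-level and note-level logistic regression models themselves are not shown.

```python
def chunk_tokens(tokens, size=10):
    """Split a token list into consecutive chunks of `size` tokens (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def decile_features(probabilities):
    """Return the 0th, 10th, ..., 100th percentiles (11 values) of the chunk probabilities.

    Linear interpolation between order statistics is an assumption made here.
    """
    ordered = sorted(probabilities)
    if not ordered:
        raise ValueError("a note must contain at least one chunk")
    last = len(ordered) - 1
    features = []
    for k in range(11):
        position = last * k / 10
        lower = int(position)
        upper = min(lower + 1, last)
        fraction = position - lower
        features.append(ordered[lower] + (ordered[upper] - ordered[lower]) * fraction)
    return features


# A 25-token note yields chunks of 10, 10, and 5 tokens.
chunks = chunk_tokens([f"tok{i}" for i in range(25)])

# Two hypothetical notes with the same maximum chunk probability.
one_hit = [0.05] * 9 + [0.9]
many_hits = [0.05] * 5 + [0.9] * 5
features_one = decile_features(one_hit)
features_many = decile_features(many_hits)
```

Both hypothetical notes have a maximum chunk probability of 0.9, so a max-only rule cannot tell them apart, whereas their upper deciles differ. That difference is the signal the second-stage model can use.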



Prospective Model Validation

To ensure the model would continue to perform adequately on recent notes, a prospective validation of the NLP modeling was performed. For this validation, 240 clinical progress notes were randomly sampled from visits occurring between May 1, 2022 and May 31, 2022 for patients aged 12 to 17 years. These notes were again assigned to the same five reviewers and annotated according to the same rules. The models trained above were applied to this dataset and note-level performance metrics were calculated.



Implementation into Clinical Operations and Pilot Intervention

Our system currently employs two note types for adolescents: a regular progress note, which is being prepared for sharing with patient and proxy, and a confidential note type, which is not shared. The sentence-based logistic regression model was deployed into clinical operations as part of an ongoing documentation optimization quality improvement project at our institution. This project involves a clinic-by-clinic rollout in which providers are educated about adolescent confidentiality and nonconfidential outpatient progress note templates are optimized to decrease note bloat and prepare them for sharing with adolescents and their proxies. Prior to each educational intervention, a manual audit of the clinic's recent progress notes is performed to identify unintentional disclosures of confidential information and other high-risk documentation practices. Findings from the audit are used to inform the intervention.

The language processing model proposed in this manuscript was used to augment the documentation auditing for one of our subspecialty clinics. As part of this pilot intervention, clinical progress notes from patients ages 12 to 17 years seen in this clinic during the month of July 2022 were queried from our EHR database. The language model was applied to these notes to generate note-level risk estimates and to highlight high-risk portions of these notes to aid the manual reviewer.

A note-level threshold of 0.5 was applied to flag notes for potential review. The total number of notes and the number of notes exceeding the 0.5 risk score threshold was calculated per provider. This distribution was visualized to identify outlying providers, for whom the top 20 highest risk notes were manually reviewed. Within reviewed notes, the top 5% highest scoring sentences as identified by the sentence-based model were highlighted to expedite manual review.
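The triage logic described above might be sketched as follows. The tuple layout, function names, provider IDs, and scores are hypothetical, and treating a score equal to the threshold as flagged is an assumption. In the pilot, outlying providers were identified by visual inspection of the per-provider distribution, so no automated outlier rule is shown.

```python
from collections import defaultdict


def triage_by_provider(scored_notes, threshold=0.5):
    """Count total and flagged notes per provider.

    `scored_notes` is a list of (provider, note_id, risk_score) tuples.
    """
    summary = defaultdict(lambda: {"total": 0, "flagged": 0})
    for provider, _note_id, risk in scored_notes:
        summary[provider]["total"] += 1
        if risk >= threshold:
            summary[provider]["flagged"] += 1
    return dict(summary)


def top_risk_notes(scored_notes, provider, k=20):
    """Return the IDs of a provider's k highest-risk notes, highest first."""
    own = [(risk, note_id) for p, note_id, risk in scored_notes if p == provider]
    return [note_id for _risk, note_id in sorted(own, reverse=True)[:k]]


# Hypothetical model output for two anonymized providers.
scored = [
    ("prov_1", "n1", 0.9),
    ("prov_1", "n2", 0.7),
    ("prov_1", "n3", 0.2),
    ("prov_2", "n4", 0.1),
]
summary = triage_by_provider(scored)
review_queue = top_risk_notes(scored, "prov_1", k=2)
```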



Results

Development and Validation of Natural Language Processing Models

Descriptive statistics of the corpus of notes used to train and evaluate the language models and the prospective validation cohort are shown in [Table 1]. In the initial corpus of 1,200 notes, 255 (21%) contained confidential information. In the validation cohort, the prevalence of notes containing confidential content was 53 of 240 (22%).

Table 1

Demographic information for both the initial cohort of notes used to train and test the model and the prospective validation cohort

Characteristic                               Initial cohort    Validation cohort
Total notes                                  1,200             240
Notes with confidential information          255 (21%)         53 (22%)
Patient sex
 Female                                      621 (52%)         112 (47%)
 Male                                        577 (48%)         128 (53%)
 Unknown                                     2 (<1%)           0
Patient age (y)
 12                                          178 (15%)         29 (12%)
 13                                          209 (17%)         42 (18%)
 14                                          215 (18%)         35 (16%)
 15                                          202 (17%)         48 (20%)
 16                                          193 (16%)         44 (18%)
 17                                          203 (17%)         42 (18%)
Patient race
 Asian                                       174 (15%)         32 (13%)
 Black or African American                   21 (2%)           5 (2%)
 Native Hawaiian or Other Pacific Islander   11 (1%)           1 (<1%)
 White or Caucasian                          494 (41%)         81 (34%)
 Other/unknown/declines                      500 (42%)         121 (50%)
Patient language
 English                                     1,007 (84%)       186 (78%)
 Spanish                                     153 (13%)         47 (20%)
 Other                                       40 (3%)           7 (3%)

Note: Components of certain categories may add up to more or less than 100% due to rounding.


The note-based logistic regression model achieved an area under the receiver operating characteristic (AUROC) curve of 0.88 on the initial corpus and 0.80 on the validation corpus and an F1 score of 0.72 and 0.55, respectively. The sentence-based logistic regression model achieved an AUROC of 0.90 and 0.88, respectively, and an F1 score of 0.76 and 0.67, respectively. These and other model performance metrics are summarized in [Table 2]. The sentence-based logistic regression receiver operating characteristic curve and precision–recall curve are shown in [Fig. 1].
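For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen note with confidential content receives a higher score than a randomly chosen note without it. The sketch below computes it directly from that definition on made-up labels and scores. It is a quadratic-time illustration with a hypothetical function name, not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up note-level labels and scores: 5 of the 6 positive-negative pairs
# are ranked correctly, so the AUROC is 5/6.
example_labels = [1, 1, 0, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
example_auroc = auroc(example_labels, example_scores)
```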

Fig. 1 Note-level performance metrics, including (A) receiver operating characteristic curve and (B) precision–recall curve for the sentence-based model selected for implementation into clinical operations based on performance on the prospective validation set. AUC, area under curve; NLP, natural language processing.
Table 2

Summary of performance metrics of natural language processing models on the initial cohort test set and on the prospective validation set

Model                                Cohort                AUROC   AUPRC   F1 score   Sensitivity[a]   Specificity[a]   PPV[a]
Note-based logistic regression       Initial test cohort   0.88    0.71    0.72       0.39             0.97             0.77
Note-based logistic regression       Validation cohort     0.80    0.48    0.55       0.21             0.96             0.61
Sentence-based logistic regression   Initial test cohort   0.90    0.77    0.76       0.59             0.95             0.81
Sentence-based logistic regression   Validation cohort     0.88    0.66    0.67       0.53             0.91             0.60

Abbreviations: AUPRC, area under the precision–recall curve; AUROC, area under the receiver operating characteristic; PPV, positive predictive value.

a Sensitivity, specificity, and PPV for logistic regression models are calculated at threshold of ŷ = 0.50.
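The thresholded metrics in Table 2 follow from a confusion matrix at ŷ = 0.50. The sketch below shows that calculation on made-up labels and scores. The function name is hypothetical, and counting a score equal to the threshold as positive is an assumption. The footnote applies the 0.50 threshold to sensitivity, specificity, and PPV only. The threshold behind the reported F1 scores is not stated, so the F1 value computed here at 0.5 is purely illustrative.

```python
def threshold_metrics(labels, scores, threshold=0.5):
    """Compute sensitivity, specificity, PPV, and F1 at a fixed probability threshold."""
    predicted = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


# Made-up example: 3 confidential notes and 5 nonconfidential notes.
# At 0.5 this gives 2 true positives, 1 false negative, 1 false positive,
# and 4 true negatives.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
toy_metrics = threshold_metrics(toy_labels, toy_scores)
```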


In addition to providing classification at the level of the entire note, the sentence-based logistic regression model also calculates a risk score for each 10-token “sentence” chunk in the note. This output can be used to highlight high-risk areas of a note to expedite manual review. An example of this is illustrated in [Fig. 2]: panel A demonstrates how this feature might behave on synthetic note text, and panel B shows high-risk excerpts highlighted by the algorithm from notes in the validation cohort. In our pilot intervention, we elected to highlight the top 5% highest risk sentences.
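A highlighting step of this kind could be sketched as below. Selecting the top 5% within each note, rounding the count up, always highlighting at least one chunk, and marking highlights with double square brackets are all assumptions made for illustration. The function names and the synthetic chunks are hypothetical.

```python
import math


def top_fraction_indices(chunk_scores, fraction=0.05):
    """Return the indices (in document order) of the highest-scoring fraction of chunks."""
    if not chunk_scores:
        return []
    n_keep = max(1, math.ceil(len(chunk_scores) * fraction))
    ranked = sorted(range(len(chunk_scores)), key=lambda i: chunk_scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def render_highlights(chunks, chunk_scores, fraction=0.05):
    """Join chunk texts, wrapping the selected high-risk chunks in [[...]] markers."""
    keep = set(top_fraction_indices(chunk_scores, fraction))
    return " ".join(f"[[{text}]]" if i in keep else text for i, text in enumerate(chunks))


# Synthetic chunk texts and scores (not real patient data).
demo_chunks = ["denies pain", "smokes daily", "follow up"]
demo_scores = [0.10, 0.80, 0.05]
demo_text = render_highlights(demo_chunks, demo_scores)
```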

Fig. 2 (A) Example of synthetic text highlighted with the sentence-based model; (B) examples of sentences identified by the model on the validation set.


Implementation and Pilot Intervention

In the pilot intervention, the model was used to audit notes from one of our subspecialty clinics from July 2022. During this time, 264 notes were written by 15 different providers. Of these, 60 notes were flagged as high risk. The number of total notes and high-risk notes per provider is shown in [Fig. 3]. One provider, who had 29 flagged notes, was identified as an outlier. As a result, their highest risk 20 notes were selected for manual review. This review identified frequent use of an automated phrase populated with the tobacco-use history of the patient (pulled from a provider entry social history flowsheet). This documentation practice was noted and informed educational interventions.

Fig. 3 Summary of results from pilot intervention depicting the number of notes at high risk for containing confidential information by provider (anonymized).


Discussion

This study applies NLP to identify confidential content in adolescent progress notes and successfully demonstrates both computational feasibility and clinical utility in a pilot intervention. There are two notable takeaways from this study. First, this work demonstrates the feasibility of using NLP to optimize the technical implementation of information sharing with adolescent patients in a way that protects their confidentiality. This has far-reaching implications not just for adolescents, but also for other sensitive topics in medicine such as reproductive and maternal–infant health.[19] Second, this study serves as an example of a successful implementation of machine learning into clinical operations, with a human-in-the-loop approach that demonstrates efficiency gains in an otherwise burdensome manual task.

Promoting Adolescent Confidentiality

This study extends the body of work around promoting adolescent confidentiality in the wake of the 21st Century Cures Act and the information sharing mandate. Prior studies by Xie et al and Ip et al have used NLP algorithms to identify inappropriate health portal access by parents and guardians.[25] [26] This occurs when a parent or guardian accesses the health portal with the patient's account instead of a proxy account. In a related study by Lee et al, keyword expansion was used to identify the prevalence of potentially confidential content in adolescent progress notes.[27] Similarly, Ni et al also demonstrated the feasibility of using a combination of language processing techniques to identify information around substance use among pediatric patients in their proof of concept study.[28] From an operational perspective, Murugan et al enumerated their experience from the “learning mode” deployment of a confidential adolescent note type. They found that the use of autopopulated note elements was a common source of unintentional confidential information disclosure.[29]

Much of this recent work has focused on characterizing and measuring the issues around adolescent confidentiality. These prior studies lay the foundation for our work, in which we develop and test a potentially scalable solution to support adolescent information sharing using NLP. In its current form, the algorithm is used to more efficiently audit adolescent progress notes for unintentional disclosures of confidential content. Findings from these audits inform targeted feedback and educational interventions. As the algorithm is improved, it may eventually be used autonomously to support such efforts. We envision that advancements from this line of work may enable health systems that are currently not releasing adolescent notes due to the technical feasibility exception of the mandate to begin sharing this information with their patients in the near future.[30]



Efficiency Gains from Algorithm Implementation

The human-in-the-loop implementation of our pilot intervention demonstrates the successful use of a machine learning algorithm to augment the efficiency of a manual task. While our algorithm does not operate autonomously, this method of implementation still yields important operational gains.

Consider the context in which our algorithm was deployed. Previously, as part of an ongoing documentation optimization program, a random sampling of notes was periodically obtained and reviewed by a group of physicians on a specialty-by-specialty basis. The manual review of clinical progress notes for confidential content is resource-limited by provider availability. Our study demonstrated that approximately 20% of these notes had confidential content. By comparison, sampling notes with a risk score of 0.5 or higher from the proposed NLP algorithm yields a positive predictive value (PPV) of at least 60%, already a threefold gain in efficiency in sampling for manual review.

Consider further that the sentence-highlighting feature of our algorithm also expedites the manual review of those selected notes. For example, if a health system had the capacity to manually review 20 randomly sampled notes in a given time period, this would previously have yielded 4 notes with confidential content. However, suppose that the proposed sentence-based algorithm, by expediting manual review, increased this capacity to 40 notes in the same amount of time. If these notes were chosen in a risk-stratified way with a PPV of at least 60%, this would yield 24 notes with confidential content, a sixfold overall increase in efficiency in identifying notes with confidential content.
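The arithmetic behind this estimate can be checked with a few lines of code. The review capacities (20 and 40 notes) are the hypothetical figures from the scenario above, and the rates are the rounded 20% prevalence and 60% PPV. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_yield(notes_reviewed, positive_rate):
    """Expected number of reviewed notes that contain confidential content."""
    return notes_reviewed * positive_rate


prevalence = Fraction(20, 100)  # about 20% of randomly sampled notes
ppv = Fraction(60, 100)         # PPV of at least 60% at a risk score of 0.5 or higher

baseline = expected_yield(20, prevalence)  # random sampling, manual review: 4 notes
assisted = expected_yield(40, ppv)         # risk-stratified, highlighted review: 24 notes

sampling_gain = ppv / prevalence           # gain from risk-stratified sampling alone: 3
overall_gain = assisted / baseline         # combined gain: 6
```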



Natural Language Processing for Supporting Clinical Operations

This study also illustrates the successful deployment of an NLP model to improve clinical operations and builds upon a growing body of literature in this space. Other examples of language models deployed to support care delivery include medical coding, clinical trial recruitment, and chatbots for medical triage and education.[31] [32] [33] Outside of the scientific literature, third-party software products also employ NLP models to review records for purposes of billing, quality metrics, and clinical decision support.[34] [35]

While the focus of our use case is adolescent confidentiality, there are lessons learned that are generalizable to other issues in clinical operations. For example, we observed that our language model, which was trained on featurized sentences (the “sentence-based model”), outperformed the “note-based model,” which was trained using the entire featurized note text. This suggests that the identification of confidential information is a relatively localized task; that is, it generally requires looking only at specific parts of a note.

Furthermore, the use of similar methods may be applied to other pressing issues in health information confidentiality, including protecting health information in the context of maternal–infant health and reproductive health, the latter being particularly pertinent in the wake of Dobbs v. Jackson.[19] [36] [37]



Limitations and Future Work

There are important limitations to this work. First, while the implementation of our algorithm promises efficiency gains in documentation optimization efforts, it is not yet accurate enough to operate autonomously and requires human input in its current form, which limits its scalability. Second, the model is limited by its sensitivity, which was 59% and 53% in our testing and prospective validation experiments, respectively (at a threshold of ŷ = 0.5). As a result, it cannot yet be used to identify all notes with confidential content. Third, because the algorithm was trained on data from a single site, it may be learning institutional patterns (e.g., relying on specific note templates or author writing styles) that are not generalizable to other settings. As such, the model may require retraining or fine-tuning at external sites. Similarly, because the training data were annotated using the California minor consent laws, the algorithm's generalizability to states with differing rules is limited.

Future work will focus on developing a data pipeline that will allow for the ongoing monitoring of clinical notes. We envision this may be achieved through the creation of a dashboard that visualizes risk scores from the model. Additionally, there is work ongoing to establish an operational workflow that will allow for the continued refinement of the model to combat data drift and improve its accuracy, including a process to identify and review misclassifications.



Conclusion

This study illustrates the development and successful implementation of an NLP algorithm to identify confidential content in adolescent progress notes. Our human-in-the-loop deployment into clinical operations demonstrates significant efficiency gains in the manual task of clinical note review. The proposed system shows promise as an operational solution to support the health information sharing with adolescents in a way that maintains patient confidentiality.



Clinical Relevance Statement

This manuscript describes the development and implementation of an NLP algorithm that has been directly deployed into health care operations to support information sharing and adolescent patient confidentiality. Our study includes elements of health information exchange, NLP, application/implementation of machine learning, and health IT regulation/policy with direct clinical relevance.



Multiple-Choice Questions

  1. In California, adolescents may consent to care relating to which of the following domains?

    a. Sexual/reproductive health

    b. Mental health

    c. Substance use

    d. All of the above

    Correct Answer: The correct answer is option d. Minor consent laws allow adolescents to consent to certain medical services without parent or guardian involvement. These laws vary state-by-state. In California, patients aged 12 and older may generally consent to care around reproductive/sexual health, mental health, and substance use without parental or guardian involvement.

  2. Which of the following is the most effective way of using machine learning to estimate whether a clinical note contains confidential content?

    a. Training a logistic regression model to directly estimate whether a note contains confidential content

    b. Training a logistic regression model to estimate whether a sentence contains confidential content and then taking the maximum probability for each sentence in the note

    c. Training a logistic regression model to estimate whether a sentence contains confidential content and then feeding the probabilities into another logistic regression model

    d. Training a deep neural network to estimate whether a note contains confidential content

    Correct Answer: The correct answer is option c. As we have limited data, logistic regression models are generally the best option. And in that setting, sentence-level models tend to outperform note-level models. Finally, using a second logistic regression model to aggregate the probabilities from the sentence model yields better estimates of the note-level probability.

  3. In what ways could information sharing through a health portal result in breach of adolescent confidentiality?

    a. Parent/guardian may receive an explanation of benefits from insurance regarding confidential medical care

    b. Confidential information may be inadvertently released to a proxy health portal account

    c. Parents or guardians may overhear a confidential part of the patient visit from outside the room

    d. Confidential information may be accidentally relayed to a parent/guardian by a phone encounter with clinic staff

    Correct Answer: The correct answer is option b. Information sharing through the patient portal as mandated by the 21st Century Cures Act may cause unintentional breaches of adolescent confidentiality. As minors, adolescent patients cannot consent to all types of medical care. As a result, health portals for this population typically have two types of accounts: an account for the adolescent patient and a proxy account for the parent/guardian. Confidentiality may be breached if information is accidentally released to both the patient and proxy accounts. While choices c and d also explain potential ways that adolescent confidentiality may be breached, they are not related to information sharing through the electronic health portal.



Conflict of Interest

None declared.

Protection of Human and Animal Subjects

The presented work was performed as part of a quality improvement effort at our institution and does not qualify as human subjects research.



Address for correspondence

Naveed Rabbani, MD
Department of Pediatrics, Stanford University School of Medicine
453 Quarry Road, Stanford, CA 94304
United States   

Publication History

Received: 12 October 2022

Accepted: 01 March 2023

Accepted Manuscript online:
10 March 2023

Article published online:
24 May 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

