Keywords
confidentiality - patient portals - natural language processing - machine learning - health information exchange
Background and Significance
The Information Blocking Final Rule of the 21st Century Cures Act mandates the timely, electronic release of a wide variety of health care data to patients.[1] This legislation represents a vital step in promoting health care information technology interoperability.[2] Furthermore, there is growing evidence that sharing health data with patients has several benefits including increased engagement, increased care plan adherence, and an improved patient experience.[3][4][5][6][7]
While there are significant benefits to information sharing with patients, careful consideration is required to protect privacy in the case of adolescents.[8][9][10] Providing confidential care for adolescents around sensitive topics such as substance use, sexual health, and mental health is an important part of providing high-quality health care for this population and in many cases is mandated by state law.[11][12][13][14][15] Maintaining this confidentiality is critical to promoting an environment in which adolescent patients will communicate openly with providers and access essential care.[16][17][18]
As a result, information sharing as mandated by the 21st Century Cures Act must be implemented in a way that does not unintentionally disclose confidential information to adolescent patients' proxies without consent. This requires confidential information to be documented in the electronic health record (EHR) in a way that allows it to be segmented and withheld from release to parents or guardians.[19] In the case of clinical progress notes, confidential information may be documented in a separate note type.[20] Such an approach would yield two types of notes: a regular progress note that is shared with patient and proxy and an adolescent confidential note that is either not shared or shared only with the patient.[21] As with any change to workflow and documentation practices, provider education is an important part of the process. Additionally, health systems currently lack a scalable method for identifying and correcting inappropriate inclusion of confidential information in routine progress notes.
To address this issue, we sought to determine whether a natural language processing (NLP) algorithm can be developed to identify confidential content in adolescent progress notes in a way that is clinically relevant and useful for health care operations. In this manuscript, we demonstrate the development of that algorithm, its implementation into clinical operations, and results from a pilot intervention.
Methods
Dataset and Model Development
This study was performed at a predominantly subspecialty outpatient pediatric network affiliated with a tertiary care academic children's hospital. To inform documentation changes in anticipation of the 21st Century Cures Act, a sample of outpatient progress notes from visits with adolescent patients was reviewed for confidential information. To perform this audit, 1,200 outpatient progress notes written between January 1, 2016 and December 31, 2019 for visits with patients aged 12 to 17 years were randomly sampled and then equally divided among a team of five physician reviewers (N.R., M.B., R.L.G., J.L.C., and K.E.M.). Physician reviewers were trained in the California adolescent confidentiality laws and annotated portions of the assigned clinical progress notes that contained confidential information. Both positive and negative references to confidential topics were considered confidential. The proportion of clinical progress notes containing confidential information was calculated. Further methodological details of this annotation process, including the labeling rubric, interrater reliability, and a summary of the confidential content identified, are reported in a related manuscript.[22]
Since portions of the note were manually annotated, this process yielded two types of ground-truth labels for training and evaluation: a note-level label of whether the progress note as a whole contained any confidential information and sentence-level labels for the sentences that comprise the notes. Using an 80–20 training–test split, we evaluated two types of models: a note-based model and a sentence-based model.
The note-based model consists of a single logistic regression model that takes the note text and returns the probability that the note contains confidential information. For the note-based model, we featurized each note using unigram counts followed by a term frequency-inverse document frequency (TF-IDF) transformation.[23] We then fed those features into an L2 regularized logistic regression model to predict whether the note contains any confidential information.[24] Regularization strength was selected via cross validation on the training set.
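The note-based pipeline described above can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes scikit-learn as the toolkit, and the toy notes, labels, candidate values of C, and the two-fold cross validation are all hypothetical choices made only so the example runs.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy notes; label 1 means the note contains confidential content.
notes = [
    "patient reports vaping nicotine daily and is sexually active",
    "discussed contraception options and screening for depression",
    "denies alcohol or marijuana use, denies sexual activity",
    "reports anxiety and past self harm, referred to counseling",
    "follow up for asthma, continue albuterol as needed",
    "knee pain after soccer, recommend rest and ice",
    "well controlled seizures, continue current medication dose",
    "seasonal allergies, start daily antihistamine",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

note_model = Pipeline([
    ("counts", CountVectorizer(ngram_range=(1, 1))),  # unigram counts
    ("tfidf", TfidfTransformer()),                    # TF-IDF reweighting of the counts
    ("clf", LogisticRegression(max_iter=1000)),       # L2 penalty is the scikit-learn default
])

# Select the inverse regularization strength C by cross validation on the training data.
# The candidate grid and cv=2 are assumptions sized for this toy dataset.
search = GridSearchCV(note_model, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
search.fit(notes, labels)

# Probability that each unseen (hypothetical) note contains confidential content.
new_notes = ["reports marijuana use on weekends", "ankle sprain, improving"]
probs = search.predict_proba(new_notes)[:, 1]
```

Wrapping the featurization and classifier in one pipeline means the vocabulary and TF-IDF weights are refit inside each cross-validation fold, so held-out notes do not influence the features used to score them.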
The sentence-based model consists of two logistic regression models: one that takes a sentence and returns the probability that the sentence contains confidential information, and another that takes all of the sentence-level probabilities from a note and generates a note-level probability estimate. To calculate sentence-level probabilities, we divided each note into “sentence” chunks of 10 tokens each. Each chunk was then featurized using unigram counts followed by a TF-IDF transformation. As before, those features were used in an L2 regularized logistic regression model to predict whether that 10-token chunk contains any confidential information, with regularization strength selected by cross validation. We then trained a second logistic regression model that transforms these sentence-level probabilities into an overall note probability estimate. To do this, we generated eleven features for each note by taking every 10th percentile (0th, 10th, 20th, …, 80th, 90th, 100th) of the distribution of that note's sentence-level probability estimates. Those percentiles were fed into a logistic regression model trained to predict the note-level label of whether the note as a whole contained any confidential content. By using a second model, as opposed to simply taking the maximum sentence-level probability from the first step, the system can take advantage of high-signal cases where a note has multiple high-probability sentences and thus a higher chance of containing confidential content.
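The two data transformations in this paragraph, chunking a note into 10-token "sentences" and summarizing chunk probabilities as eleven percentile features, can be sketched in plain Python. The whitespace tokenizer, the shorter final chunk, and the linear interpolation between ranks are assumptions; the text does not specify them, and the function names are hypothetical.

```python
def chunk_tokens(text, size=10):
    """Split whitespace-delimited tokens into consecutive chunks of `size` tokens.

    Assumption: the final chunk is kept even if it has fewer than `size` tokens.
    """
    tokens = text.split()
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def percentile(sorted_vals, q):
    """q-th percentile (0 to 100) of a sorted, non-empty list, linearly interpolated."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def decile_features(chunk_probs):
    """Eleven note-level features: the 0th, 10th, ..., 100th percentiles."""
    ordered = sorted(chunk_probs)
    return [percentile(ordered, q) for q in range(0, 101, 10)]


# A hypothetical 25-token note yields chunks of 10, 10, and 5 tokens.
note = " ".join(f"tok{i}" for i in range(25))
chunks = chunk_tokens(note)

# Hypothetical chunk-level probabilities from the first-stage model.
features = decile_features([0.05, 0.10, 0.90])
```

Because every note maps to the same eleven features regardless of its length, the second-stage logistic regression can be trained on notes with very different numbers of chunks.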
Prospective Model Validation
To ensure the model would continue to perform adequately on recent notes, a prospective validation of the NLP modeling was performed. For this validation, 240 clinical progress notes were randomly sampled from visits occurring between May 1, 2022 and May 31, 2022 for patients aged 12 to 17 years. These notes were again assigned to the same five reviewers and annotated according to the same rules. The models trained above were applied to this dataset and note-level performance metrics were calculated.
Implementation into Clinical Operations and Pilot Intervention
Our system currently employs two note types for adolescents: a regular progress note, which is being prepared for sharing with patient and proxy, and a confidential note type, which is not shared. The sentence-based logistic regression model was deployed into clinical operations as part of an ongoing documentation optimization quality improvement project at our institution. This project involves a clinic-by-clinic rollout in which providers are educated about adolescent confidentiality and nonconfidential outpatient progress note templates are optimized to decrease note bloat and prepare them for sharing with adolescents and their proxies. Prior to each educational intervention, a manual audit of the clinic's recent progress notes is performed to identify unintentional disclosures of confidential information and other high-risk documentation practices. Findings from the audit are used to inform the intervention.
The language processing model proposed in this manuscript was used to augment the documentation auditing for one of our subspecialty clinics. As part of this pilot intervention, clinical progress notes from patients ages 12 to 17 years seen in this clinic during the month of July 2022 were queried from our EHR database. The language model was applied to these notes to generate note-level risk estimates and to highlight high-risk portions of these notes to aid the manual reviewer.
A note-level threshold of 0.5 was applied to flag notes for potential review. The total number of notes and the number of notes exceeding the 0.5 risk score threshold were calculated per provider. This distribution was visualized to identify outlying providers, for whom the 20 highest-risk notes were manually reviewed. Within reviewed notes, the top 5% highest scoring sentences as identified by the sentence-based model were highlighted to expedite manual review.
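The per-provider tally and the selection of notes for manual review can be sketched as below. The record layout of (provider, note ID, risk score), the function names, and the treatment of a score exactly equal to 0.5 as not flagged are assumptions made for illustration; outlying providers were identified visually in the pilot, so that step is not automated here.

```python
from collections import Counter

THRESHOLD = 0.5  # note-level risk threshold from the text
TOP_N = 20       # notes reviewed per outlying provider


def summarize_by_provider(scored_notes, threshold=THRESHOLD):
    """Map each provider to (total notes, notes whose score exceeds the threshold)."""
    total, flagged = Counter(), Counter()
    for provider, _note_id, score in scored_notes:
        total[provider] += 1
        if score > threshold:
            flagged[provider] += 1
    return {p: (total[p], flagged[p]) for p in total}


def notes_for_review(scored_notes, provider, top_n=TOP_N):
    """Return one provider's notes sorted by risk score, highest first, capped at top_n."""
    own = [n for n in scored_notes if n[0] == provider]
    return sorted(own, key=lambda n: n[2], reverse=True)[:top_n]


# Hypothetical scored notes: (anonymized provider, note ID, note-level risk score).
scored = [("A", 1, 0.9), ("A", 2, 0.2), ("A", 3, 0.7), ("B", 4, 0.1), ("B", 5, 0.6)]
summary = summarize_by_provider(scored)
review_queue = notes_for_review(scored, "A", top_n=2)
```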
Results
Development and Validation of Natural Language Processing Models
Descriptive statistics of the corpus of notes used to train and evaluate the language models and the prospective validation cohort are shown in [Table 1]. In the initial corpus of 1,200 notes, 255 (21%) contained confidential information. In the validation cohort, the prevalence of notes containing confidential content was 53 of 240 (22%).
Table 1
Demographic information for both the initial cohort of notes used to train and test the model and the prospective validation cohort
| | Initial cohort | Validation cohort |
|---|---|---|
| Total notes | 1,200 | 240 |
| Notes with confidential information | 255 (21%) | 53 (22%) |
| Patient sex | | |
| Female | 621 (52%) | 112 (47%) |
| Male | 577 (48%) | 128 (53%) |
| Unknown | 2 (<1%) | 0 |
| Patient age (y) | | |
| 12 | 178 (15%) | 29 (12%) |
| 13 | 209 (17%) | 42 (18%) |
| 14 | 215 (18%) | 35 (16%) |
| 15 | 202 (17%) | 48 (20%) |
| 16 | 193 (16%) | 44 (18%) |
| 17 | 203 (17%) | 42 (18%) |
| Patient race | | |
| Asian | 174 (15%) | 32 (13%) |
| Black or African American | 21 (2%) | 5 (2%) |
| Native Hawaiian or Other Pacific Islander | 11 (1%) | 1 (<1%) |
| White or Caucasian | 494 (41%) | 81 (34%) |
| Other/unknown/declines | 500 (42%) | 121 (50%) |
| Patient language | | |
| English | 1,007 (84%) | 186 (78%) |
| Spanish | 153 (13%) | 47 (20%) |
| Other | 40 (3%) | 7 (3%) |
Note: Components of certain categories may add up to more or less than 100% due to rounding.
The note-based logistic regression model achieved an area under the receiver operating characteristic (AUROC) curve of 0.88 on the initial corpus and 0.80 on the validation corpus and an F1 score of 0.72 and 0.55, respectively. The sentence-based logistic regression model achieved an AUROC of 0.90 and 0.88, respectively, and an F1 score of 0.76 and 0.67, respectively. These and other model performance metrics are summarized in [Table 2]. The sentence-based logistic regression receiver operating characteristic curve and precision–recall curve are shown in [Fig. 1].
Fig. 1 Note-level performance metrics, including (A) receiver operating characteristic curve and (B) precision–recall curve for the sentence-based model selected for implementation into clinical operations based on performance on the prospective validation set. AUC, area under curve; NLP, natural language processing.
Table 2
Summary of performance metrics of natural language processing models on the initial cohort test set and on the prospective validation set
| Model | Cohort | AUROC | AUPRC | F1 score | Sensitivity[a] | Specificity[a] | PPV[a] |
|---|---|---|---|---|---|---|---|
| Note-based logistic regression | Initial test cohort | 0.88 | 0.71 | 0.72 | 0.39 | 0.97 | 0.77 |
| Note-based logistic regression | Validation cohort | 0.80 | 0.48 | 0.55 | 0.21 | 0.96 | 0.61 |
| Sentence-based logistic regression | Initial test cohort | 0.90 | 0.77 | 0.76 | 0.59 | 0.95 | 0.81 |
| Sentence-based logistic regression | Validation cohort | 0.88 | 0.66 | 0.67 | 0.53 | 0.91 | 0.60 |
Abbreviations: AUPRC, area under the precision–recall curve; AUROC, area under the receiver operating characteristic; PPV, positive predictive value.
[a] Sensitivity, specificity, and PPV for logistic regression models are calculated at a threshold of ŷ = 0.50.
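For readers less familiar with the threshold-dependent metrics in Table 2, the sketch below shows how sensitivity, specificity, and PPV follow from a confusion matrix at a fixed threshold. The labels and scores are hypothetical, the function name is illustrative, and counting a score exactly equal to the threshold as positive is an assumption.

```python
def threshold_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and PPV when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive and truth == 0:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # share of confidential notes flagged
        "specificity": tn / (tn + fp) if tn + fp else nan,  # share of other notes left unflagged
        "ppv": tp / (tp + fp) if tp + fp else nan,          # share of flagged notes that are confidential
    }


# Hypothetical note-level labels (1 = confidential content) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
metrics = threshold_metrics(y_true, y_score)
```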
In addition to providing classification at the level of the entire note, the sentence-based logistic regression model also calculates a risk score for each 10-token “sentence” chunk in the note. This output can be used to highlight high-risk areas of a note to expedite manual review. An example of this is illustrated in [Fig. 2], in which panel A demonstrates how this feature might behave on synthetic note text data and panel B shows high-risk excerpts highlighted by the algorithm from notes in the validation cohort. In our pilot intervention, we elected to highlight the top 5% highest risk sentences.
Fig. 2 (A) Example of synthetic text highlighted with the sentence-based model; (B) examples of sentences identified by the model on the validation set.
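The highlighting step can be sketched as selecting the highest-scoring chunks and marking them in the rendered text. Applying the 5% cutoff within each note, always marking at least one chunk, and the double-bracket markers are assumptions for illustration; the function names are hypothetical.

```python
import math


def top_fraction_indices(chunk_scores, fraction=0.05):
    """Indices of the highest-scoring chunks; at least one chunk when any exist."""
    k = max(1, math.ceil(len(chunk_scores) * fraction))
    ranked = sorted(range(len(chunk_scores)), key=lambda i: chunk_scores[i], reverse=True)
    return set(ranked[:k])


def highlight(chunks, chunk_scores, fraction=0.05):
    """Join chunk strings back into text, wrapping the selected chunks in [[ ]]."""
    marked = top_fraction_indices(chunk_scores, fraction)
    return " ".join(f"[[{c}]]" if i in marked else c for i, c in enumerate(chunks))


# Hypothetical synthetic chunks and chunk-level risk scores.
chunks = ["denies fever", "reports vaping daily", "follow up in 3 months"]
scores = [0.02, 0.91, 0.05]
rendered = highlight(chunks, scores)
```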
Implementation and Pilot Intervention
In the pilot intervention, the model was used to audit notes from one of our subspecialty clinics from July 2022. During this time, 264 notes were written by 15 different providers. Of these, 60 notes were flagged as high risk. The number of total notes and high-risk notes per provider is shown in [Fig. 3]. One provider, who had 29 flagged notes, was identified as an outlier. As a result, their 20 highest-risk notes were selected for manual review. This review identified frequent use of an automated phrase populated with the tobacco-use history of the patient (pulled from a provider-entered social history flowsheet). This documentation practice was noted and informed educational interventions.
Fig. 3 Summary of results from pilot intervention depicting the number of notes at high risk for containing confidential information by provider (anonymized).
Discussion
This study applies NLP to identify confidential content in adolescent progress notes and successfully demonstrates both computational feasibility and clinical utility in a pilot intervention. There are two notable takeaways from this study. First, this work demonstrates the feasibility of using NLP to optimize the technical implementation of information sharing with adolescent patients in a way that protects their confidentiality. This has far-reaching implications not just for adolescents, but for other sensitive topics in medicine as well such as reproductive and maternal–infant health.[19] Furthermore, this study serves as an example of a successful implementation of machine learning into clinical operations, with a human-in-the-loop approach that demonstrates efficiency gains in an otherwise burdensome manual task.
Promoting Adolescent Confidentiality
This study extends the body of work around promoting adolescent confidentiality in the wake of the 21st Century Cures Act and the information sharing mandate. Prior studies by Xie et al and Ip et al have used NLP algorithms to identify inappropriate health portal access by parents and guardians.[25][26] This occurs when a parent or guardian accesses the health portal with the patient's account instead of a proxy account. In a related study by Lee et al, keyword expansion was used to identify the prevalence of potentially confidential content in adolescent progress notes.[27] Similarly, Ni et al demonstrated the feasibility of using a combination of language processing techniques to identify information around substance use among pediatric patients in their proof-of-concept study.[28] From an operational perspective, Murugan et al enumerated their experience from the “learning mode” deployment of a confidential adolescent note type. They found that the use of autopopulated note elements was a common source of unintentional confidential information disclosure.[29]
Much of this recent work has focused on characterizing and measuring the issues around adolescent confidentiality. These prior studies lay the foundation for our work in which we develop and test a potential, scalable solution to support adolescent information sharing using NLP. In its current form, the algorithm is used to more efficiently audit adolescent progress notes for unintentional disclosures of confidential content. Findings from these audits inform targeted feedback and educational interventions. As the algorithm is improved, it may eventually be used autonomously to support such efforts. We envision that advancements from this line of work may enable health systems that are currently not releasing adolescent notes due to the technical feasibility exception of the mandate to begin sharing this information with their patients in the near future.[30]
Efficiency Gains from Algorithm Implementation
The human-in-the-loop implementation of our pilot intervention demonstrates the successful use of a machine learning algorithm to augment the efficiency of a manual task. While our algorithm does not operate autonomously, this method of implementation still yields important operational gains.
Consider the context in which our algorithm was deployed. Previously, as part of an ongoing documentation optimization program, a random sample of notes was periodically obtained and reviewed by a group of physicians on a specialty-by-specialty basis. The manual review of clinical progress notes for confidential content is limited by provider availability. Our study demonstrated that approximately 20% of these notes had confidential content. Compare this to the proposed NLP algorithm, in which sampling notes with a risk score of 0.5 or higher yields a positive predictive value (PPV) of at least 60%, already a threefold gain in efficiency in sampling for manual review.
Consider further that the sentence-highlighting feature of our algorithm also expedites the manual review of those selected notes. For example, if a health system had the capacity to manually review 20 notes in a given time period, random sampling would previously have yielded about 4 notes with confidential content. However, suppose that the proposed sentence-based algorithm, which expedites manual review, increased this capacity to 40 notes in the same amount of time. If these notes were chosen in a risk-stratified way with a PPV of at least 60%, this would yield 24 notes with confidential content, a sixfold increase in efficiency in identifying notes with confidential content.
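The arithmetic behind this comparison can be written out explicitly. The sketch below uses only the figures stated in the text; the doubling of review capacity from 20 to 40 notes is the hypothetical scenario posed above, not a measured result, and the variable names are illustrative.

```python
# Figures taken from the text; percentages are kept as integers to keep the arithmetic exact.
baseline_capacity = 20        # notes reviewed per period under random sampling
baseline_prevalence_pct = 20  # about 20% of randomly sampled notes contain confidential content
assisted_capacity = 40        # hypothetical capacity when highlighting speeds up review
assisted_ppv_pct = 60         # PPV of notes flagged at a 0.5 risk-score threshold

# Expected number of reviewed notes that contain confidential content.
baseline_yield = baseline_capacity * baseline_prevalence_pct / 100
assisted_yield = assisted_capacity * assisted_ppv_pct / 100

# Gain from risk-stratified sampling alone, and combined with faster review.
sampling_gain = assisted_ppv_pct / baseline_prevalence_pct
total_gain = assisted_yield / baseline_yield

print(baseline_yield, assisted_yield, sampling_gain, total_gain)
```

The combined gain is the product of the sampling gain (60% versus 20%) and the assumed capacity gain (40 versus 20 notes), so it scales directly with whatever review speedup a site actually observes.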
Natural Language Processing for Supporting Clinical Operations
This study also illustrates the successful deployment of an NLP model to improve clinical operations and builds upon a growing body of literature in this space. Other examples of language models deployed to support care delivery include medical coding, clinical trial recruitment, and chatbots for medical triage and education.[31][32][33] Outside of the scientific literature, third-party software products also employ NLP models to review records for purposes of billing, quality metrics, and clinical decision support.[34][35]
While the focus of our use case is adolescent confidentiality, there are lessons learned that are generalizable to other issues in clinical operations. For example, we observed that our language model, which was trained on featurized sentences (the “sentence-based model”), outperformed the “note-based model,” which was trained using the entire featurized note text. This suggests that the identification of confidential information is a relatively localized task, as in, it generally requires only looking at specific parts of a note.
Furthermore, the use of similar methods may be applied to other pressing issues in health information confidentiality, including protecting health information in the context of maternal–infant health and reproductive health, the latter being particularly pertinent in the wake of Dobbs v. Jackson.[19][36][37]
Limitations and Future Work
There are important limitations to this work. First, while the implementation of our algorithm promises efficiency gains in documentation optimization efforts, it is not yet accurate enough to operate autonomously and requires human input in its current form, which limits its scalability. Additionally, the model is limited by its sensitivity, which was 59% and 53% in our testing and prospective validation experiments, respectively (at a threshold of ŷ = 0.5). As a result, it cannot yet be used to identify all notes with confidential content. Furthermore, because the algorithm was trained on data from a single site, it may be learning institutional patterns (e.g., relying on specific note templates or author writing styles) that are not generalizable to other settings. As such, the model may require retraining or fine-tuning at external sites. Similarly, because the training data were annotated using the California minor consent laws, the algorithm's generalizability to states with differing rules is limited.
Future work will focus on developing a data pipeline that will allow for the ongoing monitoring of clinical notes. We envision this may be achieved through the creation of a dashboard that visualizes risk scores from the model. Additionally, there is work ongoing to establish an operational workflow that will allow for the continued refinement of the model to combat data drift and improve its accuracy, including a process to identify and review misclassifications.
Conclusion
This study illustrates the development and successful implementation of an NLP algorithm to identify confidential content in adolescent progress notes. Our human-in-the-loop deployment into clinical operations demonstrates significant efficiency gains in the manual task of clinical note review. The proposed system shows promise as an operational solution to support health information sharing with adolescents in a way that maintains patient confidentiality.
Clinical Relevance Statement
This manuscript describes the development and implementation of an NLP algorithm that has been directly deployed into health care operations to support information sharing and adolescent patient confidentiality. Our study includes elements of health information exchange, NLP, application/implementation of machine learning, and health IT regulation/policy with direct clinical relevance.
Multiple-Choice Questions
- In California, adolescents may consent to care relating to which of the following domains?
Correct Answer: The correct answer is option d. Minor consent laws allow adolescents to consent to certain medical services without parent or guardian involvement. These laws vary state-by-state. In California, patients aged 12 and older may generally consent to care around reproductive/sexual health, mental health, and substance use without parental or guardian involvement.
- Which of the following is an effective way of using machine learning to estimate whether a clinical note contains confidential content?
  a. Training a logistic regression model to directly estimate whether a note contains confidential content
  b. Training a logistic regression model to estimate whether a sentence contains confidential content and then taking the maximum probability for each sentence in the note
  c. Training a logistic regression model to estimate whether a sentence contains confidential content and then feeding the probabilities into another logistic regression model
  d. Training a deep neural network to estimate whether a note contains confidential content
Correct Answer: The correct answer is option c. As we have limited data, logistic regression models are generally the best option. And in that setting, sentence-level models tend to outperform note-level models. Finally, using a second logistic regression model to aggregate the probabilities from the sentence model yields better estimates of the note-level probability.
- In what ways could information sharing through a health portal result in a breach of adolescent confidentiality?
  a. Parent/guardian may receive an explanation of benefits from insurance regarding confidential medical care
  b. Confidential information may be inadvertently released to a proxy health portal account
  c. Parents or guardians may overhear a confidential part of the patient visit from outside the room
  d. Confidential information may be accidentally relayed to a parent/guardian by a phone encounter with clinic staff
Correct Answer: The correct answer is option b. Information sharing through the patient portal as mandated by the 21st Century Cures Act may cause unintentional breaches of adolescent confidentiality. As minors, adolescent patients cannot consent to all types of medical care. As a result, health portals for this population typically have two types of accounts: an account for the adolescent patient and a proxy account for the parent/guardian. Confidentiality may be breached if information is accidentally released to both the patient and proxy accounts. While choices c and d also explain potential ways that adolescent confidentiality may be breached, they are not related to information sharing through the electronic health portal.