Keywords
Undergraduate Medical Education - clinical clerkship - documentation
Background and Significance
Redundancy in clinical notes, whether due to auto-importing of preexisting data or
to “copy-and-paste” practices, has been recognized as a pitfall of electronic medical
records (EMRs).[1] [2] The use of copy-and-paste is highly prevalent among physicians[3] and many notes contain redundant text.[4] [5]
During clinical rotations, medical students write notes frequently. Surveys suggest
the use of auto-importing and copy-and-paste by medical students is essentially ubiquitous
and the vast majority of students have observed supervising residents and attending
physicians engaging in such practices.[6] This behavior perpetuates redundant documentation and reduces the potential for
student notes to serve as educational tools, which has been a recognized goal of medical
records for at least half a century.[7] Although not studied specifically in students, copy-and-paste practices have also
been associated with diagnostic errors and suboptimal treatment outcomes.[8] [9]
Formal instruction on note writing during medical school may assist in breaking this
cycle. However, prior to embarking on such a significant curricular initiative, quantifying
the extent of redundancy in medical student notes and any impact on medical school
performance is needed to provide justification and to identify potential targets for
intervention. Therefore, using an established indicator of redundancy, we analyzed
a cohort of notes generated by students rotating through the inpatient portion of
their medicine clerkship during a single academic year at our institution.
Objectives
Based on prior work that focused on resident documentation,[10] we sought to assess whether student notes become more redundant over the course
of a given admission. Redundancy in notes, particularly in the assessment/plan section,
may indicate that students place relatively little emphasis on using documentation to solidify
their understanding of the clinical course of the patients they are following, which could
be reflected in measures of student performance. As such, we aimed to evaluate the
hypothesis that students with more redundant notes would have lower performance than
those with less redundant notes.
Methods
The study was approved by the Vanderbilt Medical Center Institutional Review Board.
Cohort of Student–Patient Interactions
On the medicine clerkship, students document their encounters with patients in the
form of an admission history and physical (H + P) and daily progress notes written
via free text into the formal EMR. These notes are automatically copied directly from
the EMR into a database maintained by the Department of Bioinformatics for instructional
purposes.[11] From this database, all inpatient notes written during the 2012 to 2013 academic
year were examined; this was the most recent year for which a complete set of notes
was available.
A student–patient interaction (SPI) was defined as a series of notes written by the
same student on a given patient during a single admission and consisted of: (1) an
admission H + P; and (2) at least two consecutive daily progress notes contiguous
with the H + P (e.g., an H + P written on hospital day #1 with progress notes written
on days #2 and #3). Therefore, an SPI contained at least three notes. SPIs involving
interservice transfers or intensive care unit patients were excluded because such patients
already had a considerable number of notes before the student H + P was written.
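The SPI inclusion rule can be expressed as a simple check over one student's notes for one admission. The following is a minimal sketch, not the study's code: it assumes a hypothetical `Note` record carrying only a note type and a hospital day, and a hypothetical `transfer_or_icu` flag standing in for the exclusion criteria.

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical record: note_type is "HP" (admission H + P) or "PN" (daily progress note).
    note_type: str
    hospital_day: int


def is_eligible_spi(notes: list[Note], transfer_or_icu: bool = False) -> bool:
    """Check whether one student's notes on one admission form an eligible SPI.

    Assumed encoding of the rule in the text: exactly one H + P, written on
    some hospital day d, plus progress notes on days d + 1 and d + 2.
    Interservice transfers and ICU patients are excluded outright.
    """
    if transfer_or_icu:
        return False
    hp_days = [n.hospital_day for n in notes if n.note_type == "HP"]
    if len(hp_days) != 1:
        return False
    pn_days = {n.hospital_day for n in notes if n.note_type == "PN"}
    d = hp_days[0]
    # The first two progress notes must immediately follow the H + P.
    return {d + 1, d + 2} <= pn_days
```

For example, an H + P on day 1 with progress notes on days 2 and 3 qualifies, whereas an H + P on day 1 with progress notes on days 2 and 4 does not.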
If a student had more than one eligible SPI, one was selected at random for the main
analysis. If a student had two eligible SPIs with one occurring during the first one-third
of the clerkship and the second occurring during the last one-third, these SPI pairs formed
an additional cohort to assess changes in redundancy during the clerkship. For approximately
three-quarters of these pairs, one SPI occurred while the student was rotating on
a general medicine team and the other occurred during a subspecialty rotation. The
remainder of the pairs consisted of two subspecialty SPIs. The proportion of clerkship
completed at the time of an SPI was calculated as the interval between the day the
clerkship began and the day the H + P was written divided by the duration of the clerkship
(12 weeks). A rotation number (1–4) was designated based on when in the academic year
each student was assigned to the medicine rotation (i.e., 1 is the earliest 12-week
block and 4 is the latest).
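The timing calculation and the selection of SPIs described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, "early" is taken as a proportion below 1/3 and "late" as 2/3 or above, and when several SPIs fall in the same window one is chosen at random (the text does not say how such cases were handled).

```python
import random
from datetime import date

CLERKSHIP_DAYS = 12 * 7  # the medicine clerkship lasts 12 weeks


def proportion_completed(clerkship_start: date, hp_date: date) -> float:
    """Fraction of the clerkship elapsed on the day the SPI's H + P was written."""
    return (hp_date - clerkship_start).days / CLERKSHIP_DAYS


def select_spis(proportions: list[float], seed: int = 0):
    """Choose one SPI for the main analysis and, if possible, an early/late pair.

    proportions[i] is proportion_completed(...) for the student's i-th eligible SPI.
    Returns (main_index, pair), where pair is (early_index, late_index) or None.
    Hypothetical conventions: "early" means < 1/3, "late" means >= 2/3, and if
    several SPIs fall in one window, one of them is picked at random.
    """
    rng = random.Random(seed)
    main_index = rng.randrange(len(proportions))
    early = [i for i, p in enumerate(proportions) if p < 1 / 3]
    late = [i for i, p in enumerate(proportions) if p >= 2 / 3]
    pair = (rng.choice(early), rng.choice(late)) if early and late else None
    return main_index, pair
```

An H + P written 21 days into the clerkship gives 21 / 84 = 0.25 of the clerkship completed, which falls in the early window.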
Note Components and Comparisons
Comparisons of note characteristics were performed between discrete subsections, rather
than between entire notes as reported previously.[10] For the H + P, medical history/problem list, medications, allergies, family history,
and social history were included. The “initial progress note (IPN)” was defined as
the first progress note following the H + P (i.e., the note for hospital day #2).
The “next progress note (NPN)” was defined as the note following the IPN (i.e., the
note for hospital day #3). The key progress note sections were physical examination
and assessment/plan. For each SPI, a patient summary was also used in the analysis.
Note templates used by medical students auto-import data from the patient summary,
which contains components of the history that can be updated by anyone involved in
the patient's care. When a change is made to the patient summary, the new version
is saved. The patient summary most proximal to the hospitalization of interest was
used and individual subsections from this patient summary were compared to the analogous
subsections of the H + P. The specific note section pair comparisons are displayed
in [Table 1].
Table 1
Note types, sections, and comparisons
| Current document | Prior document | Sections for comparison |
|---|---|---|
| History and physical exam | Patient summary | Medical history/Problem list |
| | | Medications |
| | | Allergies |
| | | Family history |
| | | Social history |
| Initial progress note | History and physical exam | Physical examination |
| | | Assessment/Plan |
| Next progress note | Initial progress note | Physical examination |
| | | Assessment/Plan |
If an SPI contained more than two progress notes, then the nth and (n – 1)th notes were compared in the same fashion as the NPN versus the IPN. Vital signs
were omitted from the analysis as these are inherently variable from day to day and
thus may systematically bias assessments of redundancy.
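The comparisons in Table 1, together with the rule for additional progress notes, amount to a fixed list of (current document, prior document, section) triples per SPI. The sketch below enumerates them; the document labels ("summary", "HP", "PN1", "PN2", ...) and the `has_summary` switch are hypothetical conveniences, not identifiers from the study.

```python
# Section pairs from Table 1; vital signs are deliberately excluded.
HP_VS_SUMMARY_SECTIONS = [
    "medical history/problem list", "medications", "allergies",
    "family history", "social history",
]
PROGRESS_NOTE_SECTIONS = ["physical examination", "assessment/plan"]


def comparison_pairs(n_progress_notes: int, has_summary: bool = True) -> list:
    """Enumerate (current document, prior document, section) comparisons for one SPI.

    Hypothetical labels: "summary" is the patient summary, "HP" the admission
    H + P, and "PN1", "PN2", ... the progress notes (PN1 = IPN, PN2 = NPN).
    """
    pairs = []
    if has_summary:  # some admissions had no preexisting patient summary
        pairs += [("HP", "summary", s) for s in HP_VS_SUMMARY_SECTIONS]
    docs = ["HP"] + [f"PN{i}" for i in range(1, n_progress_notes + 1)]
    # Each progress note is compared with the document immediately before it:
    # IPN vs H + P, NPN vs IPN, and in general the nth vs the (n - 1)th note.
    for prior, current in zip(docs, docs[1:]):
        pairs += [(current, prior, s) for s in PROGRESS_NOTE_SECTIONS]
    return pairs
```

An SPI with two progress notes therefore yields nine comparisons: five against the patient summary and two for each progress note.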
Identification and Selection of Note Sections
For a given note within each SPI, the sections of interest were identified by manual
review. Five reviewers, either 4th year medical students or senior-level nurse practitioner
students, were each assigned randomly to an equal portion of the SPIs. These reviewers
worked independently of one another and did not collaborate. Each note was loaded into
a modified version of the software platform PYBOSSA, an open-source tool used to manage
crowdsourcing efforts, within which custom software was used to mark the beginning
and end of a given note section.[12] The associated text was tagged and saved with the appropriate label, which incorporated
the note section, document type, and a unique identifier for the SPI. To facilitate
evaluation of interobserver reliability, each reviewer's task list was constructed
such that there was a degree of overlap with another reviewer.
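One way to build task lists with an equal random split plus a shared subset for reliability checking is sketched below. The size of the overlap and the choice of which reviewer shares work with which are not reported in the text, so `n_overlap` and the "next reviewer" pairing are hypothetical.

```python
import random


def assign_tasks(spi_ids, n_reviewers=5, n_overlap=10, seed=0):
    """Randomly split SPIs equally among reviewers, with partial overlap.

    Each reviewer also receives the first n_overlap SPIs of the next reviewer's
    primary list, so every SPI in those slices is reviewed twice.
    n_overlap is a hypothetical value; the study reports only "a degree of overlap".
    """
    rng = random.Random(seed)
    ids = list(spi_ids)
    rng.shuffle(ids)
    # Dealing the shuffled IDs round-robin gives a random split whose sizes differ by at most one.
    primary = [ids[r::n_reviewers] for r in range(n_reviewers)]
    tasks = [list(p) for p in primary]
    for r in range(n_reviewers):
        neighbor = primary[(r + 1) % n_reviewers]
        tasks[r].extend(neighbor[:n_overlap])
    return tasks
```

With 100 SPIs, five reviewers, and an overlap of four, each reviewer receives 24 SPIs and 20 SPIs are reviewed twice.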
Redundancy Metrics, Student Characteristics, and Performance Measures
The main redundancy metric was based on the Levenshtein edit distance, a parameter
that tabulates the number of additions, deletions, and substitutions needed to transform
one text string into another.[13] As changes to notes typically include additions or deletions of entire words rather
than individual characters in the same words, we used word-level edit distance for
this analysis. This edit distance can be normalized by the size (number of words)
of the longer string such that it takes on a value between 0 and 1, where a normalized
edit distance of 0 denotes that two strings are identical and a value of 1 indicates
that they are entirely different. For two analogous note sections, we defined redundancy
as 1 − (normalized word-level edit distance), such that this term (multiplied by 100) represents
the percentage of content that the two sections have in common. This measure quantifies
the redundancy of a note section pair from the same SPI (e.g., the physical examination
of the IPN and the physical examination of the H + P).
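The redundancy metric can be computed with the standard dynamic-programming form of the Levenshtein distance applied to words instead of characters. This is a minimal sketch: tokenization by whitespace with no case or punctuation normalization is an assumption, since the text does not describe preprocessing, and two empty sections are treated here as identical.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance counting word insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # substitute (or keep) the word
        prev = curr
    return prev[len(b)]


def redundancy(current: str, prior: str) -> float:
    """1 - word-level edit distance normalized by the longer section (0 to 1)."""
    a, b = current.split(), prior.split()  # assumed: plain whitespace tokenization
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumed convention: two empty sections are identical
    return 1.0 - word_edit_distance(a, b) / longest
```

For example, "the cat sat" and "the cat sat down" differ by one inserted word out of four, so their redundancy is 0.75 (75%).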
Student characteristics were obtained from the Office of Undergraduate Medical Education
(UME) and included date of birth, gender, and undergraduate major. Majors were grouped
into two categories: engineering or physical/biological science, and nonengineering/nonscience.
Objective measures of student performance were obtained from the Office of UME and
the Registrar's Office. These included Alpha Omega Alpha (AOA) status, medicine clerkship
grades (honors, high pass, pass, fail), medicine shelf exam scores, and United States
Medical Licensing Exam (USMLE) Step I and II clinical knowledge (CK) scores. Using
national USMLE data, raw scores were converted into percentiles. All SPIs analyzed
in this study occurred after the students took USMLE Step I but before they took
USMLE Step II CK.
Statistical Analysis
Comparisons of a continuous variable between two groups were made with the Mann–Whitney test,
and comparisons among more than two groups were made with the Kruskal–Wallis
test, as the pertinent covariates were not normally distributed. Comparisons of redundancy
between note section pairs were modeled as paired occurrences, and thus the Wilcoxon
matched-pairs signed rank test was used. Pearson's correlation coefficient was
used to assess the relationship between two continuous variables. These analyses were
completed in GraphPad Prism 7.02 (GraphPad Inc; La Jolla, California, United States);
a p-value < 0.05 was considered statistically significant.
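For readers who want to reproduce the paired comparison outside GraphPad Prism, the Wilcoxon matched-pairs signed rank test can be sketched in standard-library Python. This is a swapped-in illustration, not the software used in the study: it uses the large-sample normal approximation without tie or continuity correction, whereas statistical packages may use exact or corrected p-values, so results can differ slightly for small samples.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon matched-pairs signed rank test, normal approximation.

    Zero differences are discarded; no tie or continuity correction is applied.
    Returns (W+, p), where W+ is the rank sum of positive (after - before) differences.
    """
    d = [b - a for a, b in zip(before, after) if b != a]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
```

Here `before` could hold each SPI's IPN vs. H + P assessment/plan redundancy and `after` the corresponding NPN vs. IPN value.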
Redundancy was used to evaluate interobserver reliability of note section identification.
As the reviewers were given identical training and instructions and worked independently,
all comparisons assessed by two reviewers were treated as if they were completed by
the same two reviewers. If the content of a given pair of note sections was selected
by two reviewers in the same manner, the two resultant redundancy values should be
identical. Therefore, for the subset of note section pairs that was assessed by two
reviewers, the redundancy values from the two reviews were used to calculate the intraclass
correlation coefficient (R Project version 3.5.1; R Foundation; Vienna, Austria).
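The text does not state which form of the intraclass correlation coefficient was computed in R. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss notation), which is consistent with treating every double-reviewed pair as if it had been rated by the same two reviewers. It is an illustration of that assumed form, not a reproduction of the study's R call.

```python
def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    pairs is a list of (first_review, second_review) redundancy values,
    one tuple per note section pair that was assessed twice.
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / 2 for a, b in pairs]
    col_means = [sum(p[c] for p in pairs) / n for c in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between note section pairs
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between reviewer "slots"
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

If both reviews of every pair yield the same redundancy value, the coefficient is 1; disagreement between reviews pulls it below 1.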
Results
Cohort Characteristics
Ninety-four students each contributed one SPI to the main analysis, and 58 of those students also contributed
an early/late pair of SPIs. [Table 2] displays basic demographics, undergraduate major, and medical school performance
metrics. Students were typically in their mid-20s, and slightly more than half were
female. Approximately three-quarters of students majored in engineering or a physical/biological
science. Fewer than 20% of students were members of AOA, and nearly all students earned
honors or high pass in the medicine clerkship. Mean USMLE Step I and II CK scores corresponded
to the 68th and 63rd percentiles, respectively.
Table 2
Student characteristics and medical school performance
| Characteristic | Main cohort (n = 94) |
|---|---|
| Age at initial SPI (y) | 26 ± 2 |
| Female (%) | 53 |
| Undergraduate major (%) | |
| Science/engineering | 76 |
| Nonscience/engineering | 24 |
| AOA society (%) | 17 |
| Medicine clerkship grade (%) | |
| Honors | 33 |
| High pass | 64 |
| Medicine shelf exam score | 81 ± 8 |
| USMLE Step I | |
| Score | 238 ± 17 |
| Percentile | 68 ± 22 |
| USMLE Step II CK | |
| Score | 247 ± 15 |
| Percentile | 63 ± 25 |
Abbreviations: AOA, Alpha Omega Alpha; CK, clinical knowledge; SPI, student–patient
interaction; USMLE, United States Medical Licensing Exam.
Note: Continuous covariates are mean ± standard deviation.
Interobserver Variability
A total of 135 pairs of redundancy values were used to calculate the intraclass correlation
coefficient. These included contributions from all the note section pair combinations
described above. The overall intraclass correlation coefficient was 0.80 (95%
confidence interval, 0.73 to 0.86). This level of agreement is comparable to that in recent
studies of note redundancy.[14] [15]
Redundancy between the Admission H + P and Patient Summary
The redundancy of comparable sections of the admission H + P and patient summary is
shown in [Fig. 1]. Most sections were approximately 50% redundant, with the allergies section exceeding
60% redundancy. Sixteen SPIs did not have a preexisting patient summary as the admission
used for the SPI was the patient's first encounter with our medical center. Of note,
approximately 50% of the allergy section comparisons yielded completely redundant
results. Nearly 60% of family history comparisons and more than 30% of medical history
comparisons had greater than 80% redundancy.
Fig. 1 Redundancy between the admission history and physical and patient summary. Most sections
show redundancy of approximately 50% with the allergy section exceeding 60%. Sixteen
student–patient interactions (SPIs) did not have a patient summary as the admission
used for the SPI was the patient's first encounter with our medical center. Data are
displayed as mean ± standard deviation.
Redundancy over the Course of an SPI
The temporal trends of redundancy during an SPI for the physical examination (panel
A) and assessment/plan (panel B) sections are displayed in [Fig. 2]. For both sections, the redundancy between the IPN and the H + P is approximately
35 to 40%. Redundancy between the NPN and the IPN is significantly higher (p < 0.001) for both sections (nearly 80% for physical exam and nearly 60% for assessment/plan).
In the subset of SPIs (n = 43) that had a subsequent progress note (i.e., from hospital day #4), redundancy
between this note and the prior note was nearly identical to that between the NPN
and the IPN. When stratified by rotation number (1: n = 25; 2: n = 23; 3: n = 26; 4: n = 18), the increase in assessment/plan redundancy within an SPI remained highly significant
during all blocks. In addition, there was no difference in assessment/plan redundancy
based on rotation number for either note pair (i.e., IPN vs. H + P or NPN vs. IPN).
Fig. 2 Redundancy over the course of a student–patient interaction. For both the physical
examination (A) and the assessment/plan (B), redundancy increases significantly between the IPN/H + P and the NPN/IPN comparisons.
For the subset of student–patient interactions with an SPN (n = 43), the redundancy between the SPN and NPN is no different from the redundancy
between the NPN and IPN. Data are displayed as mean ± standard deviation. H + P, history
and physical examination; IPN, initial progress note; NPN, next progress note; SPN,
subsequent progress note.
Analysis of Paired SPIs
The “early” SPI in a given pair occurred after 26 ± 10% of the clerkship had elapsed
and the “late” SPI occurred 74 ± 7% into the rotation. For the early and late SPIs,
[Fig. 3] shows redundancy for the assessment/plan section of the IPN/H + P and the NPN/IPN.
Consistent with the main cohort results, redundancy increases significantly over time
for both the early and late SPIs (p < 0.001). Additionally, at both time points evaluated, redundancy is significantly
higher for the late SPI comparisons relative to the early SPI comparisons (by ∼30–40%;
p < 0.001 for both).
Fig. 3 Redundancy in student–patient interactions over the course of the medicine clerkship.
For the assessment/plan section, redundancy increases not only over the course of
a given student–patient interaction, but over the course of the clerkship as well.
Data are taken from 58 pairs of student–patient interactions that occurred after 26 ± 10%
(early) and 74 ± 7% (late) of the clerkship had elapsed. Data are displayed as mean ± standard
deviation. H + P, history and physical examination; IPN, initial progress note; NPN,
next progress note; SPN, subsequent progress note. *p < 0.001 compared to early IPN vs. H + P. †p < 0.001 compared to early IPN vs. H + P. ‡p < 0.001 compared to late IPN vs. H + P. §p < 0.001 compared to early NPN vs. IPN.
Redundancy and Measures of Medical Student Performance
Redundancy in the assessment/plan section did not differ significantly when stratified
by AOA status, undergraduate major, or clerkship grade. Similarly,
the correlation between redundancy and shelf exam score was not significant.
[Fig. 4] illustrates the relationship between redundancy of the assessment/plan section of
the IPN/H + P comparison and USMLE Step II CK percentile. For the upper two-thirds
of scores, redundancy is essentially uniformly distributed. However, within the lowest
tertile of scores, there is a cluster of higher redundancy (panel A) and redundancy
is significantly higher in the lowest tertile than in the upper two tertiles (67 ± 24%
vs. 38 ± 22%; p = 0.002) (panel B). Similar results were found for the assessment/plan section of
the NPN/IPN comparison. Comparable findings were not observed with USMLE Step I.
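The tertile comparison can be sketched as a split on the score distribution followed by a Mann–Whitney test. The cutoff convention (the score at the one-third position of the sorted scores, with ties at the cutoff assigned to the upper group) and the use of the normal approximation without tie or continuity correction are assumptions for illustration; the study's analysis was run in GraphPad Prism.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def lowest_tertile_split(scores, redundancies):
    """Split redundancy values by whether the student's score is in the lowest tertile."""
    cutoff = sorted(scores)[len(scores) // 3]  # assumed tertile convention
    low = [r for s, r in zip(scores, redundancies) if s < cutoff]
    high = [r for s, r in zip(scores, redundancies) if s >= cutoff]
    return low, high


def mann_whitney_u(group1, group2):
    """Two-sided Mann–Whitney U test (normal approximation). Returns (U1, p)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

When every redundancy value in the lowest tertile exceeds every value in the upper tertiles, U1 reaches its maximum of n1 × n2.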
Fig. 4 Redundancy and USMLE Step II clinical knowledge scores. (A) There is a cluster of high redundancy values within the lowest tertile of USMLE Step II
clinical knowledge scores. The lowest tertile is to the left of the dashed vertical
line. (B) Redundancy within the lowest tertile is significantly higher than within the upper
two tertiles (p = 0.002). Redundancy is for the assessment/plan section of the initial progress note/history
and physical comparison. Data in (B) are displayed as mean ± standard deviation. CK, clinical knowledge; USMLE, United
States Medical Licensing Examination.
Discussion
This study quantifies the redundancy present in student notes over the course of patients'
time in the hospital and at different time points during the medicine clerkship. Exploratory
analyses suggest a potential association between redundancy and medical school performance.
While some redundancy in the H + P/patient summary comparison can be anticipated,[6] [16]
two sections suggest a degree of pervasiveness that may be counterproductive. In
nearly 50% of SPIs, the allergy section was entirely redundant, meaning that no changes
were made to the auto-imported text. Upon further review, most of these sections contained
information other than some variant of “no known drug allergies,” suggesting that
this information may not have been verified by the student, a phenomenon previously
reported in a cohort of dermatology residents.[17] More than 30% of the medical history/problem list sections had at least 80% redundancy,
suggesting that a substantial proportion of medical students are not actively constructing
patients' medical histories first-hand, an important skill to learn when trying to
formulate differential diagnoses for the present illness.
Increasing redundancy in the assessment/plan section as the hospital course progresses
could reflect the truly static nature of some SPIs. However, the comparisons examined
here are relatively early in the hospital stay when patient evaluations and status
can be at their most dynamic. For a given SPI, higher redundancy over time may reflect
the inherently greater amount of material upon which to draw for the next note, similar
to findings from analyses of house staff documentation.[10] Likewise, accumulated observations by students of documentation habits among house
staff and attending physicians may contribute to the increased redundancy seen later
in the clerkship relative to earlier. Increasing redundancy over the course of an
SPI was independent of exposure to prior rotations, suggesting that habits learned
on other services and “note fatigue” are not primarily responsible for this finding.
Students may be learning “efficiency” at the expense of honing clinical reasoning
through serial exposition of their patients' course. At the student level, such efficiency
is less of a priority: students typically follow a relatively small subset of their
teams' patients, and because only certain sections of student notes may be used for billing,
these notes are largely exempt from billing-related requirements on their content.
The findings regarding USMLE scores suggest that note-based metrics could complement
traditional tools for identifying subgroups of students who may be struggling with
CK and its application. A similar relationship was not shown between note redundancy
and USMLE Step I, an exam geared more toward testing basic science principles, suggesting
that note analytics may be more reflective of clinical, rather than fact-based, knowledge.
Likewise, there was no association between note redundancy and AOA status, clerkship
grade, or shelf exam score. Subjective criteria are incorporated into selection for
AOA and these attributes may not be captured by note characteristics. Student evaluations
contain subjective elements, which may confound the relationship between clerkship
grades and note redundancy as well. The discordance between medicine shelf exam and
USMLE Step II CK scores with regard to their association with note redundancy is
more challenging to reconcile, as both are objective continuous measures of CK. A
potential explanation may lie in how each exam is perceived to impact applications
for residency. The score on the shelf exam contributes to the clerkship grade, which,
in turn, influences the strength of an application for residency, particularly if
a student is applying in internal medicine. In our cohort, the USMLE Step II CK exam
was taken in the final year of medical school and the actual score may not have been
a part of, or factored substantively into, the residency application. Most students
took this exam either after residency interviews had concluded or proximal enough
to interviews such that scores were not available during the interview process, plausibly
leading to the perception that only the binary result (pass/fail) was ultimately important.
Thus, if students are more incentivized to prepare for the shelf exam, the results
may be out of proportion to the effort put forth in their notes, particularly for
lower-performing individuals. Similar logic can be applied to USMLE Step I, which
was taken by all students prior to beginning the 3rd year. Less preparation for the
“lower-stakes” USMLE Step II CK exam may permit note characteristics to provide a
more direct window into performance than is otherwise possible.
This study has several limitations. The main analysis was limited to a single SPI
during a single clerkship in a single academic year at a single institution. This
SPI may not be representative of a given student's note-writing tendencies, although
the paired SPI analysis mitigates this shortcoming to some extent. The analysis emphasized
the assessment/plan section for two main reasons: (1) it is the first section of a
note to be read and the section that providers spend the most time reading[18]; and (2) it was thought that this part of the note would be most reflective of idea
synthesis and integration. Students may have displayed these cognitive attributes
elsewhere in their notes, which would not have been reflected in our results. Similarly,
we did not analyze auto-imported lab and test results and whether these were interpreted
and integrated into the assessment/plan; such analysis could also provide insight
into students' thought processes. It is also possible that subtle changes
in the recorded assessment/plan, which would result in high redundancy, reflected
considerable synthetic and integrative thought/discussion. Redundancy
is a very basic metric and does not capture the conceptual complexity of note
content. Other metrics may more accurately capture such higher-level attributes. The
key measure of student performance used in this study, USMLE Step II CK score, is
not a perfect indicator of CK and reasoning, but it is available for all students
and is standardized, thus increasing the applicability to medical students across
the country. Our institution's EMR provides a “reuse” option, wherein the prior day's
note can be cloned to form a template for the current day's note. The usage rate of
this tool in our cohort is unknown, which limits the ability to estimate the degree
to which redundancy may have been enabled by the design of the EMR itself. However,
even if redundancy is entirely a surrogate for the “reuse” of prior notes, the range
of redundancy within the cohort suggests the response by students to the availability
of this feature is not uniform and thus the propensity to use it may provide insight
that complements traditional methods of student evaluation.
Conclusion
This study demonstrates that, within the medicine clerkship, medical student notes
contain a high level of redundancy that increases over the course of a student's interaction
with a given patient and over the course of the clerkship for a given student. Furthermore,
high redundancy may indicate a relative deficiency in CK and/or reasoning, at least
as measured by some standardized testing. To further evaluate and potentially address
these observations, design and implementation of documentation-based initiatives in
UME seems reasonable.
Clinical Relevance Statement
This demonstration of high redundancy in medical student notes and the association
of note redundancy with lower scores on standardized tests of clinical knowledge/reasoning
may motivate modifications to medical school curricula that focus on dedicated instruction
in note writing. If this approach is implemented and found to be beneficial, the clinical
care provided by future physicians may be enhanced by improvements in clinical reasoning
and communication via more thoughtful and less redundant documentation.
Multiple Choice Questions
- When comparing information from the Patient Summary and the History and Physical Examination,
which section shows the greatest degree of redundancy?
  a. Problem list.
  b. Allergies.
  c. Family history.
  d. Social history.
  Correct Answer: The correct answer is option b. The allergies section was > 60% redundant, whereas
  the other sections had approximately 50% redundancy.
- With which indicator of student performance did note redundancy show a statistically
significant relationship?
  Correct Answer: The correct answer is option d. Note redundancy was higher for students who
  scored in the bottom one-third on the USMLE Step II CK exam than for those who
  scored in the upper two-thirds. There were no statistically significant relationships
  between note redundancy and the other indicators of student performance.