Keywords: analytics, documentation burden, EHR system, clinical informatics
Background and Significance
Usage analytics is an emerging source of data for measuring burden related to use of the electronic health record (EHR),[1] investigating the impacts of EHR use on patient care,[2] and evaluating the impact of interventions[1][3] or events (e.g., COVID-19)[4] on EHR use. Analytics data from non-EHR care delivery tools such as nurse call systems and communication devices have also been used to measure and predict strain amongst clinicians.[5] EHR burden has traditionally been assessed through subjective measures such as surveys,[6][7][8] including validated instruments for concepts like "burnout" (e.g., the Mini-Z survey[9]). While these approaches allow for characterizing perceived burden and EHR experience,[10] discrepancies between what is estimated by end-users and what is observed in the system have been identified.[8] Moreover, it remains difficult to pinpoint the specific challenges that lead to the burden.[11][12] To identify the root causes of EHR-related burden and evaluate the impact of solutions, usage log data will be a critical component toward enabling data-informed training[13] and optimization of the system.[14][15][16]
Over the past decade, EHR usage logs have been increasingly used to gather metrics for documentation burden concepts[1] such as effort (e.g., after-hours work) and time (e.g., average time spent),[17][18] with a recent scoping review[1] reporting the use of this method of data collection within 80% of studies across a variety of hospital settings (e.g., general surgery,[19] primary care, and internal medicine[20][21][22]).
Previous research has demonstrated the need for validity and reliability within EHR-based metrics and reporting.[23][24] Therefore, before employing these tools to measure efficiency, it is important to understand their reliability and validity. To ensure administrators and physicians are confident in using EHR usage log data, the validation process described in this case report is necessary. Given that each organization has unique workflow and documentation practices in the EHR system, it is important that the metrics accurately provide a complete picture of clinicians' usage patterns. A systematic review of EHR logs[25] used for measuring clinical activity found that only 22% of the 85 included studies validated the accuracy of their findings. Currently, two approaches are used for validation: (1) validating action–activity mappings (by consulting EHR vendors, reaching consensus between researchers, or direct observation) or (2) validating activity patterns or duration (using direct observation or self-reported data). These "time–motion" studies[25] involve real-time observation and comparison of how long a clinician performs a function (e.g., timed with a stopwatch or TimeCat[26]) against the usage log readings. This method has been fairly successful in ophthalmology[27] and primary care[28] settings, among others.[29] However, time–motion studies require significant human resources and time, and can be difficult to scale in organizations with many departments and unique workflows. The presence of an observer may also hinder the comfort of patients and providers within the clinical environment, particularly in a mental health setting. Moreover, given the current pandemic and the rise of telemedicine and distancing requirements for in-person care, it is neither feasible nor appropriate to introduce an observer into the clinical environment. From a privacy perspective, recording physicians' screens also poses a challenge for privacy preservation, as identifying information such as patient names, images, and diagnoses may be unintentionally captured.[30] Thus, a safe, remote, and less invasive approach for validating the usage log data of EHR systems is needed.
Objectives
In this case report, we introduce an approach using clinical test cases that was implemented to validate the usage log data for an integrated EHR system in use at a large academic teaching mental health hospital. While other organizations may have used this approach in their validation, to our knowledge, this is the first study to discuss it within the academic literature. The approach overcomes the limitations of current validation approaches identified above and provides an effective way to test a large number of workflows with limited resources, in a manner that is noninvasive to the clinical environment. We highlight the utility of this approach, including key considerations for applying the methodology and using EHR log data in practice.
Materials and Methods
The approach for validation of usage log data is composed of three phases ([Fig. 1]). The validation was conducted in an academic teaching mental health hospital located in Toronto, Ontario, and approval was obtained from the organizational Quality Project Ethics Review Board.
Fig. 1 Overview of approach for usage log validation.
Phase 1: Creating Test Cases Based on Real-World Clinical Workflows
Given that the ultimate goal of EHR usage data is to accurately capture the usage patterns of clinicians in the EHR, a guiding principle of this stage was to develop use cases (hereafter called "test runs") that mimicked as closely as possible the real-world clinical workflows of physicians at the organization. To create these test runs, we began by consulting the organization's EHR training modules, which are available to all physicians on our intranet and provide detailed, line-by-line physician workflows for carrying out common tasks within the EHR. Through consultation with senior clinical leadership, including the Chief Medical Informatics Officer (T.T., who is also a practicing physician within the organization), physician super users of our EHR system, and our clinical informatics nurse, we identified common inpatient and outpatient physician workflows and reconciled any differences between training documents and real-world EHR usage. The clinical informatics nurse consulted is solely responsible for EHR education and training for all 400 physicians at our organization and is therefore well versed in physicians' use of the EHR. Using an agile, iterative approach, consultation with these individuals helped develop test runs that closely resembled real-world physician workflows (see [Table 1] for example test runs; see the [Supplementary Material] for all test runs [available in the online version]). Since organizations have unique clinical workflows, we wanted to ensure that these test cases were specific to our organization, and therefore did not consult the literature for this step.
Table 1 Example of a test run used for validation of the usage log data
Example inpatient test run: preparing for inpatient discharge
• Search and open MRN
• Add discharge diagnosis (bipolar disorder)
• Initiate a discharge order for patient, including the location of discharge
• Complete discharge summary note for the patient
Example outpatient test run: regular follow-up at clinic
• Search and open MRN
• Order "lorazepam"
• Complete an outpatient progress note
• Search and open another MRN
• Review laboratory results
• Complete discharge documentation for the patient
Abbreviation: MRN, medical record number.
Phase 2: Execution: Conducting the Clinical Workflows in the Test Environment
This phase involved setting up a dedicated mock physician account in a nonproduction version of the EHR system, so that all activity within this account was recorded. The only difference between the test environment used in this study and our organization's production environment is the absence of real patient data. EHR functionality and modules were identical between the two environments, and our vendor confirmed that the methods used to measure usage metrics were identical as well. Mock patients within the test environment were developed by our clinical informatics nurses to mimic real patients, with representative diagnoses, care plans, and other clinical details. The Research Assistant (RA) performed the test runs developed in phase 1, using pauses and variations in documentation time to resemble real-world interruptions and workflows.
For data collection, two complementary techniques were used: (1) a spreadsheet with automatic time-log capture[31] was developed: when each task within a test run was complete, the RA manually indicated completion in the respective cell, which automatically logged the time of completion; (2) a screen recording tool was used to re-measure the execution of the test runs by timing them with a stopwatch. The spreadsheet made it easy to calculate the time spent on each task of a test run (as well as on the entire test run), and the screen recordings allowed for retrospective review and confirmation of time spent.
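For illustration only, the following minimal Python sketch captures the idea behind the automatic time-log spreadsheet: each completed task is stamped with the time of completion, from which per-task and per-run durations can be derived. The class, method, and task names are hypothetical; the study itself used a spreadsheet formula,[31] not code.

```python
from datetime import datetime


class TestRunLogger:
    """Minimal sketch of automatic time-log capture: each completed task is
    stamped with the current time, mirroring how the study's spreadsheet
    logged the moment the RA marked a task as complete."""

    def __init__(self, run_name: str):
        self.run_name = run_name
        self.started_at = datetime.now()
        self.events = []  # list of (task_description, completion_timestamp)

    def mark_complete(self, task: str) -> None:
        # Equivalent to ticking the spreadsheet cell for a finished task.
        self.events.append((task, datetime.now()))

    def task_durations_seconds(self) -> dict:
        # Time per task = gap between consecutive completion timestamps.
        durations = {}
        previous = self.started_at
        for task, stamp in self.events:
            durations[task] = (stamp - previous).total_seconds()
            previous = stamp
        return durations


# Hypothetical usage for the inpatient discharge test run in Table 1
logger = TestRunLogger("Preparing for inpatient discharge")
logger.mark_complete("Search and open MRN")
logger.mark_complete("Add discharge diagnosis")
logger.mark_complete("Complete discharge summary note")
print(logger.task_durations_seconds())
```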
We gathered EHR usage data from the EHR vendor's back-end analytics platform. Because we logged the exact date and time of the interactions in the test account, the EHR vendor was able to extract the most granular level of detail for that time period to allow for comparison. These data were sent to us in a spreadsheet.
Phase 3: Data Analysis: Comparison between Usage Logs and Test Case Observations
To determine whether the time extracted from the EHR back-end analytics platform was comparable to the data we collected through the spreadsheet and screen recording, we converted all time measures to seconds. We compared (1) time spent per patient for the following metrics: total time in EHR, documentation time, order time, chart review time, allergies time, and problem and diagnosis time; and (2) counts for the following metrics: patients seen and notes written.
Utility was defined by the absolute differences and percent differences observed between the two methods. To explore discrepancies between recorded values and those found within the analytics platform, the RA replicated the tasks and consulted with our vendor to identify the root causes.
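As a minimal sketch of this comparison under stated assumptions (the values and the choice of denominator for the percent difference are illustrative, not taken from the study), the computation could look like this:

```python
def to_seconds(minutes: float = 0.0, seconds: float = 0.0) -> float:
    """Convert a recorded duration to seconds so both sources are comparable."""
    return minutes * 60.0 + seconds


def percent_difference(observed: float, logged: float) -> float:
    """Percent difference between the RA-observed time and the platform-logged
    time. Using the observed value as the denominator is an assumption; the
    report does not specify which baseline was used."""
    return abs(logged - observed) / observed * 100.0


# Illustrative values only (not study data) for one time-based metric
observed = to_seconds(minutes=12, seconds=30)  # spreadsheet / screen recording
logged = to_seconds(minutes=10, seconds=45)    # back-end analytics platform

print(f"Absolute difference: {abs(logged - observed):.0f} s")
print(f"Percent difference: {percent_difference(observed, logged):.1f}%")
```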
Results
A total of 10 test runs were conducted by the RA (A.K.) across 3 days, with one system interruption reported in Run 8. Differences in measurements between the two methods of data collection (i.e., RA-recorded values and the usage analytics platform), averaged across the 3 days, are summarized in [Fig. 2] for time-based metrics and in [Fig. 3] for count-based metrics. Results of independent t-tests performed for all eight metrics are highlighted in [Supplementary Table S1]; however, it should be noted that these results are based on one user with 3 days of data. A summary of measurements extracted from the analytics platform compared with recorded values is outlined in [Supplementary Table S2] (available in the online version). The percent difference between measurements recorded by the RA and the usage analytics platform ranged from 9 to 60%. The discrepancies observed in time in EHR and order time in EHR were relatively small (<20%). Of the 3 days of data collection for the documentation time in EHR and chart review time in EHR metrics, one day yielded large percent differences (57–60%) between the time captured by the usage analytics log and our spreadsheet.
Fig. 2 Measurements recorded by usage analytics platform and test case observations for time-based metrics (averaged across 3 days).
Discussion
Validation of metrics is often considered a barrier to the full uptake and use of usage log data to support characterization and mitigation of EHR burden. This is mainly due to the resources and time required to conduct robust time-validation studies, which hinder their practical execution. This work outlines a feasible and resource-efficient approach for validating usage log data for use in practice. Previous validation studies using screen recordings of EHR sessions have demonstrated that metrics such as total time spent within the EHR correlated strongly with observed metrics (r = 0.98, p < 0.001), where each minute of audit-log-based time corresponded to 0.95 minutes of observed time[32] across a variety of provider roles. Other research validating EHR log data using time–motion observations within ophthalmology clinics has demonstrated that 65 to 69% of the total EHR interaction estimates were ≤3 minutes from the observed timing, and 17 to 24% of the estimates were >5 minutes from the observed timing.[33]
Lessons Learned
We learned the following lessons from our technique:
Ensure partnership with the EHR vendor: this approach allowed for a collaborative review of discrepancies by both the organization and the vendor. We identified some discrepancies among our metrics after splitting them by the reported number of patients seen. Upon review with our vendor's developers, we learned that a patient is only counted as seen if the user completes a certain action (e.g., completing a form as opposed to actual documentation). Additionally, when looking at our after-hours metric, the hours were predetermined by our vendor (6 p.m.–6 a.m.), and this time frame might differ from physicians' work hours. This helped us explain why some values were overinflated compared with our own measurements, and also helped us brainstorm other situations (e.g., physicians signing residents' notes) that might impact the time spent.
Iteratively test and maximize data transparency: the resource-efficient nature of this method allows us to adjust the level of depth of our workflows and repeat the runs as necessary (e.g., after EHR upgrades). We began our validation with less complex test runs that repeated the essential tasks in a very controlled setting (results not shown here) and gradually increased to more sophisticated test runs aligned to our metrics. Thus, this method allows for a step-wise, controlled validation approach that can be embedded as part of the implementation lifecycle. The screen recordings helped us explore the reasons for the discrepancies, and they also allow for transparency and replicability of the results.
Interpret results appropriately: our findings provide support for the accuracy of the usage log data with respect to our clinical workflows. Previous studies have reported variations ranging from overestimations of 43% (4.3 vs. 3.0 minutes) to underestimations of 33% (2.4 ± 1.7 vs. 1.6 ± 1.2 minutes).[25] For most of our metrics, the discrepancy between the usage analytics platform and our observations was fairly consistent across the 3 days of test runs, which suggests that the metrics are calculated consistently on a day-to-day basis. For the large differences within our data (e.g., the 124% underestimation of allergies time), it is important to note that these differences can be amplified when small amounts of time are spent carrying out a particular task (i.e., <30 seconds/patient). Moreover, there is slight variation in the back-end calculation of certain metrics where it might appear that the user is "documenting"; in these cases, back-end timers may have counted the task within a different area of the chart because the documentation sat within a larger section (e.g., a workflow page). While it is very difficult to obtain very high accuracy given the nature of log data (e.g., timers stopping when the mouse stops moving),[34] these results provided confidence in the level of accuracy (i.e., range of error) we can expect in practice. When differences between observed and measured time are vast, organizations can still make use of usage analytics tools to evaluate the impact of initiatives; however, in these cases, organizations will need to consider the percent change, rather than the absolute value, of time spent in EHR activities pre- and post-initiative (see the illustrative sketch below).
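To make that last point concrete, here is a small, hedged sketch of how percent change pre- and post-initiative could be computed; the numbers are purely illustrative and are not drawn from our data.

```python
def percent_change(pre: float, post: float) -> float:
    """Relative change in a usage metric from pre- to post-initiative.
    A consistent measurement bias in the log data affects both periods
    similarly, so the relative change is less sensitive to that bias than
    the raw (absolute) difference."""
    return (post - pre) / pre * 100.0


# Purely illustrative documentation-time values, in minutes per patient
pre_initiative = 9.0
post_initiative = 7.2

print(f"Absolute change: {post_initiative - pre_initiative:.1f} min/patient")
print(f"Percent change: {percent_change(pre_initiative, post_initiative):.1f}%")
```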
Limitations
Several limitations should be considered for these results and this methodology. Foremost, while the test runs created for our validation closely resemble physician behavior in the EHR, we did not use real-world observational data from multiple physicians, which would have captured the variety in practice and use of the EHR. The human-factors bias of using a single RA to conduct the tests is an added limitation. We used test runs inclusive of pauses and interruptions to EHR use to resemble real-world interruptions and workflows; however, we recognize that a clinical unit could introduce longer workflow interruptions.
Moreover, we used a test environment (or "instance") of the EHR instead of a production environment. However, since the back-end metrics calculation within the test environment was confirmed (by our vendor) to be identical to how metrics are calculated in the production environment, the impact of the difference in environments is negligible. Finally, the current validation was only conducted at one mental health organization. Although the context was a large academic mental health hospital, we caution against generalizing these results, as workflows likely differ across organizations.
Future Directions
The findings from the use of this methodology identify a few areas for future consideration. Foremost, based on our information gathering,[8] we only validated a small number of usage log metrics that were considered a priority for our organization. While we anticipate that this method should suffice for other metrics, future clinician metrics considered important for reducing EHR burden[3] (e.g., medication reconciliation) should also be validated using an approach similar to the one described in this report. Moreover, it would be useful to explore the application of this approach to other EHR systems at different organizations. Lastly, we continue to see a paucity of data validation studies being published. In an effort to promote transparency and understanding of the utility of usage log data, we encourage other mental health and nonmental health organizations to share their validation results. Once validation of usage log data has been conducted, organizations can use the appropriate metrics to measure EHR-related burden before and after implementing initiatives aimed at reducing burden.
Conclusion
This case report introduces a novel, minimally invasive approach to validating usage log data. By applying it to usage data at a Canadian mental health hospital, we demonstrated the flexibility and utility of the approach compared with conventional time–motion studies. Future studies should aim to explore and optimize this methodology for validation of usage data across various EHR systems and practice settings.
Clinical Relevance Statement
This initiative provides a feasible, low-burden approach to validating EHR usage data to support further optimization and quality-improvement initiatives.
Multiple Choice Questions
When studying EHR burden, what are some of the EHR analytics metrics to consider?
Correct Answer: The correct answer is option d, all of the above. All three metrics listed above are valuable in measuring the amount of burden caused by the EHR and could be helpful in measuring the impact of interventions.
When interpreting EHR analytics data, what is a good metric to use for measuring change due to an intervention?
a. % difference
b. Mean difference
c. Median difference
d. Standard deviation
Correct Answer: The correct answer is option a, % difference. While the mean or median difference is an absolute measure of change, the percent (%) difference takes into account minor discrepancies between actual time spent and time recorded by analytics systems.
Fig. 3 Measurements recorded by usage analytics platform and test case observations for count-based metrics (averaged across 3 days).