Keywords: analytics, documentation burden, EHR system, clinical informatics
Background and Significance
Usage analytics is an emerging source of data for measuring burden related to use of the electronic health record (EHR),[1] investigating the impacts of EHR use on patient care,[2] and evaluating the impact of interventions[1][3] or events (e.g., COVID-19)[4] on EHR use. Analytics data from non-EHR care delivery tools such as nurse call systems and communication devices have also been used to measure and predict strain amongst clinicians.[5] EHR burden has traditionally been assessed through subjective measures such as surveys,[6][7][8] including validated instruments for concepts like "burnout" (e.g., the Mini-Z survey[9]). While these approaches allow for characterizing perceived burden and EHR experience,[10] discrepancies between what is estimated by end-users and what is observed in the system have been identified.[8] Moreover, it remains difficult to pinpoint the specific challenges that lead to the burden.[11][12] To identify the root causes of EHR-related burden and evaluate the impact of solutions, usage log data will be a critical component toward enabling data-informed training[13] and optimization of the system.[14][15][16]
Over the past decade, EHR usage logs have been increasingly used to gather metrics for documentation burden concepts[1] such as effort (e.g., after-hours work) and time (e.g., average time spent),[17][18] with a recent scoping review[1] reporting the use of this method of data collection within 80% of studies across a variety of hospital settings (e.g., general surgery,[19] primary care, and internal medicine[20][21][22]).
Previous research has demonstrated the need for validity and reliability within EHR-based metrics and reporting.[23][24] Therefore, before employing these tools to measure efficiency, it is important to understand their reliability and validity. To ensure administrators and physicians are confident in using EHR usage log data, the validation process described in this case report is necessary. Given that each organization has unique workflow and documentation practices in the EHR system, it is important that the metrics accurately provide a complete picture of clinicians' usage patterns. A systematic review of EHR logs[25] used for measuring clinical activity found that only 22% of the 85 included studies validated the accuracy of their findings. Currently, two approaches are used for validation: (1) validating action–activity mappings (by consulting EHR vendors, reaching consensus between researchers, or direct observation) or (2) validating activity patterns or duration (using direct observation or self-reported data). These "time–motion" studies[25] involve real-time observation and comparison of how long a clinician performs a function (e.g., timed with a stopwatch or TimeCat[26]) against the usage log readings. This method has been fairly successful in ophthalmology[27] and primary care[28] settings, among others.[29] However, time–motion studies require significant human resources and time, and can be difficult to scale in organizations with many departments and unique workflows. The presence of an observer may also hinder the comfort of patients and providers within the clinical environment, particularly in a mental health setting. Moreover, given the current pandemic and the rise of telemedicine and distancing requirements for in-person care, it is neither feasible nor appropriate to introduce an observer into the clinical environment. From a privacy perspective, recording physicians' screens also poses a challenge for privacy preservation, as identifying information such as patient names, images, and diagnoses may be unintentionally captured.[30] Thus, a safe, remote, and less invasive approach for validating the usage log data of EHR systems is needed.
Objectives
In this case report, we introduce an approach using clinical test cases that was implemented to validate the usage log data for an integrated EHR system in use at a large academic teaching mental health hospital. While other organizations may have used this approach in their validation, to our knowledge, this is the first study to discuss it within the academic literature. The approach overcomes the limitations of current validation approaches identified above and provides an effective way to test a large number of workflows with limited resources, in a manner that is noninvasive to the clinical environment. We highlight the utility of this approach, including key considerations for applying the methodology and using EHR log data in practice.
Materials and Methods
The approach for validation of usage log data is composed of three phases ([Fig. 1]). The validation was conducted in an academic teaching mental health hospital located in Toronto, Ontario, and approval was obtained from the organizational Quality Project Ethics Review Board.
Fig. 1 Overview of approach for usage log validation.
Phase 1: Creating Test Cases Based on Real-World Clinical Workflows
Given that the ultimate goal of EHR usage data is to accurately capture the usage patterns of clinicians in the EHR, a guiding principle of this stage was to develop use cases (hereafter called "test runs") that mimicked as closely as possible the real-world clinical workflows of physicians at the organization. To create these test runs, we began by consulting the organization's EHR training modules, which are available to all physicians on our intranet and provide detailed, line-by-line physician workflows for carrying out common tasks within the EHR. Through consultation with senior clinical leadership, including the Chief Medical Informatics Officer (T.T., who is also a practicing physician within the organization), physician super users of our EHR system, and our clinical informatics nurse, we identified common inpatient and outpatient physician workflows and reconciled any differences between training documents and real-world EHR usage. The clinical informatics nurse consulted is solely responsible for EHR education and training for all 400 physicians at our organization and is therefore well versed in physicians' use of the EHR. Using an agile, iterative approach, consultation with these individuals helped develop test runs that closely resembled real-world physician workflows (see [Table 1] for example test runs; see the [Supplementary Material] for all test runs [available in the online version]). Since organizations have unique clinical workflows, we wanted to ensure that these test cases were specific to our organization, and therefore did not consult the literature for this step.
Table 1 Example of a test run used for validation of the usage log data
Example inpatient test run: preparing for inpatient discharge
• Search and open MRN
• Add discharge diagnosis (bipolar disorder)
• Initiate a discharge order for patient, including the location of discharge
• Complete discharge summary note for the patient
Example outpatient test run: regular follow-up at clinic
• Search and open MRN
• Order "lorazepam"
• Complete an outpatient progress note
• Search and open another MRN
• Review laboratory results
• Complete discharge documentation for the patient
Abbreviation: MRN, medical record number.
Phase 2: Execution: Conducting the Clinical Workflows in the Test Environment
This phase involved setting up a dedicated mock physician account in a nonproduction version of the EHR system, so that all activity within this account was recorded. The only difference between the test environment used in this study and our organization's production environment is the absence of real patient data. EHR functionality and modules were identical between the two environments, and our vendor confirmed that the methods used to measure usage metrics were identical as well. Mock patients within the test environment were developed by our clinical informatics nurses to mimic real patients, with representative diagnoses, care plans, and other clinical details. The Research Assistant (RA) performed the test runs developed in phase 1, using pauses and variations in documentation time to resemble real-world interruptions and workflows.
For data collection, two complementary techniques were used: (1) a spreadsheet with automatic time-log capture[31] was developed: when each task within a test run was complete, the RA manually indicated completion in the respective cell, which automatically logged the time of completion; (2) a screen recording tool was used to re-measure the execution of the test runs by timing them with a stopwatch. The spreadsheet made it easy to calculate the time spent on each task of a test run (as well as on the entire test run), and the screen recordings allowed for retrospective review and confirmation of time spent.
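For illustration only, the following minimal Python sketch captures the idea behind the automatic time-log spreadsheet: each completed task is stamped with the time of completion, from which per-task and per-run durations can be derived. The class, method, and task names are hypothetical; the study itself used a spreadsheet formula,[31] not code.

```python
from datetime import datetime


class TestRunLogger:
    """Minimal sketch of automatic time-log capture: each completed task is
    stamped with the current time, mirroring how the study's spreadsheet
    logged the moment the RA marked a task as complete."""

    def __init__(self, run_name: str):
        self.run_name = run_name
        self.started_at = datetime.now()
        self.events = []  # list of (task_description, completion_timestamp)

    def mark_complete(self, task: str) -> None:
        # Equivalent to ticking the spreadsheet cell for a finished task.
        self.events.append((task, datetime.now()))

    def task_durations_seconds(self) -> dict:
        # Time per task = gap between consecutive completion timestamps.
        durations = {}
        previous = self.started_at
        for task, stamp in self.events:
            durations[task] = (stamp - previous).total_seconds()
            previous = stamp
        return durations


# Hypothetical usage for the inpatient discharge test run in Table 1
logger = TestRunLogger("Preparing for inpatient discharge")
logger.mark_complete("Search and open MRN")
logger.mark_complete("Add discharge diagnosis")
logger.mark_complete("Complete discharge summary note")
print(logger.task_durations_seconds())
```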
We gathered EHR usage data from the EHR vendor's back-end analytics platform. Because we logged the exact date and time of the interactions in the test account, the EHR vendor was able to extract the most granular level of detail for that time period to allow for comparison. These data were sent to us in a spreadsheet.
Phase 3: Data Analysis: Comparison between Usage Logs and Test Case Observations
To determine whether the time extracted from the EHR back-end analytics platform was comparable to the data we collected through the spreadsheet and screen recording, we converted all time measures to seconds. We compared (1) time spent per patient for the following metrics: total time in EHR, documentation time, order time, chart review time, allergies time, and problem and diagnosis time; and (2) counts for the following metrics: patients seen and notes written.
Utility was defined by the absolute differences and percent differences observed between the two methods. To explore discrepancies between recorded values and those found within the analytics platform, the RA replicated the tasks and consulted with our vendor to identify the root causes.
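As a minimal sketch of this comparison under stated assumptions (the values and the choice of denominator for the percent difference are illustrative, not taken from the study), the computation could look like this:

```python
def to_seconds(minutes: float = 0.0, seconds: float = 0.0) -> float:
    """Convert a recorded duration to seconds so both sources are comparable."""
    return minutes * 60.0 + seconds


def percent_difference(observed: float, logged: float) -> float:
    """Percent difference between the RA-observed time and the platform-logged
    time. Using the observed value as the denominator is an assumption; the
    report does not specify which baseline was used."""
    return abs(logged - observed) / observed * 100.0


# Illustrative values only (not study data) for one time-based metric
observed = to_seconds(minutes=12, seconds=30)  # spreadsheet / screen recording
logged = to_seconds(minutes=10, seconds=45)    # back-end analytics platform

print(f"Absolute difference: {abs(logged - observed):.0f} s")
print(f"Percent difference: {percent_difference(observed, logged):.1f}%")
```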
Results
A total of 10 test runs were conducted by the RA (A.K.) across 3 days, with one system interruption reported in Run 8. Differences in measurements between the two methods of data collection (i.e., RA-recorded values and the usage analytics platform), averaged across the 3 days, are summarized in [Fig. 2] for time-based metrics and in [Fig. 3] for count-based metrics. Results of independent t-tests performed for all eight metrics are highlighted in [Supplementary Table S1]; however, it should be noted that these results are based on one user with 3 days of data. A summary of measurements extracted from the analytics platform compared with recorded values is outlined in [Supplementary Table S2] (available in the online version). The percent difference between measurements recorded by the RA and the usage analytics platform ranged from 9 to 60%. The discrepancies observed in time in EHR and order time in EHR were relatively small (<20%). Of the 3 days of data collection for the documentation time in EHR and chart review time in EHR metrics, one day yielded large percent differences (57–60%) between the time captured by the usage analytics log and our spreadsheet.
Fig. 2 Measurements recorded by usage analytics platform and test case observations for time-based metrics (averaged across 3 days).
Discussion
Validation of metrics is often considered a barrier to the full uptake and use of usage log data to support characterization and mitigation of EHR burden. This is mainly due to the resources and time required to conduct robust time-validation studies, which hinder their practical execution. This work outlines a feasible and resource-efficient approach for validating usage log data for use in practice. Previous validation studies using screen recordings of EHR sessions have demonstrated that metrics such as total time spent within the EHR correlated strongly with observed metrics (r = 0.98, p < 0.001), where each minute of audit-log-based time corresponded to 0.95 minutes of observed time[32] across a variety of provider roles. Other research validating EHR log data using time–motion observations within ophthalmology clinics has demonstrated that 65 to 69% of the total EHR interaction estimates were ≤3 minutes from the observed timing, and 17 to 24% of the estimates were >5 minutes from the observed timing.[33]
Lessons Learned
We learned the following lessons from our technique:
Ensure partnership with the EHR vendor: this approach allowed for a collaborative review of discrepancies by both the organization and the vendor. We identified some discrepancies among our metrics after splitting them by the reported number of patients seen. Upon review with our vendor's developers, we learned that a patient is only counted as seen if the user completes a certain action (e.g., completing a form as opposed to actual documentation). Additionally, when looking at our after-hours metric, the hours were predetermined by our vendor (6 p.m.–6 a.m.), and this time frame might differ from physicians' work hours. This helped us explain why some values were overinflated compared with our own measurements, and also helped us brainstorm other situations (e.g., physicians signing residents' notes) that might impact the time spent.
Iteratively test and maximize data transparency: the resource-efficient nature of this method allows us to adjust the level of depth of our workflows and repeat the runs as necessary (e.g., after EHR upgrades). We began our validation with less complex test runs that repeated the essential tasks in a very controlled setting (results not shown here) and gradually increased to more sophisticated test runs aligned to our metrics. Thus, this method allows for a step-wise, controlled validation approach that can be embedded as part of the implementation lifecycle. The screen recordings helped us explore the reasons for the discrepancies, and they also allow for transparency and replicability of the results.
Interpret results appropriately: our findings provide support for the accuracy of the usage log data with respect to our clinical workflows. Previous studies have reported variations ranging from overestimations of 43% (4.3 vs. 3.0 minutes) to underestimations of 33% (2.4 ± 1.7 vs. 1.6 ± 1.2 minutes).[25] For most of our metrics, the discrepancy between the usage analytics platform and our observations was fairly consistent across the 3 days of test runs, which suggests that the metrics are calculated consistently on a day-to-day basis. For the large differences within our data (e.g., the 124% underestimation of allergies time), it is important to note that these differences can be amplified when small amounts of time are spent carrying out a particular task (i.e., <30 seconds/patient). Moreover, there is slight variation in the back-end calculation of certain metrics where it might appear that the user is "documenting"; in these cases, back-end timers may have counted the task within a different area of the chart because the documentation sat within a larger section (e.g., a workflow page). While it is very difficult to obtain very high accuracy given the nature of log data (e.g., timers stopping when the mouse stops moving),[34] these results provided confidence in the level of accuracy (i.e., range of error) we can expect in practice. When differences between observed and measured time are vast, organizations can still make use of usage analytics tools to evaluate the impact of initiatives; however, in these cases, organizations will need to consider the percent change, rather than the absolute value, of time spent in EHR activities pre- and post-initiative (see the illustrative sketch below).
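To make that last point concrete, here is a small, hedged sketch of how percent change pre- and post-initiative could be computed; the numbers are purely illustrative and are not drawn from our data.

```python
def percent_change(pre: float, post: float) -> float:
    """Relative change in a usage metric from pre- to post-initiative.
    A consistent measurement bias in the log data affects both periods
    similarly, so the relative change is less sensitive to that bias than
    the raw (absolute) difference."""
    return (post - pre) / pre * 100.0


# Purely illustrative documentation-time values, in minutes per patient
pre_initiative = 9.0
post_initiative = 7.2

print(f"Absolute change: {post_initiative - pre_initiative:.1f} min/patient")
print(f"Percent change: {percent_change(pre_initiative, post_initiative):.1f}%")
```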
Limitations
Several limitations should be considered for these results and this methodology. Foremost, while the test runs created for our validation closely resemble physician behavior in the EHR, we did not use real-world observational data from multiple physicians, which would have captured the variety in practice and use of the EHR. The human-factors bias of using a single RA to conduct the tests is an added limitation. We used test runs inclusive of pauses and interruptions to EHR use to resemble real-world interruptions and workflows; however, we recognize that a clinical unit could introduce longer workflow interruptions.
Moreover, we used a test environment (or "instance") of the EHR instead of a production environment. However, since the back-end metrics calculation within the test environment was confirmed (by our vendor) to be identical to how metrics are calculated in the production environment, the impact of the difference in environments is negligible. Finally, the current validation was only conducted at one mental health organization. Although the context was a large academic mental health hospital, we caution against generalizing these results, as workflows likely differ across organizations.
Future Directions
The findings from the use of this methodology identify a few areas for future consideration. Foremost, based on our information gathering,[8] we only validated a small number of usage log metrics that were considered a priority for our organization. While we anticipate that this method should suffice for other metrics, future clinician metrics considered important for reducing EHR burden[3] (e.g., medication reconciliation) should also be validated using an approach similar to the one described in this report. Moreover, it would be useful to explore the application of this approach to other EHR systems at different organizations. Lastly, we continue to see a paucity of data validation studies being published. In an effort to promote transparency and understanding of the utility of usage log data, we encourage other mental health and nonmental health organizations to share their validation results. Once validation of usage log data has been conducted, organizations can use the appropriate metrics to measure EHR-related burden before and after implementing initiatives aimed at reducing burden.
Conclusion
This case report introduces a novel, minimally invasive approach to validating usage log data. By applying it to usage data at a Canadian mental health hospital, we demonstrated the flexibility and utility of the approach compared with conventional time–motion studies. Future studies should aim to explore and optimize this methodology for validation of usage data across various EHR systems and practice settings.
Clinical Relevance Statement
This initiative provides a feasible, low-burden approach to validating EHR usage data to support further optimization and quality-improvement initiatives.
Multiple Choice Questions
When studying EHR burden, what are some of the EHR analytics metrics to consider?
Correct Answer: The correct answer is option d, all of the above. All three metrics listed above are valuable in measuring the amount of burden caused by the EHR and could be helpful in measuring the impact of interventions.
When interpreting EHR analytics data, what is a good metric to use for measuring change due to an intervention?
a. % difference
b. Mean difference
c. Median difference
d. Standard deviation
Correct Answer: The correct answer is option a, % difference. While the mean or median difference is an absolute measure of change, the percent (%) difference takes into account minor discrepancies between actual time spent and time recorded by analytics systems.
Fig. 3 Measurements recorded by usage analytics platform and test case observations for count-based metrics (averaged across 3 days).