Keywords internship and residency - requirements analysis and design - evaluation of impact - qualitative - quantitative - dashboard
Background and Significance
Each year residency programs across the country graduate a new class of board-eligible physicians. Nationally, these programs are accredited by the Accreditation Council for Graduate Medical Education (ACGME). The ACGME mandates that residency and fellowship programs must develop Clinical Competency Committees (CCCs),[1] whose role is to demonstrate "that graduates will provide high quality, safe care to patients while in training and be well prepared to do so once in practice."[2]
CCCs use methods such as programmatic assessment, a systematic approach in which robust amounts of data are collected from multiple sources in multiple ways.[3] [4] [5] CCCs consider multiple learners during a given meeting, causing time pressure for efficient data review and group processing. Information sharing is, therefore, crucial to defensible group decision-making.[6] [7] [8] [9] [10] Group decision-making in this context is fraught with challenges, however, especially effective and efficient sharing of information.[11] Dashboards are used for information sharing,[12] [13] but optimal design approaches are unclear.
While literature has described best practices for reviewing assessment data and for better integration of data into group decision-making,[11] [12] actual practice across institutions remains highly variable. One suggestion is to use interactive data dashboards. Dashboards combine several types of scientific data visualization[14] to help interpret large datasets. Dashboards have also been used to facilitate communication between program leadership and residents.[15] [16] Recently, medical education dashboards have integrated real-time, web-based systems with data visualizations to provide timely feedback to trainees and increase transparency of competency assessment.[17] [18] [19] [20] Few studies, however, describe how CCC dashboards should be designed in a learner-centered manner that meets the needs of end users.[21] Boscardin et al provided 12 practical tips to implement educational dashboards,[13] one of which suggests a team-based approach to codesign a dashboard with multiple stakeholders, including the learners. CCC dashboards should be designed using rigorous methods and user input to ensure maximum value.
Objectives
Our objective was to conduct a user-centered evaluation of an existing dashboard using a multimethod approach and generate design recommendations (DRs) that can be used to guide programs in the development of a competency assessment dashboard.
Methods
Current Clinical Competency Committee Dashboard
The CCC at the University of Cincinnati Internal Medicine (IM) residency program uses an Excel-based dashboard (Excel 2016, Microsoft Corporation, Redmond, WA) developed by the IM program director. The dashboard incorporates data from the program's workplace-based assessment system, utilizing frontline assessment elements called observable practice actions/activities (OPAs).[22] [23] OPAs are specific skills that residents perform in daily practice; they are rated by faculty members, resident peers, medical students, nurses, and allied health professionals using a five-point entrustment–supervision scale.[24] These OPA-based entrustment ratings are mapped to the ACGME IM subcompetencies in the electronic residency management system (MedHub Incorporated, Ann Arbor, MI) and inform each resident's subcompetency ratings.[25] A typical resident receives an annualized average of 83 assessment encounters, producing approximately 4,000 subcompetency assessments. The dashboard also includes a learning analytics overlay, an "expected entrustment score," generated from historical programmatic entrustment data using a generalized linear mixed model with random effects.[26]
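To make the idea of the expected-score overlay concrete, the sketch below approximates it with a linear mixed model in Python. This is an illustrative simplification, not the published model: the cited approach is a generalized linear mixed model with random effects, and the column names used here (resident_id, month_in_training, entrustment) are hypothetical.

```python
# Illustrative sketch only: approximates the "expected entrustment score" idea
# with a linear mixed model; the program's actual generalized linear mixed
# model and its covariates are not reproduced here.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of historical entrustment ratings.
ratings = pd.read_csv("historical_entrustment.csv")

# Random intercept per resident, fixed effect of time in training.
model = smf.mixedlm(
    "entrustment ~ month_in_training",
    data=ratings,
    groups=ratings["resident_id"],
)
fit = model.fit()

# "Expected" score for each month of training, based on the population-level
# (fixed) effects; this is the curve a dashboard could overlay on actual scores.
expected = fit.predict(pd.DataFrame({"month_in_training": range(1, 37)}))
print(expected[:5])
```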
A mock-up of the current dashboard is displayed in [Fig. 1 ]. [Table 1 ] describes the various components that we considered for this study, along with their labels from [Fig. 1 ]. In addition to structured feedback, assessors also provide narrative comments describing each resident's strengths and opportunities for improvement. Narrative data are downloaded from MedHub into an Excel document that is separate from the entrustment dashboard and separately reviewed by the CCC members. Of note, the current dashboard does not display numerous other data points, such as practice exam scores or patient care data, that are collected about residents and must be accessed separately.
Fig. 1 (a) Mock-up of the front page of the dashboard with the following components: (A) Spider Graphs of the averaged faculty and peers/allied health subcompetency scores from the last 6 months, (B) Line Graphs demonstrating the trend over time of the data in (A), (C) Control Chart to assess special cause variation, (D) Main Heatmap of subcompetencies including counts and z-scores, (E) PivotTable Selectors. (b) Mock-up of the second page of the dashboard, with the following components: (F) Comments by Month table, (G) Ratings Count table, (H) Subcompetency Review Chart, (I) OPAs Past 6 Months, (J) OPAs rated >25 times during residency, (K) CCC Meeting Worksheet, (L) PivotTable Selectors. CCC, clinical competency committee; OPA, observable practice action/activity.
Table 1
Description of relevant dashboard components used in the ranking experiment
| Component | Description |
| --- | --- |
| Line Graph (A) | Presents actual versus expected average entrustment scores by month, along with a trendline |
| Spider Graph (B) | Shows average actual versus expected entrustment ratings by subcompetency over the past 6 months |
| Control Chart (C) | Designed to identify hidden variation in scores; alerts users when data for a given month are out of the ordinary |
| Main Heatmap (D) | Displays the average rating for each subcompetency, color coded by z-score |
| Comments by Month (F) | Worksheet in which committee members can paste relevant narrative data and leave their own comments about scores by month |
| Ratings Count (G) | Shows the number of ratings at each entrustment score (i.e., the number of 1's, 2's, 3's, etc.) |
| Subcompetency Review (H) | Shows data similar to the Main Heatmap, specifically average scores by subcompetency, but located on the second page |
| OPAs Past 6 Months (I) | Lists the OPAs that have been rated in the past 6 months |
| OPAs Rated <25 Times (J) | Lists OPAs that have been rated fewer than 25 times, meaning that they may not give an accurate depiction of a resident's skill in that area |
Abbreviation: OPA, observable practice action/activity.
Study Setting and Participants
This study was conducted with members of the IM residency program CCC. There were 87 categorical residents enrolled in the program at the time of this study. The IM CCC consisted of 22 members, including the program director, associate program directors, core education faculty, and chief residents who have completed their residency training. Not every member was present at every meeting. The CCC met monthly for 2 hours to discuss half of a single class as well as any specific resident competency issues as they arose. Each resident was reviewed at least semiannually.
All members of the CCC received an email from author B.K. requesting their participation and responded on a voluntary basis. After 11 interviews (50% of the CCC members), the study team felt that thematic saturation had been achieved. The final participants in this study included eight faculty physicians and three chief residents. Participants were offered a gift card to compensate for their time.
Study Design
We employed a user-centered evaluation approach to better understand how CCC members used the current system before our team began an iterative redesign process. As the first round of evaluation in a phased strategy,[27] our study utilized a multimethod approach[28] combining quantitative questions (ranking experiment) with the qualitative interviews. These methods included affinity diagramming,[29] [30] [31] a ranking experiment,[32] and process mapping for workflow analysis.[33] [34] The overall study flow is detailed in [Fig. 2]. The process began with an initial preinterview with a CCC associate program director (author E.W.). The first author, along with two research assistants, used this preinterview about dashboard function with author B.K. to develop a set of 12 "grand tour" style questions.[35] These questions, listed in [Table 2], were designed to allow participants to give an overview of their background, experience, workflow, and overall thoughts and opinions of the dashboard.
Fig. 2 Overview of how the study was conducted.
Table 2
Questions asked during the semistructured interview
| Category | Question | Text |
| --- | --- | --- |
| Participant background | 1 | What is your name and job title? |
| | 2 | How many years have you been with Internal Medicine? What is your other prior experience? |
| | 3 | What is your role on the competency committee? What responsibilities do you have? |
| | 4 | When do you use this dashboard? |
| Workflow and daily usage | 5 | Who collects and enters the data that are used in this dashboard? |
| | 6 | Please walk me through your process when you utilize the dashboard to review a learner, sharing your thoughts aloud. |
| Introduction of ranking session | | |
| Ranking | 7 | Rank the cards in order of value. In other words, order the cards by how valuable you feel each component of the dashboard is to help you review a learner. |
| | 8 | Rank the cards in order of frequency of use. In other words, order the cards by how often you find yourself looking at each individual component, regardless of how important the information it contains is to your process. |
| Individual interpretation and opinion | 9 | How much of the raw data are helpful to see? On one extreme (1), we would only display the raw data from the assessors with no backend analysis. On the other extreme (5), we would display no raw data and only display a recommendation based on backend analysis of the data. |
| | 10 | What issues have you encountered while trying to use the dashboard? |
| | 11 | Once you are finished using the dashboard, how do you save what you have learned for later? Are there other resources that you utilize to help you with your competency assessment decisions? |
| | 12 | Knowing that we are looking to redesign this dashboard to improve both competency assessment and resident education, what would be most important for us to consider about how you utilize this platform? Are there any additional features that you can think of that would be helpful to you? |
Questions (7) and (8) were specifically designed to accompany a ranking experiment. The interviewer asked follow-up questions at each step and allowed the participant to guide the conversation whenever possible. Most interviews were conducted through online video calls with screen sharing. These interviews lasted approximately 45 minutes and were facilitated, recorded, and transcribed verbatim by the interviewers. The interviewers then reviewed each other's transcriptions as a quality check to verify their accuracy and completeness. Participant privacy was protected by de-identifying the transcripts and referring to participants as "P00" through "P10." This study was reviewed and approved by the University of Cincinnati Institutional Review Board (IRB# 2019-1418).
Data Analysis
Using the transcribed interviews, three analyses were conducted. First, the transcripts were examined using the actor–action–artifact model to complete process mapping.[33] [34] This method helped determine the action (what is being done at each step), the actor (who is doing the action), and the artifact (items that enable the actor to complete the action) involved at each step. Individual process maps were used to generate a consolidated workflow diagram.
Second, the transcriptions were utilized to create affinity diagrams.[29] [30] [31] Affinity diagramming, also known as the K-J Method, was originally developed for groups to make sense of large amounts of data and helps organize information into groupings of related concepts.[30] To begin, the research team used a bottom-up approach to determine the categories used for the initial data coding and transcript highlighting. These categories included "dashboard background/context," "potential pain points," "expectations/personal opinions/suggestions" from the users, and "dashboard usage and mindset." After highlighting was completed, the quotes were exported to the online platform Miro (Miro, San Francisco, CA) and placed on "virtual sticky notes." In the critical next step,[30] the team worked together silently to form groupings of quotes based on their similarities. Once categories had been formed, names were assigned based on the theme of the quotes in each group for further analysis.
The third analysis technique combined a quantitative approach with follow-up qualitative interview questions. This technique, a ranking experiment,[32 ] assigned a numerical rank to the subjective value of nine dashboard components (highest rank being “1” and lowest being “9”). In interview questions (7) and (8), the participants were asked to rank components by “value” and by “frequency of use,” then to discuss their thought process. The average scores were calculated for each of the two questions, then combined to produce a final rank for each component. A preliminary analysis of the ranking experiment has been published elsewhere,[36 ] but has been expanded upon greatly in this work.
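As a minimal sketch of this rank aggregation, the snippet below averages value and frequency ranks for each component, computes standard deviations, and derives an overall rank; the rankings shown are invented for illustration and are not the study data.

```python
# Minimal sketch of the rank aggregation described above (invented example data).
import pandas as pd

# Rows = participants, columns = dashboard components, values = rank (1 = best).
value_ranks = pd.DataFrame({
    "Line Graph (A)":    [1, 1, 2, 1],
    "Spider Graph (B)":  [2, 3, 1, 4],
    "Control Chart (C)": [9, 8, 9, 7],
})
frequency_ranks = pd.DataFrame({
    "Line Graph (A)":    [2, 1, 1, 3],
    "Spider Graph (B)":  [3, 4, 2, 5],
    "Control Chart (C)": [8, 9, 7, 9],
})

summary = pd.DataFrame({
    "avg_value": value_ranks.mean(),
    "sd_value": value_ranks.std(),
    "avg_frequency": frequency_ranks.mean(),
    "sd_frequency": frequency_ranks.std(),
})
# Overall score is the mean of the two averages; lower means more important.
summary["overall_score"] = summary[["avg_value", "avg_frequency"]].mean(axis=1)
summary["overall_rank"] = summary["overall_score"].rank().astype(int)
print(summary.sort_values("overall_rank"))
```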
Results
Process Mapping
There are four major actors in the CCC assessment process: residents, frontline assessors, pre-reviewers, and CCC members. There are five phases in which actions can occur: patient care, assessment, data compilation, pre-review, and committee review. A summary of this workflow in the form of a swim lane diagram is presented in [Fig. 3 ].
Fig. 3 Condensed workflow diagram demonstrating the roles of the resident, assessor, pre-reviewer, and CCC members. Importantly, the pre-review process and the CCC meeting discussions do not share the same workflow but use the same visualizations on the dashboard. Pre-reviewers also do not follow a standard method for reviewing residents; they may look at visualizations in any order and highlight whatever commentary they feel is appropriate for the discussion. CCC, clinical competency committee.
First, resident physicians care for patients. Then, the frontline assessors provide entrustment ratings and narrative assessments for each resident. After assessments are completed, the data are processed and compiled into the Excel sheet by the dashboard design team (specific members of the CCC that help to create the dashboard).
At this point, the dashboard is sent to the CCC. Some members of the CCC serve as volunteer “pre-reviewers” and are assigned four to six residents to present to the other CCC members during the monthly meeting.
When pre-reviewers evaluate a resident, they first look at the opening screen of the dashboard to synthesize information found on the line and spider graphs, the subcompetency review chart, and the control chart. Just as critical to their review are the narrative data offered by assessors, which help the CCC members understand why a resident was rated at a certain entrustment level. To access this information, the CCC member must switch to a separate Excel file and filter manually to the correct resident. At this point, the pre-reviewer completes the "resident review sheet" by manually adding narrative comments that give context to the scores on the dashboard and synthesizes the data to create a set of recommendations for CCC review.
At the CCC meeting, the dashboard is utilized again but in a different manner. The pre-reviewer shares the recommendations they have made and navigates the dashboard, which is displayed on a screen for the committee to see. At this point, the narrative data have been manually added to the dashboard by the pre-reviewer and are heavily emphasized in the committee's discussion. The committee does not emphasize the numeric values but rather looks for signal using z-scores that are highlighted in green (above expected scores) or red (below expected scores).[26] The full committee shares its thoughts on the resident's performance and provides feedback to be shared with the resident by their faculty coach (a role outside of the committee filled by faculty members assigned to each resident).
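As a rough illustration of that signalling logic, the sketch below flags a hypothetical table of z-scores; the real dashboard applies this through Excel conditional formatting, and the ±1 threshold and column names here are assumptions for the example.

```python
# Hedged sketch of z-score signalling (hypothetical data and threshold);
# the actual dashboard does this with Excel conditional formatting.
import pandas as pd

z_scores = pd.DataFrame(
    {"PC1": [0.8, -1.4], "MK1": [0.1, 2.3]},
    index=["2024-01", "2024-02"],
)

def signal(z: float) -> str:
    # Green when a resident is above expectation, red when below.
    if z >= 1.0:
        return "green (above expected)"
    if z <= -1.0:
        return "red (below expected)"
    return "no flag"

print(z_scores.applymap(signal))
```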
Affinity Diagramming
Five themes were identified through affinity diagramming that help us further make sense of these workflows. [Table 3 ] contains selected quotations from the interviews with participants that support these findings, but each theme is summarized below.
Table 3
Theme and representative quotes
| Theme | Participant ID | Representative quotes |
| --- | --- | --- |
| Preprocessing of data is time-consuming | P02 | "It's a process that I work with [P04]. So first we download the data as a flat file from MedHub and [P04] does the analysis and I take the data and plug it and update the dashboard every month and then I send the dashboard out" |
| | P02 | "[we run] it through his program which sometimes takes three/four hours" |
| | P07 | "So having this Excel sheet that you have to download, it's not updated. By the time we actually get the data, we're probably at least a month behind; and then by the time I meet with my resident, there's another month of data that hasn't been uploaded." |
| | P00 | "As a pre-reviewer and as a coach, you must have both of these files open and toggle them back and forth and all that kinds of stuff and so it's really hard to connect the quantitative to the qualitative without doing all this toggling." |
| Integration of narrative and other data | P00 | "The narrative comes in a separate Excel file, and I would filter to whatever resident I was going to look at, and then the problem is that you try to match up which data point on the graph goes with which comments" |
| | P03 | "It bothers me that you know ITE scores[a] and MKSAP[b] and Long Block are nowhere to be found in the dashboard at all." |
| | P01 | "I don't think right now we have a good way of viewing narrative data…" |
| There is no agreed upon interpretation for some dashboard components | P03 | "In my opinion, [the control chart] has only been useful in probably two instances in my life" |
| | P07 | "And when I have seen it or seen significant standard deviation, I'm usually not able to tell why. So even looking through narrative data, it doesn't really give me a good explanation. So I think it was usually just two different attendings that have rated in two different ways. I couldn't really get anything out of that" |
| | P01 | "The heat map means nothing to me. I don't even have a framework in my head for what 'systems-based practice 2' is. I would have to go back to like the ACGME internal medicine [subcompetency] document to even know what that even stood for." |
| Data are not saved for future competency review | P06 | "The [ideal] dashboard would be more of sort of a longitudinal tool, where we could be able to quickly assess those graphs. Not going back in or not just looking at this dashboard for this particular month, but for this particular learner over the past three years, a certain way to collate the information where I can be able to quickly access that data within one master document or program if that makes sense." |
| | P10 | "I mean I think that being able to see all of the critical deficiencies in the last six months, being able to toggle through and see what performance evaluations and where residents were in their schedule are two big things and then being able to quickly access the comment, the qualitative comments and match them to the scores that people are getting are most important and then again being able to store that information. As you're kind of creating a live document and going through this process, such that you're not reinventing the wheel all the time, you don't have to like rename the Excel sheet with the residents name and all that business." |
| | P08 | "It would be really nice if it was just an interactive dashboard that we could go in at any time rather than having to rely on Excel spreadsheet data." |
| CCC meeting workflow is different from pre-reviewer workflow | P00 | "We have sharing issues with the committee too. Committee members get the dashboard ahead of time but anything that I fill out here, they can't see until I show up at committee and pull it up on the screen because it's saved on my OneDrive or whatever" |
| | P03 | "We have realized that there is some danger in having one person to review everything in depth, and then start sharing it to the group…there is a potential for bias there." |
| | P07 | "Unless it's a resident that's obviously struggling and I'm spending a lot more time looking through every single narrative, and going through month by month, even if it's a lot older data. But it usually takes me to do the preview around eight to ten minutes." |
Abbreviations: ACGME, Accreditation Council for Graduate Medical Education; CCC, Clinical Competency Committee.
a ITE, in training exam; an annual, standardized examination for learners.[37 ]
b MKSAP, American College of Physicians Medical Knowledge Self-Assessment Program.[38 ]
Preprocessing of Data is Time-Consuming
There are significant delays in the delivery of real-time data for CCC review. This is because the current dashboard is recompiled each month by using data from multiple sources. The Excel sheet is cumbersome to create and dissemination to reviewers can delay assessment and review by several days. Additionally, more data accrue between the time the spreadsheet is created and the date that the CCC meets. These data are often not reviewed at the meeting even if they could be helpful. This leads to the CCC using data that may not be the most up to date.
Integration of Narrative and Other Data
Narrative data play a critical role in understanding how residents are performing, yet these data are very difficult to combine with the existing dashboard. Currently, the narrative comments must be accessed separately from the dashboard and are manually copied into the “resident review sheet.” Reviewers struggle to navigate through the sheets to get a complete picture of the data. Other data sources like the annual “in training exam”,[37 ] the American College of Physicians Medical Knowledge Self-Assessment Program,[38 ] and the Long Block data (a unique 1-year ambulatory training program for the UC IM Residency Program)[39 ] are not included in this dashboard, but were useful to participants in their review tasks.
Users Lack Understanding of Certain Dashboard Components
When asked to describe each visualization, several participants were not certain how components like the Control Chart (C) were meant to work. For example, P03 only found the Control Chart (C) useful "in probably two instances." P07, however, uses the Control Chart (C) to "immediately jump to the narrative data" based on the information they gain from the chart. Because P07 understood how to properly interpret the chart, they did not discount it immediately.
Data are Not Saved for Future Competency Reviews
Even though data are added to the dashboard each month, the reviews for each resident from prior CCC meetings are not saved in the next month's file for later review. To access these previous reviews, CCC members must remember the last time the resident was reviewed and search through computer folders to find the appropriate file. This makes it difficult to use the tool to follow residents' longitudinal development.
Clinical Competency Committee Meeting Workflow is Different from Pre-reviewer Workflow
Pre-reviewers spend much more time reviewing resident data, including the narrative data described above, than is spent presenting those data at CCC meetings. The current system does not accommodate these differing workflows.
Ranking Experiment
The ranking experiment allowed the team to assess the dashboard components by their overall perceived value and frequency of use. [Table 4 ] displays the results of the ranking experiment. The components can be organized into the following five groups based on the similarity of their average scores.
Table 4
Results of the ranking experiment, ranks, and standard deviation of the scores (11 participants)
| Group[a] | Dashboard components[b] | Average perceived value | Standard deviation | Average frequency of use | Standard deviation | Average overall score | Overall rank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | (A) Line Graph | 1.31 | 0.46 | 1.88 | 1.36 | 1.59 | 1 |
| 2 | (F) Comments by Month | 3.00 | 1.27 | 2.69 | 2.19 | 2.84 | 2 |
| | (B) Spider Graph | 2.63 | 2.46 | 3.31 | 2.07 | 2.97 | 3 |
| 3 | (H) Subcompetency Review | 5.88 | 2.01 | 4.88 | 1.91 | 5.38 | 4 |
| | (G) Ratings Count | 5.38 | 1.79 | 6.00 | 1.65 | 5.69 | 5 |
| | (I) Lowest OPA's Past 6 Months | 5.63 | 1.16 | 5.81 | 2.07 | 5.72 | 6 |
| 4 | (D) Main Heatmap | 6.44 | 0.93 | 6.44 | 1.71 | 6.44 | 7 |
| | (J) OPA's > 25 | 6.88 | 1.53 | 6.19 | 1.81 | 6.54 | 8 |
| 5 | (C) Control Chart | 7.81 | 1.98 | 7.81 | 1.98 | 7.81 | 9 |
Abbreviation: OPA, observable practice action/activity.
a The group number was assigned based on similarity of overall average scores to compare similarly ranked components.
b Dashboard components are labeled “(A) to (J)” based on their order in the interface (as seen in [Fig. 1 ]).
Group 1 (the highest ranked component) contained the Line Graph (A). This visualization was valued for its quick “at-a-glance” style information.
Group 2 contained the Comments by Month (F) and Spider Graph (B) components. These two ranked closely and were appreciated for being essential for narrative data collection and CCC decision-making, respectively.
Group 3 contained the widest variety of components, including the Subcompetency Review (H), Ratings Count (G), and Lowest OPAs Past 6 Months (I). These provided valuable information but were not applicable to every review.
Group 4 contained the Main Heatmap (D) and OPAs > 25 (J). These components were less favored by the committee members despite their potential utility.
Group 5 (the lowest ranked group) contained a single component, the Control Chart (C). This component was generally overlooked by users and was ranked low in both categories.
Calculating a standard deviation for the ranks of each component also helps place our quantitative results in context with our qualitative interviews. The highest deviation in ranks came from Comments by Month (F, SD = 2.19) for frequency of use and from the Spider Graph (B, SD = 2.46) for value. This echoes the finding from affinity diagramming that there is not one agreed upon interpretation of these dashboard components. The lowest deviation came from the Line Graph (A, SD = 0.46) for value, suggesting that users consistently considered the line graphs the most valuable part of the dashboard.
Discussion
Key Findings
This user-centered evaluation of the University of Cincinnati IM residency program's CCC dashboard employed a multimethod approach. The process maps revealed that individual users of this dashboard have highly varied approaches to their use of the system, implying that a good dashboard must cater to multiple user types with differing needs. Affinity diagramming yielded several qualitative themes that describe areas for improvement identified by current users of the system. These themes can be used both to improve our own system and to recommend design principles for others looking to create their own dashboards. Finally, the ranking experiment showed how users assign varying levels of importance to different dashboard components. For example, components like the Line Graph (A) and Comments by Month (F) should be prioritized due to their universal utility, while others, like the Control Chart (C), may require reevaluation or redesign to better align with user needs. These findings emphasize the importance of user-centered design in health care informatics and offer an opportunity for others to learn from our evaluation efforts.
Design Recommendations
Using these findings, this research team is conducting an iterative redesign process, making the following four DRs to improve upon the current CCC dashboard system. These recommendations can be used to guide other programs that may be looking to evaluate or design their own CCC dashboards.
Design Recommendation 1: Integrating Both Quantitative and Qualitative Data
CCC members require both quantitative scores and qualitative data to make defensible decisions.[40] [41] [42] [43] [44] According to the interviews, there is no integration between narrative data and dashboard information. Despite this, narrative comments are integral to understanding the scores provided by assessors. They also provide direct feedback to the CCC and, when combined with the quantitative scores, can help residents set specific goals for future improvement. An ideal dashboard should integrate both types of data.
Design Recommendation 2: Developing an Informatics Platform for Better Management of Data
Currently, the dashboard is formed by combining raw data from multiple sources into a document that is created each month and shared with the CCC for review. In addition to workplace-based entrustment data[22] [23] enhanced through the use of learning analytics[26] and narrative data,[23] multiple sources outside of the Excel-based dashboard are used to assess residents. These sources include an internal testing program,[45] multisource feedback from a year-long longitudinal ambulatory Long Block,[39] resident self-assessment,[46] and outpatient clinical care measures.[47] The current process leads to difficulty retrieving data for future reviews and to the time-consuming preprocessing workflow described above.
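As one illustration of what a consolidated store for these sources could look like, the sketch below defines a minimal relational schema in SQLite; all table and column names are hypothetical and do not describe the program's actual platform.

```python
# Illustrative sketch of DR2: one relational store for entrustment ratings,
# narrative comments, and exam scores. Schema names are hypothetical.
import sqlite3

conn = sqlite3.connect("ccc_dashboard.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS resident (
    resident_id TEXT PRIMARY KEY,
    class_year  INTEGER
);
CREATE TABLE IF NOT EXISTS entrustment_rating (
    rating_id     INTEGER PRIMARY KEY,
    resident_id   TEXT REFERENCES resident(resident_id),
    subcompetency TEXT,
    score         INTEGER,   -- five-point entrustment-supervision scale
    rated_on      DATE
);
CREATE TABLE IF NOT EXISTS narrative_comment (
    comment_id   INTEGER PRIMARY KEY,
    resident_id  TEXT REFERENCES resident(resident_id),
    comment_text TEXT,
    entered_on   DATE
);
CREATE TABLE IF NOT EXISTS exam_score (
    score_id    INTEGER PRIMARY KEY,
    resident_id TEXT REFERENCES resident(resident_id),
    exam_name   TEXT,        -- e.g., ITE, MKSAP
    percentile  REAL,
    taken_on    DATE
);
""")
conn.commit()
```

Centralizing the sources in this way would let a dashboard query a single store rather than re-merging spreadsheets each month.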
Design Recommendation 3: Improving the Interpretability and Actionability of Visualization
At the beginning of this project, it was clear that the assessment data used by the IM team were high quality and rooted in the best practices described in the literature. As the dashboard grew from a small set of graphs to a full assessment system, each user developed their own workflow for using the dashboard to evaluate resident progress. Alongside the redesign efforts, additions that may help users interpret and act on the visualizations include informational tooltips that describe the content of each figure and provide instructions for use.
Design Recommendation 4: Creating Multiple Views to Support Workflow with High Usability
A comprehensive dashboard would include multiple views of the data, depending upon which user logs in to the system. Workflow can vary greatly depending upon the goal of the user (e.g., a CCC pre-reviewer has a different workflow than a coach sharing reviews with a resident). Laying out the page differently based on the user's goal may help create a more holistic picture of each resident's competencies and make the system more usable. Dashboard visualizations ought to be customized to serve these disparate workflows.
Learner Evaluation
As described briefly above, the IM team uses the current dashboard as a tool to help learners understand their own progress. Although they receive basic training on the information contained within the system, most residents do not spend a significant amount of time reviewing the dashboard. A focused, resident-specific learning plan could substantially change the way residents interact with their own data to learn. Additionally, descriptive tooltips or pop-up windows that explain how the data are collected, what the data measure, and how each visualization should be interpreted will be a key strategy to support usability.
Limitations
The study has multiple limitations. First, two of the eleven participants did not take part in the card-sorting portion of the interview. One took a job at another institution and was unable to participate in a follow-up card-sorting interview. The other performs critical dashboard development tasks but does not spend time utilizing the dashboard for reviews; their input was therefore valuable for developing the initial qualitative themes but was not necessary for accurate card-sorting results. Second, the needs of the University of Cincinnati IM CCC may differ from those of CCCs in other fields and at other institutions, which may limit the generalizability of our specific workflow to other residency programs. Our DRs, however, are broad and could be widely applicable to any program looking to replicate or attempt similar work. Third, end users were defined as the members of the CCC who are tasked with providing summative assessments for residents. Ideally, the list of end users would also include learners, but this group was not considered in this initial evaluation. In the future, we could look to create a generalizable process and/or system with a common data model that can be utilized by multiple residency programs. Fourth, discomfort with and lack of understanding about how to use certain charts or calculations, such as the Control Chart (C), may have influenced participants' rankings. Training on how to use these tools and interpret the data would help ensure consistency in the evaluation of learner data.
Future Directions
Our team is currently completing a full redesign and implementation of the dashboard using the findings of this study. We will look to create a flexible and automatic system allowing for multiple users, different views, and drill-down analysis. We will also conduct thorough usability testing to validate our product.
To address each DR, we will take the following steps in the redesign. First, the new system will fully integrate the narrative data by allowing drill-down analysis of the narrative comments for each resident and subcompetency. Second, data from all sources will be stored in a relational database. As much as possible, this process will be automated using batched data downloads and real-time queries to MedHub, which stores the program's current IM education data. From there, the Flask library[48] in Python with a Representational State Transfer (REST) design will be used to create a web application enabling access control and increased extensibility and connectivity. This will require strong informatics and development support to build and maintain but will allow for more flexible use of the system and fuller control over the development of future features.
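To make this architecture concrete, the sketch below shows the kind of Flask REST endpoint such a system could expose, with a simple role check standing in for access control. The route, the get_subcompetency_scores helper, and the session structure are illustrative assumptions rather than the actual implementation.

```python
# Illustrative Flask/REST sketch; endpoint names and helpers are hypothetical.
from flask import Flask, abort, jsonify, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"

def get_subcompetency_scores(resident_id: str) -> list[dict]:
    # Placeholder for a query against the relational database described above.
    return [{"subcompetency": "PC1", "month": "2024-01", "avg_entrustment": 3.4}]

@app.get("/api/residents/<resident_id>/subcompetencies")
def subcompetencies(resident_id: str):
    user = session.get("user")
    # Access control: residents may see only their own data; CCC members see all.
    if user is None or (user["role"] == "resident" and user["id"] != resident_id):
        abort(403)
    return jsonify(get_subcompetency_scores(resident_id))

if __name__ == "__main__":
    app.run(debug=True)
```

A role-aware endpoint of this kind is also what would support the user-specific views for residents and CCC members described next.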
Next, a cross-functional development team consisting of user interface/user experience designers, developers, stakeholders from the IM team, and researchers will work together to create and iteratively refine this new tool. Finally, the dashboard will be accessible by both the CCC members and the residents. User-specific views will be created so that residents can safely view their data, but not edit comments or see other resident evaluations, while CCC members can continue their reviews seeing all data.
An additional area of feedback that we received from the CCC members concerned how they utilize the dashboard during coaching sessions with the residents. Because of the data structure and the limitations of the Excel spreadsheet, there is no easy way for a resident to access their own data without having access to the data for the entire program. A future project will take this into account and work specifically to address the needs of coaches as they help residents understand their data using this system. After completing the first iteration of the dashboard, our team will interview current residents so that we can take their opinions into account.
Conclusion
We conducted a multimethod, user-centered evaluation of an existing competency assessment dashboard. We identified both strengths and weaknesses of the current dashboard and plan to use these findings in our efforts to redesign the system. We hope that the process we describe for analyzing and optimizing a dashboard for residency education will be helpful to others developing or refining similar tools. From our findings, we recommend that programs integrate both qualitative and quantitative data into their dashboards, develop a strong informatics pipeline to manage those data, create informative and useful visualizations to aid in the review process, and create multiple views of the data to support different workflows.
Clinical Relevance Statement
CCCs aim to assess resident progress and readiness to practice unsupervised. This user-centered evaluation study is clinically relevant because it helps to maintain and update a dashboard that is used to provide specific feedback about important core competencies for physicians-in-training.
Multiple Choice Questions
According to the process mapping and affinity diagramming, which of the following is a weakness of the current dashboard identified in the study?
(A) All users work with the same view of the dashboard.
(B) The assessment data used by the IM team were not of high quality.
(C) CCC members do not use quantitative scores and qualitative narrative data.
(D) The required OPAs are not indicated well in the dashboard.
The correct choice is (A). Both quantitative scores and qualitative narrative data are high quality when displayed in the dashboard. However, they were poorly integrated when considering how each member of the CCC reviews and ranks their residents. As mentioned in the manuscript, narrative data and quantitative scores were essential to rank a resident, and the required OPAs are clearly indicated on the dashboard. Option (A) remains the correct answer choice, as the manuscript showed that each CCC member uses the dashboard differently, depending on whether they are talking with a resident, preparing a review for the team, or reviewing the resident as part of the team.
In this multimethod approach, which of the following would be considered a quantitative approach taken in the study to understand the user behavior of the dashboard?
(A) Affinity diagramming
(B) Process mapping
(C) Ranking experiment
(D) Interviews
The correct answer choice is (C). Although most of the feedback received from dashboard users was qualitative, a ranking experiment was chosen as the third user-centered evaluation approach to allow a quantitative measure of the subjective value placed on dashboard components by each participant. Process mapping offered insight into the workflow of the review committee (presented in [Fig. 3]), and affinity diagramming allowed the identification of themes to make sense of the workflow. While these methods offered narrative data on the usefulness of dashboard components, the ranking experiment provided a quantitative measure that, combined with the qualitative interview results, allowed the study to create a more holistic picture.