Keywords
Ophthalmic Knowledge Assessment Program - institution keyword reports - residency
program - didactics - education - curriculum
The Ophthalmic Knowledge Assessment Program (OKAP) is an annual examination completed
by ophthalmology residents across North America and many other parts of the world.[1][2][3] Each participant receives a performance report several weeks after the examination.[4] Overall performance is classified by the cognitive domain and subspecialty section
of each question.[5] The cognitive domains include three categories: Recall, Interpretive, and Decision-Making/Clinical Management. Subspecialty sections correspond to the 13 volumes of the Basic Clinical Science
Course (BCSC), the comprehensive curriculum from the American Academy of Ophthalmology
(AAO) from which questions for the OKAP are derived.[6] Scaled scores and percentile ranks are reported.[5] Scaled scores indicate how many standard deviations above or below average a resident
performs compared with all test-takers that year, regardless of training level. Percentile
ranks indicate the percentage of other examinees at the same training level who score
below the resident.
Each residency Program Director also receives a similar cognitive domain and keyword
report that summarizes the cumulative performance of residents within their program.
The cognitive domain report includes a scaled score for the program as a whole. The
keyword report shows only the raw number of questions answered incorrectly, broken
down by postgraduate year (PGY) level and subspecialty. The OKAP User's Guide encourages
residency programs to use this information to identify program-wide gaps in knowledge.[5] However, performance across trainee levels within a program varies with years of clinical experience, and performance across test years fluctuates with test difficulty. The keyword report also provides no intuitive way to assess relative performance between subspecialty sections for a given residency program, which can make it difficult to interpret. In this study, we propose a method of analyzing the institutional keyword report to identify relative strengths and weaknesses in trainee exam performance between subspecialty sections and to guide future curriculum development.
Methods
In this study, we retrospectively reviewed Boston Medical Center's keyword reports
from 2017 to 2019. We did not include reports from earlier years because the structure
of the OKAP exam, including the number and naming of subspecialty sections, changed
between the 2016 and 2017 test years. We also focused our review on this time period
because it included the most recent test years without significant changes in our
didactic curriculum. All 12 residents across the three PGY training levels completed
the OKAP in 2018 and 2019, while 11 completed the test in 2017. This quality improvement
project did not involve the use of patient information and did not require approval
from our Institutional Review Board.
Normalized Performance
To analyze the OKAP institutional keyword report, we sought to normalize the raw scores
for each PGY level for each test year. For each PGY level and subspecialty section
of the keyword report, we tallied the incorrect responses ([Fig. 1], red box) and calculated the percentage of correct answers. We then normalized the
percentage of correctly answered questions as:

P = C / C̄

where P is the normalized performance such that P = 1 represents average performance across all subspecialties, P > 1 represents above average performance, and P < 1 represents below average performance; C is the percent of correctly answered questions by a PGY level for a given subspecialty; and C̄ is the mean percent correct across all subspecialties for that PGY level.

Fig. 1 Annotated excerpt from the Ophthalmic Knowledge Assessment Program institutional keyword report, Fundamentals and Principles of Ophthalmology section, illustrating the components involved in the analysis: the number of incorrectly answered questions per postgraduate year (red box), the cognitive domain category (blue box), and the keyword topics (yellow box).

We repeated
this calculation for each PGY level and calendar year of analysis. We also calculated
the breakdown of cognitive domains ([Fig. 1], blue box) and individual keywords ([Fig. 1], yellow box) for incorrectly answered questions as these metrics can be used to
guide specific interventions if any outlying subspecialty sections are identified.
We also combined the normalized performance scores for all PGY levels in all testing
years to identify program-wide trends in subspecialty performance over the study period.
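As a concrete illustration, the normalization can be computed in a few lines of R (the language we used for the statistical analysis below). The sketch assumes the keyword report has been transcribed into a hypothetical data frame kw with one row per test year, PGY level, and subspecialty section; the data frame and its column names are illustrative conveniences, not part of the OKAP report itself.

```r
# Minimal sketch of the normalization, assuming a hypothetical data frame
# 'kw' with columns: year, pgy, section, n_questions, n_incorrect.
library(dplyr)

kw <- kw %>%
  mutate(pct_correct = (n_questions - n_incorrect) / n_questions) %>%
  group_by(year, pgy) %>%                          # normalize within each PGY level and test year
  mutate(P = pct_correct / mean(pct_correct)) %>%  # P = C / C-bar
  ungroup()
```

Because the grouping is by test year and PGY level, the resulting P values can be pooled across training levels and years, as described above.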
Statistics
Statistical analysis was performed using RStudio, version 1.2.1335 (RStudio, Inc., Boston). We performed a one-way analysis of variance to assess for statistically significant differences in subspecialty performance. Post-hoc analysis was performed using the
Tukey–Kramer method. We report 95% confidence intervals (CI). Statistical significance
was defined as p < 0.05.
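Under the same hypothetical kw data frame, the analysis might look like the following sketch; this is an illustration of the approach described above, not our exact script. aov() fits the one-way analysis of variance, and TukeyHSD() performs the post-hoc comparisons (it accommodates unequal group sizes, i.e., the Tukey–Kramer method) with 95% confidence intervals.

```r
# One-way ANOVA of normalized performance across subspecialty sections,
# continuing from the hypothetical 'kw' data frame above.
fit <- aov(P ~ section, data = kw)
summary(fit)  # overall F-test; statistical significance at p < 0.05

# Post-hoc pairwise comparisons with 95% confidence intervals.
TukeyHSD(fit, conf.level = 0.95)
```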
Results
Our institution's normalized performance in each subspecialty section for the 2017
to 2019 study period is shown in [Fig. 2]. There was a statistically significant difference in normalized performance among the subspecialties (p = 0.038). We found above average performance in the Uveitis and Ocular Inflammation
section (95% CI: 1.02–1.18) that was statistically significant (p = 0.031). Though performance in the remaining sections did not differ significantly
from the mean, our analysis allowed us to visualize above average, average, and below
average performance across the other subspecialties. Sections with above average performance
included Neuro-Ophthalmology (95% CI: 0.99–1.32) and Fundamentals of Ophthalmology
(95% CI: 0.99–1.14). Sections with average performance included Refractive Surgery
(95% CI: 0.94–1.22), Glaucoma (95% CI: 0.93–1.07), Retina and Vitreous (95% CI: 0.90–1.10),
and Oculofacial Plastic and Orbital Surgery (95% CI: 0.76–1.09). Sections with below
average performance included Pediatric Ophthalmology (95% CI: 0.79–1.05), General
Medicine (95% CI: 0.79–1.04), External Disease and Cornea (95% CI: 0.88–1.10), Ophthalmic
Pathology and Intraocular Tumors (95% CI: 0.88–1.00), and Lens and Cataract (95% CI:
0.72–1.02). The Clinical Optics section (95% CI: 0.76–1.34) was found to have both
the lowest median performance and the largest range in performance.
Fig. 2 Box plot comparing relative performance between different subspecialties in our residency
program during the 2017, 2018, and 2019 exam years (*: p < 0.05). Black dots represent outliers (more than 1.5x the interquartile range above the upper quartile or below the lower quartile).
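A box plot in the style of [Fig. 2] can be drawn directly from the normalized scores. The sketch below uses ggplot2, whose default outlier rule (points beyond 1.5x the interquartile range) matches the convention in the figure; the choice of plotting library is an assumption for illustration.

```r
# Sketch of a Fig. 2-style box plot from the hypothetical 'kw' data frame.
library(ggplot2)

ggplot(kw, aes(x = section, y = P)) +
  geom_boxplot() +                                   # outliers beyond 1.5x IQR drawn as points
  geom_hline(yintercept = 1, linetype = "dashed") +  # P = 1: average performance
  labs(x = "Subspecialty section", y = "Normalized performance (P)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```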
The cognitive domain distribution for incorrectly answered questions in each subsection
is shown in [Fig. 3]. The section with the greatest percentage of incorrect answers in the Recall domain was Fundamentals of Ophthalmology (70.5%). The Ophthalmic Pathology and Intraocular Tumors section had the highest rate of incorrect answers in the Interpretive domain (52.9%), and the Lens and Cataract section had the highest rate of incorrect answers in the Decision-Making/Clinical Management domain (34.0%).
Fig. 3 Bar plots showing the cognitive domain of incorrectly answered questions in each
subspecialty section. Percent of incorrect responses is calculated for each subspecialty
as the number of incorrect responses in a given domain divided by the total number of
incorrect responses. Cognitive domains include Recall (I), Interpretive (II), and Decision-Making/Clinical Management (III).
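The per-domain percentages underlying [Fig. 3] follow directly from the definition in the caption. A minimal sketch, assuming a hypothetical data frame wrong with one row per incorrectly answered question and columns section and domain:

```r
# Percent of incorrect responses by cognitive domain within each subspecialty.
library(dplyr)

domain_pct <- wrong %>%
  count(section, domain) %>%                    # incorrect responses per section and domain
  group_by(section) %>%
  mutate(pct_incorrect = 100 * n / sum(n)) %>%  # share of that section's incorrect responses
  ungroup()
```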
Discussion
Institutional keyword reports contain valuable information on OKAP exam performance
of trainees within a residency program. Understanding performance patterns can allow
programs to design data-driven curriculum changes to address relative weaknesses in
specific subspecialty knowledge. Similarly, an appreciation of why certain subspecialties
consistently rank well within a program may reveal educational practices worth exploring
and applying to other subspecialties. While our specific calculations for relative
performance are not generalizable to other institutions, the technique may be universally
applied to provide residency programs with institution-specific insight.
The primary benefit of this information is that it allows residency programs to design
educational initiatives to meet medical knowledge-based ophthalmology milestones.[7] For example, the relative quantity and distribution of subspecialty didactics through
the academic year could be adjusted based on an annual assessment of the keyword report.
Using our institution's reports, we were able to identify below average performance
in the Clinical Optics section ([Fig. 2]). Certain exam sections, Clinical Optics in particular, require the memorization
of formulas that are not otherwise used routinely in a clinical setting. Preparation
efforts for these sections may benefit from additional review sessions closer to the
date of the exam. Similarly, sections with a strong emphasis on the cognitive domain
Recall may benefit from increased didactic sessions through the academic year with greater
focus on the BCSC curriculum from which test questions are derived. In contrast, weaknesses in the Interpretive and Decision-Making/Clinical Management domains may benefit most from increased educational initiatives in a clinical setting. Potential
interventions include adjusting resident rotation schedules to optimize subspecialty
service exposure to address any relative weaknesses identified by this analysis. The
specific keywords ([Fig. 1], yellow box) provide an excellent starting point for the subjects that could be covered during such an intervention.
There are many advantages to analyzing OKAP performance using a normalized approach.
First, the method requires only retrospective analysis of the institutional keyword reports that each residency program participating in the OKAP already receives annually. Second, normalization
across PGY level and test year allows programs to compare performance of all residents
within an institution without the bias of years of clinical training or variability
in test difficulty from year to year. Third, this approach allows for further subgroup
analysis into specific test years or PGY levels. Access to this information can alert a program to emerging weaknesses and allow for earlier intervention with targeted didactics or clinical rotations.
In addition, analyzing keyword reports before and after an educational intervention
can provide an objective way to quantify the impact of the intervention. Finally,
the anonymity of the report analysis is an important benefit not to be overlooked.
Not only can this method be performed without compromising the confidentiality of individual test scores, but the normalized performance of a residency program can also be compared between institutions without revealing raw program performance. Sharing this information
may be particularly helpful in the design of interinstitutional didactic curricula.
There are also several limitations of this approach and reasons to carefully interpret
the results. First, since the number and naming of subspecialty sections in the OKAP exam changed between the 2016 and 2017 test years, we are not able to combine and collectively analyze keyword reports from 2016 and earlier with reports from 2017 onward. Second, smaller residency
programs may have increased difficulty detecting patterns given greater fluctuation
in individual performance associated with fewer trainees. Outliers can produce wide confidence intervals, making a subspecialty area appear highly variable in performance. Such variability may also be seen in subspecialties with high testing uncertainty
characterized by an increased percentage of guessed answer choices in the multiple-choice
exam. Both high- and low-scoring outliers can affect the interpretation of mean program performance, and thus programs may consider further subgroup analysis and recomputing program averages after excluding certain outliers (see the sketch at the end of this paragraph). Third, many factors besides institutional didactic strength are involved in test-taking performance, including individual test-taking ability and English as a second language. There is also some degree of overlap between the cognitive domains defined in the OKAP User's Guide.[5]
Recall questions measure an examinee's command of facts, concepts, principles, and procedures; Interpretive questions measure the abstraction of facts to identify implications and make inferences and predictions; and Decision-Making/Clinical Management questions measure problem-solving ability in recalling relevant knowledge to make appropriate decisions about diagnosis and treatment. Not all subspecialty sections
have an equal distribution of questions from these three domains, which must also
be taken into consideration when comparing the relative performance in each section.
Finally, normalized performance is institution-specific and does not reflect performance compared with the national average. An absence of differences between the subspecialty sections could correspond to either stellar performance or a need for improvement across all categories and should therefore be interpreted in the context of the cumulative score report.
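As one way to carry out the outlier-trimmed reanalysis suggested above, the sketch below recomputes per-section averages after dropping values beyond 1.5x the interquartile range; the threshold mirrors the outlier convention of [Fig. 2], and the kw data frame is the same illustrative assumption used in the Methods sketch.

```r
# Sketch: recompute section averages after excluding 1.5x-IQR outliers,
# using the hypothetical 'kw' data frame from the Methods sketch.
library(dplyr)

trimmed_means <- kw %>%
  group_by(section) %>%
  filter(P >= quantile(P, 0.25) - 1.5 * IQR(P),
         P <= quantile(P, 0.75) + 1.5 * IQR(P)) %>%
  summarise(mean_P = mean(P), .groups = "drop")
```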
Residency programs can take advantage of the valuable cumulative data of their trainees to set program educational objectives and guide curriculum changes, just as individual participants can use their annual performance reports to guide their future study goals and plans. Performance on the OKAP examination has been associated with
performance on the American Board of Ophthalmology licensing examinations, and OKAP
scores are frequently used as criteria in fellowship applications.[8][9][10][11] We hope this method will serve as a valuable tool for residency program self-evaluation
and data-driven curriculum improvement to maximize resident success and ensure a broad,
well-rounded curriculum.