Introduction
Small colonic polyp characterization has been identified as a key goal for novel endoscopic
advanced imaging techniques and is an area of much research activity [1]. While expert endoscopists and those with an interest in in vivo diagnostic techniques
can obtain high levels of accuracy, this level of performance has not generally been
seen in studies involving non-expert endoscopists who have no particular experience
or expertise in this area [2] [3] [4] [5] [6].
Hence, the issue of training in advanced diagnostic imaging techniques is of paramount
importance if use of these techniques is to become part of everyday practice in the
endoscopy community as a whole.
Several studies have examined the impact of brief training interventions on small
colonic polyp (SCP) characterization. Significant improvements in the accuracy of
diminutive colonic polyp (DCP) characterization have been demonstrated using advanced
endoscopic imaging techniques (narrow-band imaging (NBI), i-Scan) following still
image or video-based training interventions lasting 20 – 60 minutes [7] [8] [9] [10]. In contrast, a similar study by Coe et al. showed no significant improvement following
two 1-hour training sessions on the use of NBI for SCP characterization [11]. The impact of training on high definition white light (HDWL) polyp characterization
and its relative performance compared to advanced imaging techniques and chromoendoscopy
has not been widely studied. There is some evidence that the accuracy of HDWL characterization
of DCPs can match that obtained using advanced imaging [6] [12] [13].
Aims
This study aimed to examine the effect of a web-based training module on the accuracy
of in vivo characterization of diminutive (< 5 mm) colonic polyps using HDWL, i-Scan,
and chromoendoscopy. Differences between groups with varying degrees of endoscopic
experience were also assessed.
Methods
This work was formally assessed and deemed to be an evaluation of an educational tool
and therefore did not require formal research ethics committee approval.
Images obtained during a previous study were used [12]. Informed consent was obtained from all patients for the use of anonymized images
of polyps for future teaching and training purposes. High quality images of diminutive
polyps viewed with HDWL, i-Scan 3 (Surface enhancement + 3, Contrast enhancement + 2,
Tone enhancement – colon), and 0.2 % indigo carmine chromoendoscopy were selected,
with three images of each polyp used (one with each modality). In total, 30 polyps
were included in this study, 15 adenomatous and 15 hyperplastic, which corresponds
to the approximate proportions of each histological type found in DCPs in situ. Therefore,
in total, 90 images of the 30 polyps were used in the study. Histopathological diagnosis
as determined by a UK Bowel Cancer Screening Programme accredited histopathologist
was used as the reference standard for all optical assessments.
Images were randomized using an online randomization tool (GraphPad) and incorporated
into a web-based testing and training module hosted on the University of Portsmouth
virtual learning platform (Moodle). Participants viewed each image in turn and were
asked to predict polyp histology (adenomatous vs hyperplastic) and to report their
confidence in each assessment using the following levels:
- < 70 % certain = low confidence;
- 70 – 90 % certain = medium confidence;
- > 90 % certain = high confidence.
All questions had to be answered to complete the training module. To avoid the influence
of other participants, all tests and training were undertaken individually.
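The three confidence bands can be written as a small helper, which makes the boundary handling explicit (exactly 70 % and exactly 90 % both fall in the medium band). This is only an illustrative sketch: the function name `confidence_level` is hypothetical, and it assumes a numeric certainty is available, whereas participants in the study appear to have chosen a band directly.

```python
def confidence_level(certainty_pct: float) -> str:
    """Map a self-reported certainty (0-100 %) to the study's three bands."""
    if not 0 <= certainty_pct <= 100:
        raise ValueError("certainty must be between 0 and 100")
    if certainty_pct < 70:
        return "low"       # < 70 % certain
    if certainty_pct <= 90:
        return "medium"    # 70 - 90 % certain, both ends inclusive
    return "high"          # > 90 % certain
```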
Following the initial ‘pre-training’ testing phase, subjects completed a novel web-based
training tool on small colonic polyp characterization. The training module was developed
using Prezi – a cloud-based presentation software program, and was designed to take
20 minutes to complete.
The training module covered several key areas:
- Outlining the two main types of DCPs (hyperplastic and adenomatous);
- Modes of assessing polyps in vivo – HDWL, i-Scan, and indigo carmine chromoendoscopy
  without magnification;
- Key features used for DCP characterization (vascularity, vascular patterns, and surface
  patterns) and which features were suggestive of each histological type:
  - Hyperplastic polyps – paler, or similar in color to the surrounding mucosa; no visible
    surface patterns or large non-compact pits; no visible surface vessels, or a few thin
    thread-like surface vessels;
  - Adenomatous polyps – more vascular than the surrounding mucosa; small compact regular
    pits, tubular or branched pits; dense regular vessels following the edges of pits;
- 20 image examples of all key features;
- An interactive “test” section where subjects were asked to predict histology, followed
  by feedback on key features visible and the histopathological diagnosis.
To avoid subject bias, the training module did not refer to which key features were
visible with each imaging modality, nor whether any modality was likely to lead to
a more accurate prediction of histology.
Screenshot images from the training module are shown in [Figs. 1 – 4].
Fig. 1 Screenshot image from the web-based training module.
Fig. 2 Screenshot image from the web-based training module.
Fig. 3 Screenshot image from the web-based training module.
Fig. 4 Screenshot image from the web-based training module.
Following completion of the training module, subjects underwent a second test module
using the same 90 polyp images presented in a different randomized order.
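The presentation of the same 90 images in two different random orders can be sketched as follows. The study used an online tool (GraphPad) for randomization; Python's `random` module is substituted here purely for illustration, and the image identifiers and seed values are hypothetical.

```python
import random

MODALITIES = ("HDWL", "i-Scan", "chromoendoscopy")

def build_image_ids(n_polyps: int = 30) -> list[str]:
    """One image per polyp per modality: 30 polyps x 3 modalities = 90 images."""
    return [f"polyp{p:02d}_{m}" for p in range(1, n_polyps + 1) for m in MODALITIES]

def randomized_order(image_ids: list[str], seed: int) -> list[str]:
    """Return a shuffled copy; a fixed seed keeps the order reproducible."""
    rng = random.Random(seed)
    order = list(image_ids)
    rng.shuffle(order)
    return order

images = build_image_ids()
pre_order = randomized_order(images, seed=1)   # hypothetical seed, pre-training test
post_order = randomized_order(images, seed=2)  # different seed, post-training test
```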
Subjects were divided into three groups:
- 3rd year medical students (n = 7)
- Gastroenterology Registrars (n = 7)
  - Higher specialist gastroenterology trainees, all currently training in colonoscopy
    and polypectomy.
  - No previous formal training in in vivo diagnostic techniques.
  - Median lifetime colonoscopy cases performed: 120 (range 30 – 450).
- Gastroenterology Consultants (n = 7)
Two expert endoscopists with experience of over 5000 colonoscopies and extensive experience
in in vivo diagnostic techniques, including extensive research experience in this
field, also undertook the test module to validate the image library.
Statistics
The pre-training test was expected to demonstrate a difference in overall accuracy
between the three subject groups. Based on pilot studies performed with similar subject
groups viewing polyp images, the predicted overall accuracy for each group was:
- Medical students 60 %;
- Registrars 70 %;
- Consultants 75 %.
Power calculations showed that seven subjects per group would be required to provide
80 % power to demonstrate these differences with a significance level of 5 %.
Overall accuracy was expected to be similar in all three groups after training [8] [14] [15]. It was anticipated that, following training, there would be a small difference
in accuracy between white light (80 %) and i-Scan/indigo carmine chromoendoscopy (85 %).
Comparisons between groups for mean accuracy, sensitivity, specificity, and negative
predictive value (NPV) were performed by one-way ANOVA. In those comparisons where
a significant difference was detected using ANOVA, post-hoc testing was performed
to detect significant differences between pairs of groups using Hochberg’s GT2 test
(when sample sizes differed) or Tukey’s test. Comparisons of mean accuracy by modality
were also performed using one-way ANOVA and post-hoc Tukey’s test. Differences in
performance pre- and post-training were compared using McNemar’s test. Differences
in confidence levels between groups and modality were compared using Pearson’s
chi-squared test. Odds ratios were calculated for two-level confidence results (low/medium
vs high) if a significant chi-squared statistic was detected.
Mean kappa values within groups pre- and post-training were also calculated and the
accuracy of high confidence predictions was also assessed.
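As an illustration of the paired pre- vs post-training comparison, McNemar’s test depends only on the discordant pairs: images answered incorrectly before training and correctly afterwards, and the reverse. The sketch below uses the exact (binomial) form of the test. The text does not state whether the exact or the chi-squared form was used, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: assessments wrong pre-training but correct post-training
    c: assessments correct pre-training but wrong post-training
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # Binomial(n, 0.5) probability of a result at least as extreme, one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With hypothetical counts of 10 improved and 2 worsened assessments, `mcnemar_exact(10, 2)` gives a p-value of about 0.039.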
Results
Overall accuracy
Mean overall accuracy results in the pre-training test are shown in [Table 1]. No statistically significant differences were found between the three study groups
pre-training (P > 0.5 for all three comparisons). The mean accuracy of the two experts was significantly
higher than that of each of the three study groups (P < 0.01 for all three comparisons).
Table 1
Pre-training and post-training accuracy by subject group.
| Group | Accuracy pre-training: Correct/Total, % (95 % CI) | Accuracy post-training: Correct/Total, % (95 % CI) | P value pre-training vs post-training |
| --- | --- | --- | --- |
| Students | 372/630, 59.1 % (50.6 – 67.5 %) | 436/630, 69.2 % (66.1 – 72.3 %) | < 0.001 |
| Registrars | 414/630, 65.7 % (60.8 – 70.7 %) | 448/630, 71.1 % (67.4 – 74.9 %) | 0.013 |
| Consultants | 393/630, 62.4 % (54.0 – 70.7 %) | 449/630, 71.3 % (64.7 – 77.9 %) | < 0.001 |
| Experts | 156/180, 86.7 % (58.4 – 100 %) | | |
In the post-training test, mean overall accuracy was significantly higher for all
three groups compared to pre-training (P < 0.02 for all three comparisons). Again, there were no statistically significant
differences in accuracy between the three study groups post-training (P > 0.5 for all three comparisons) but mean accuracy remained significantly higher
for the two experts (P < 0.025 for all three comparisons). Pre- and post-training accuracy results for individual
participants are shown in [Fig. 5].
Fig. 5 Individual participant accuracy scores pre- and post-training.
Sensitivity, specificity, and negative predictive value
In analysis of sensitivity, specificity, and negative predictive value (NPV) for adenomatous
histology, no statistically significant differences were found between the three groups
either pre- or post-training ([Tables 2 – 4]).
Table 2
Pre-training and post-training sensitivity by group with comparisons between groups.
| Group | Sensitivity pre-training: Correct/Total, % (95 % CI) | P value for comparison between groups | Sensitivity post-training: Correct/Total, % (95 % CI) | P value for comparison between groups | P value pre-training vs post-training |
| --- | --- | --- | --- | --- | --- |
| Students | 175/315, 55.6 % (36.8 – 74.3 %) | 0.285 | 223/315, 70.8 % (63.1 – 78.5 %) | 0.520 | < 0.001 |
| Registrars | 218/315, 69.2 % (59.8 – 78.7 %) | | 222/315, 70.5 % (65.6 – 75.3 %) | | 0.789 |
| Consultants | 203/315, 64.4 % (50.3 – 78.6 %) | | 234/315, 74.3 % (68.4 – 80.2 %) | | 0.007 |
Table 3
Pre-training and post-training specificity by group with comparisons between groups.
| Group | Specificity pre-training: Correct/Total, % (95 % CI) | P value for comparison between groups | Specificity post-training: Correct/Total, % (95 % CI) | P value for comparison between groups | P value pre-training vs post-training |
| --- | --- | --- | --- | --- | --- |
| Students | 197/315, 62.5 % (49.6 – 75.5 %) | 0.973 | 213/315, 67.6 % (55.9 – 79.4 %) | 0.825 | 0.211 |
| Registrars | 195/315, 61.9 % (49.8 – 74.0 %) | | 226/315, 71.7 % (64.7 – 78.8 %) | | 0.010 |
| Consultants | 190/315, 60.3 % (37.0 – 83.6 %) | | 214/315, 67.9 % (50.6 – 85.2 %) | | 0.045 |
Table 4
Pre-training and post-training negative predictive value (NPV) by group with comparisons
between groups.
| Group | NPV pre-training: Mean % (95 % CI) | P value for comparison between groups | NPV post-training: Mean % (95 % CI) | P value for comparison between groups | P value pre-training vs post-training |
| --- | --- | --- | --- | --- | --- |
| Students | 59.8 % (51.7 – 67.9 %) | 0.220 | 70.2 % (67.4 – 73.1 %) | 0.523 | 0.003 |
| Registrars | 67.2 % (61.4 – 73.0 %) | | 70.9 % (67.4 – 74.4 %) | | 0.294 |
| Consultants | 62.7 % (55.4 – 70.0 %) | | 72.3 % (69.2 – 75.4 %) | | 0.014 |
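The relationship between the counts in Tables 2 and 3 and the summary measures can be sketched as follows, treating adenomatous histology as the positive class. The example pools the students’ post-training counts across all seven participants; the function name `diagnostic_metrics` is hypothetical. The pooled NPV (about 69.8 %) differs slightly from the 70.2 % in Table 4, presumably because the table reports a mean across participants rather than a pooled value.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Adenoma is the 'positive' class; hyperplastic is the 'negative' class."""
    return {
        "sensitivity": tp / (tp + fn),  # adenomas correctly called adenoma
        "specificity": tn / (tn + fp),  # hyperplastic polyps correctly called hyperplastic
        "npv": tn / (tn + fn),          # 'hyperplastic' calls that were correct
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Students, post-training, pooled counts taken from Tables 2 and 3
# (315 adenoma assessments and 315 hyperplastic assessments)
students_post = diagnostic_metrics(tp=223, fn=315 - 223, tn=213, fp=315 - 213)
```

The pooled accuracy from these counts, (223 + 213)/630, matches the 436/630 reported for students post-training in Table 1.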
Interobserver agreement
Mean kappa values for each group pre- and post-training were calculated ([Table 5]). Agreement pre-training in the student group was slight, and fair in the registrar
and consultant groups. There was a significant improvement in mean kappa for all three
groups post-training. Post-training mean kappa was fair for the consultant group and
moderate for the student and registrar groups.
Table 5
Interobserver agreement by group pre- and post-training.
| Group | Mean pre-training kappa (95 % CI) | Mean post-training kappa (95 % CI) | Pre vs post kappa change, P value |
| --- | --- | --- | --- |
| Students | 0.106 (−0.009 – 0.222) | 0.472 (0.417 – 0.528) | < 0.001 |
| Registrars | 0.298 (0.239 – 0.258) | 0.541 (0.502 – 0.581) | < 0.001 |
| Consultants | 0.216 (0.128 – 0.304) | 0.371 (0.321 – 0.422) | 0.004 |
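A minimal sketch of how a mean kappa within a group might be obtained is given below. It assumes that the mean is taken over Cohen’s kappa for every pair of participants in the group, which the text does not state explicitly. The descriptive labels follow the Landis and Koch bands, which are consistent with the terms slight, fair, and moderate used above. All function names are hypothetical.

```python
from itertools import combinations

def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies
    expected = sum((a.count(label) / n) * (b.count(label) / n)
                   for label in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

def mean_pairwise_kappa(ratings: list[list]) -> float:
    """Average Cohen's kappa over every pair of raters (needs at least two)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

def landis_koch(kappa: float) -> str:
    """Descriptive agreement band for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, name in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                        (0.80, "substantial")):
        if kappa <= upper:
            return name
    return "almost perfect"
```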
Performance by modality
Accuracy rates achieved when assessing images with each of the three modalities (HDWL/i-Scan/indigo
carmine chromoendoscopy) were compared pre- and post-training ([Table 6] and [Table 7]). No significant difference in mean accuracy between the three modalities was found
pre-training. Post-training mean accuracy for HDWL and chromoendoscopy images was
significantly higher than for i-Scan images.
Table 6
Mean accuracy pre-training by modality.
| Modality | Accuracy pre-training: Correct/Total, % (95 % CI) | P value |
| --- | --- | --- |
| HDWL | 408/630, 64.8 % (59.1 – 70.4 %) | 0.317 |
| i-Scan | 378/630, 60.0 % (56.1 – 63.9 %) | |
| Chromoendoscopy | 392/630, 62.2 % (58.2 – 66.2 %) | |
HDWL, high definition white light.
Table 7
Mean accuracy post-training by modality.
| Modality | Accuracy post-training: Correct/Total, % (95 % CI) | Comparison vs HDWL P value | Comparison vs i-Scan P value |
| --- | --- | --- | --- |
| HDWL | 459/630, 72.9 % (70.2 – 75.5 %) | | |
| i-Scan | 410/630, 65.1 % (61.3 – 68.8 %) | 0.002 | |
| Chromoendoscopy | 464/630, 73.7 % (70.7 – 76.6 %) | 0.927 | < 0.001 |
HDWL, high definition white light.
Following training, accuracy for HDWL and chromoendoscopy images improved significantly
(P = 0.002 and P < 0.001, respectively) compared to pre-training. However, accuracy with i-Scan images
did not improve significantly post-training (P = 0.074).
Confidence ratings by subject group
Pre-training, there were significant differences between the three groups in the spread
of confidence ratings. Students rated few of their predictions as high confidence
(only 3.7 % overall). Registrars and consultants made more high confidence predictions,
although the majority of their predictions were still rated as low or medium confidence
([Table 8] and [Table 9]).
Table 8
Prediction confidence ratings pre-training.
| Group | Low confidence | Medium confidence | High confidence | P value |
| --- | --- | --- | --- | --- |
| Students | 428 (67.9 %) | 179 (28.4 %) | 23 (3.7 %) | < 0.001 |
| Registrars | 214 (34.0 %) | 221 (35.1 %) | 195 (31.0 %) | |
| Consultants | 187 (29.7 %) | 176 (27.9 %) | 267 (42.4 %) | |
| All subjects | 829 (43.9 %) | 576 (30.5 %) | 485 (25.7 %) | |
| Experts | 19 (10.6 %) | 36 (20.0 %) | 125 (69.4 %) | |
Table 9
Prediction confidence ratings post-training.
| Group | Low confidence | Medium confidence | High confidence | P value |
| --- | --- | --- | --- | --- |
| Students | 191 (30.3 %) | 247 (39.2 %) | 192 (30.5 %) | < 0.001 |
| Registrars | 139 (22.1 %) | 207 (32.9 %) | 284 (45.1 %) | |
| Consultants | 145 (23.0 %) | 176 (27.9 %) | 309 (49.0 %) | |
| All subjects | 475 (25.1 %) | 630 (33.3 %) | 785 (41.5 %) | |
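The between-group comparison of confidence ratings can be illustrated with Pearson’s chi-squared statistic computed from the counts in Tables 8 and 9. The sketch assumes that the reported P values refer to the 3 × 3 table of the three study groups only (experts and the pooled row excluded), and it treats all ratings as independent observations. The helper names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
from math import exp

def chi_squared(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(stat: float, dof: int) -> float:
    """Upper-tail probability of a chi-squared variable; even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = stat / 2
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return exp(-half) * total

# Rows: students, registrars, consultants; columns: low, medium, high
PRE = [[428, 179, 23], [214, 221, 195], [187, 176, 267]]
POST = [[191, 247, 192], [139, 207, 284], [145, 176, 309]]

pre_stat, pre_dof = chi_squared(PRE)
post_stat, post_dof = chi_squared(POST)
pre_p = chi2_sf_even_dof(pre_stat, pre_dof)
post_p = chi2_sf_even_dof(post_stat, post_dof)
```

Under these assumptions both P values fall well below 0.001, in line with the tables, and the much smaller post-training statistic reflects the narrowing of the gap between groups.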
Post-training, there remained a significant difference between the three groups in
confidence levels ([Table 9]). However, the proportion of predictions made with high confidence by students had
risen from 3.7 % to 30.5 %, and with medium confidence from 28.4 % to 39.2 %. The
proportion of high confidence predictions also increased significantly for the registrar
and consultant groups but to a lesser degree ([Table 9] and [Table 10]). Registrars and consultants remained more likely to make a high confidence prediction
than students but the difference was much less marked than seen pre-training ([Table 8] and [Table 9]).
Table 10
Odds ratio of high confidence prediction pre- vs post-training.
| Group | Test | Low/Medium | High | Odds ratio of high confidence prediction (95 % confidence interval) |
| --- | --- | --- | --- | --- |
| Students | Pre-training | 607 | 23 | 1.0 |
| Students | Post-training | 438 | 192 | 11.57 (7.38 – 18.14) |
| Registrars | Pre-training | 435 | 195 | 1.0 |
| Registrars | Post-training | 346 | 284 | 1.83 (1.45 – 2.31) |
| Consultants | Pre-training | 363 | 267 | 1.0 |
| Consultants | Post-training | 321 | 309 | 1.31 (1.05 – 1.63) |
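The odds ratios in Table 10 can be recomputed from the counts of low/medium and high confidence predictions. The text does not state which interval method was used; the sketch below assumes the standard log (Woolf) method, which appears to reproduce the published values to two decimal places. The function name `odds_ratio_ci` is hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(pre_other: int, pre_high: int, post_other: int, post_high: int,
                  z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio of a high confidence prediction, post- vs pre-training,
    with a 95 % confidence interval from the log (Woolf) method."""
    odds_ratio = (post_high / post_other) / (pre_high / pre_other)
    # Standard error of log(OR) from the four cell counts
    se = sqrt(1 / pre_other + 1 / pre_high + 1 / post_other + 1 / post_high)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

students = odds_ratio_ci(607, 23, 438, 192)
registrars = odds_ratio_ci(435, 195, 346, 284)
consultants = odds_ratio_ci(363, 267, 321, 309)
```

This calculation treats pre- and post-training predictions as independent samples, which is a simplification given that the same participants and images were assessed twice.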
Accuracy of high confidence predictions
When only high confidence predictions in the post-training test were analyzed, mean
accuracy was significantly higher than overall accuracy for all three study groups (P < 0.01 for all three groups) ([Table 11]). There was no difference in the mean accuracy of high confidence predictions between
the three study groups. Accuracy of high confidence predictions by the two experts
was higher than their overall performance (93.6 % vs 86.7 %) but this did not quite
reach statistical significance (P = 0.052).
Table 11
Mean accuracy of high confidence predictions.
| Group | High confidence accuracy post-training % (95 % CI) | P value |
| --- | --- | --- |
| Students | 82.4 % (74.3 – 90.5 %) | 0.785 |
| Registrars | 79.8 % (76.2 – 83.5 %) | |
| Consultants | 82.9 % (72.2 – 93.5 %) | |
| Experts | 93.6 % (53.6 – 100 %) | |
Discussion
This large study examined the baseline in vivo characterization skills for assessing
DCPs amongst three groups with widely varying experience of endoscopy and polypectomy,
plus the impact of a novel web-based training module. Differences between three endoscopic
modalities, HDWL, i-Scan, and indigo carmine chromoendoscopy were also examined.
Perhaps the most striking results are the pre-training accuracy rates, which showed
no statistically significant difference in accuracy between the three participant
groups. Accuracy amongst medical students, who had observed at most a handful of colonoscopies,
did not differ from that of experienced endoscopists. One would logically assume that
the experience of performing several thousand colonoscopies, and hundreds/thousands
of polypectomies during those procedures, would lead to the acquisition of in vivo
diagnostic skills amongst consultants, but these results indicate that is not necessarily
the case. Studies assessing the accuracy of colonic lesion assessment, including early
colorectal cancers, have similarly found that experienced endoscopists performed no
better than trainees or non-endoscopist nurses [16] [17].
Following the training module, performance improved significantly for all groups,
as did agreement between subjects and confidence in predictions. However, accuracy
remained significantly below that of expert endoscopists. Although not specifically
addressed by this study, the ASGE PIVI criteria for optical diagnosis of DCPs are
unlikely to have been met by any of the groups following training. To reach the levels
of accuracy shown by experts is likely to require an ongoing period of practicing
optical diagnostic skills in vivo with regular review and feedback of performance.
Previous studies have suggested that accurate characterization of DCPs using advanced
endoscopic imaging can be learnt following picture-based training lasting 15 – 20
minutes [7] [8]. Similar video-based studies reported by Neumann et al. and Patel et al., including
feedback during training, elicited improvements in diagnostic accuracy [9] [10]. This suggests that feedback on performance is a key component of learning optical
diagnosis skills. Both of these studies gave feedback after each question in their
test modules and hence ongoing learning occurred through the testing phase. Feedback
was not given during the test modules in the current study and, had it been included, might have improved performance.
In contrast, a similar study by Coe et al. examined the impact of two 1-hour training
sessions on the use of NBI for small colon polyp characterization. They assessed the
real-time in vivo characterization accuracy of 15 endoscopists who were split into
those receiving training, and a control group, who did not receive training [11]. No significant improvement in prediction of polyp histology, or surveillance intervals,
occurred following the training sessions.
Studies of brief training interventions show varying results in the accuracy achieved
by participants post-training. In this current study, post-training accuracy was around
70 % for all three groups, which may suggest that training endoscopists to achieve
high accuracy rates is more difficult than suggested in other studies. In another
study using i-Scan, participants achieved 94 % accuracy for the final set of study
images [9]. Most other training studies have used NBI and reported that post-training accuracy
levels vary from 80 % [11] [18] to around 90 % [7] [8] [10]. Several factors could influence post-training accuracy: ability of participants
to acquire new skills, effectiveness of the training module, imaging modality used,
and difficulty of the test module.
As demonstrated in [Fig. 5], there was notable variability between participants in the improvement in accuracy
following training. Of the 21 participants, 17 improved their scores following training,
three remained the same and one actually scored lower post-training. This variability
may be explained by differences between subjects in ability and motivation to acquire
new skills.
The training module devised for this study covered several factors used for optical
diagnosis – vascularity, surface patterns, and vascular patterns. NBI training modules
are likely to just cover vascularity and vascular patterns and hence may be simpler
and less likely to confuse participants unfamiliar with advanced imaging techniques.
The training module in this study also covered three modalities – HDWL, i-Scan, and
indigo carmine chromoendoscopy, which may have added complexity and the potential
to overload participants.
The test module used in this study may have been more challenging than those used
in other studies. In a similar study performed by Ignjatovic et al., an expert group
(one of whom also participated in this study) achieved 95 % accuracy [7], compared to 86.7 % in this study. We selected high quality images for this study
but did not restrict selection to polyps which showed very obvious and typical features
of either hyperplastic or adenomatous DCPs, and hence some of the polyps included
may have proven to be more “difficult” than others. There may be a propensity in studies
of this type for researchers to select images which demonstrate very apparent features
of adenomatous and hyperplastic polyps, which may not represent the full spectrum
of DCPs found in clinical practice. This study used still images for the test module
whereas other studies have used short videos [18] [19], which may enable a more accurate assessment to be made.
Whereas most other studies have assessed training in just one endoscopic modality,
this study compared three different endoscopic modalities, and participants’ ability
to acquire optical diagnosis assessment skills with each of them. Most published studies
suggest that advanced imaging techniques or chromoendoscopy are superior to white-light
endoscopy in the assessment of SCPs/DCPs [20] [21] [22]. The results of this study suggest that this may not be the case for non-expert
endoscopists and novices (medical students). Following the training module, accuracy
with HDWL was not significantly different to that with chromoendoscopy and was actually
significantly higher than accuracy with i-Scan. This may suggest that optical diagnosis
skills with HDWL are easier to acquire than with i-Scan. Endoscopists may have also
been unfamiliar with the i-Scan image, and despite undergoing the training module,
found it more difficult to interpret than a white light image that they would be used
to seeing in everyday practice.
Further studies are required comparing advanced imaging techniques with HDWL imaging
to explore whether HDWL is a viable alternative to NBI/Fujinon intelligent chromoendoscopy
(FICE)/i-Scan for optical diagnosis. Ongoing research is also required to determine
what further training is needed for a non-expert endoscopist to achieve optical diagnostic
accuracy that can meet the ASGE PIVI criteria. This will undoubtedly require a period
of monitored in vivo training and feedback, in addition to computer/web-based
initial training. National and international endoscopy societies will need to set
out a clear training and certification process for optical diagnosis for this to become
widespread practice. At present, optical diagnosis skills do not form part of the
curriculum for higher gastroenterology trainees in the UK (Joint Royal Colleges of
Physicians Training Board. Specialty Training Curriculum for Gastroenterology August
2010 (Amendments August 2013). https://www.jrcptb.org.uk/sites/default/files/2010%20Gastroenterology%20%28amendment%202013%29_0.pdf). Clearly, it would make sense to teach these skills alongside technical skills in
endoscopy so that newly qualified consultants have acquired them before entering independent
practice.