Introduction
Small colonic polyp characterization has been identified as a key goal for novel endoscopic advanced imaging techniques and is an area of much research activity [1]. While expert endoscopists and those with an interest in in vivo diagnostic techniques can obtain high levels of accuracy, this level of performance has not generally been seen in studies involving non-expert endoscopists [2] [3] [4] [5] [6].
Hence, the issue of training in advanced diagnostic imaging techniques is of paramount importance if use of these techniques is to become part of everyday practice in the endoscopy community as a whole.
Several studies have examined the impact of brief training interventions on small colonic polyp (SCP) characterization. Significant improvements in the accuracy of diminutive colonic polyp (DCP) characterization have been demonstrated using advanced endoscopic imaging techniques (narrow-band imaging (NBI), i-Scan) following still image or video-based training interventions lasting 20 – 60 minutes [7] [8] [9] [10]. In contrast, a similar study by Coe et al. showed no significant improvement following two 1-hour training sessions on the use of NBI for SCP characterization [11]. The impact of training on high definition white light (HDWL) polyp characterization, and its performance relative to advanced imaging techniques and chromoendoscopy, has not been widely studied. There is some evidence that the accuracy of HDWL characterization of DCPs can match that obtained using advanced imaging [6] [12] [13].
Aims
This study aimed to examine the effect of a web-based training module on the accuracy of in vivo characterization of diminutive (< 5 mm) colonic polyps using HDWL, i-Scan, and chromoendoscopy. Differences between groups with varying degrees of endoscopic experience were also assessed.
Methods
This work was formally assessed and deemed to be an evaluation of an educational tool and therefore did not require formal research ethics committee approval.
Images obtained during a previous study were used [12]. Informed consent was obtained from all patients for the use of anonymized images of polyps for future teaching and training purposes. High quality images of diminutive polyps viewed with HDWL, i-Scan 3 (Surface enhancement + 3, Contrast enhancement + 2, Tone enhancement – colon), and 0.2 % indigo carmine chromoendoscopy were selected, with three images of each polyp used (one with each modality). In total, 30 polyps were included in this study, 15 adenomatous and 15 hyperplastic, which corresponds to the approximate proportions of each histological type found in DCPs in situ. Therefore, in total, 90 images of the 30 polyps were used in the study. Histopathological diagnosis as determined by a UK Bowel Cancer Screening Programme accredited histopathologist was used as the reference standard for all optical assessments.
Images were randomized using an online randomization tool (GraphPad) and incorporated into a web-based testing and training module hosted on the University of Portsmouth virtual learning platform (Moodle). Participants viewed each image in turn and were asked to predict polyp histology (adenomatous vs hyperplastic) and to also report their confidence level for each assessment based on the following confidence levels:
- < 70 % certain = low confidence;
- 70 – 90 % certain = medium confidence;
- > 90 % certain = high confidence.
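These thresholds can be expressed as a small mapping. How boundary values (exactly 70 % or 90 %) were binned is not stated in the text, so the inclusive handling below is an assumption:

```python
def confidence_band(certainty_pct: float) -> str:
    """Map a percent-certainty rating to the study's three confidence bands.

    Band edges follow the text: < 70 % low, 70 - 90 % medium, > 90 % high.
    Treating exactly 70 and 90 as 'medium' is an assumption; the paper
    does not state how boundary values were handled.
    """
    if certainty_pct < 70:
        return "low"
    if certainty_pct <= 90:
        return "medium"
    return "high"

print(confidence_band(65), confidence_band(80), confidence_band(95))  # → low medium high
```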
All questions had to be answered to complete the training module. To avoid the influence of other participants, all tests and training were undertaken individually.
Following the initial ‘pre-training’ testing phase, subjects completed a novel web-based training tool on small colonic polyp characterization. The training module was developed using Prezi, a cloud-based presentation program, and was designed to take 20 minutes to complete.
The training module covered several key areas:
- Outlining the two main types of DCPs (hyperplastic and adenomatous);
- Modes of assessing polyps in vivo – HDWL, i-Scan, and indigo carmine chromoendoscopy without magnification;
- Key features used for DCP characterization (vascularity, vascular patterns, and surface patterns) and which features were suggestive of each histological type:
  - Hyperplastic polyps – paler, or similar in color to the surrounding mucosa; no visible surface patterns or large non-compact pits; no visible surface vessels, or a few thin thread-like surface vessels;
  - Adenomatous polyps – more vascular than the surrounding mucosa; small compact regular pits, tubular or branched pits; dense regular vessels following the edges of pits;
- 20 image examples of all key features;
- An interactive “test” section where subjects were asked to predict histology, followed by feedback on key features visible and the histopathological diagnosis.
To avoid subject bias, the training module did not refer to which key features were visible with each imaging modality, nor whether any modality was likely to lead to a more accurate prediction of histology.
Screenshot images from the training module are shown in [Figs. 1 – 4].
Fig. 1 Screenshot image from the web-based training module.
Fig. 2 Screenshot image from the web-based training module.
Fig. 3 Screenshot image from the web-based training module.
Fig. 4 Screenshot image from the web-based training module.
Following completion of the training module, subjects underwent a second test module using the same 90 polyp images presented in a different randomized order.
Subjects were divided into three groups:
- 3rd year medical students (n = 7)
- Gastroenterology Registrars (n = 7)
  - Higher specialist gastroenterology trainees, all currently training in colonoscopy and polypectomy
  - No previous formal training in in vivo diagnostic techniques
  - Median lifetime colonoscopy cases performed: 120 (range 30 – 450)
- Gastroenterology Consultants (n = 7)
Two expert endoscopists with experience of over 5000 colonoscopies and extensive experience in in vivo diagnostic techniques, including extensive research experience in this field, also undertook the test module to validate the image library.
Statistics
The pre-training test was expected to demonstrate a difference in overall accuracy between the three subject groups. Based on pilot studies performed with similar subject groups viewing polyp images, the predicted overall accuracy for each group was:
- Medical students: 60 %;
- Registrars: 70 %;
- Consultants: 75 %.
Power calculations showed that seven subjects per group would be required to provide 80 % power to demonstrate these differences with a significance level of 5 %.
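The paper does not report the formula behind this power calculation. As a rough sketch, the standard normal-approximation sample-size formula for comparing two proportions gives about 150 independent assessments per group for the largest predicted gap (students 60 % vs consultants 75 %); seven subjects each contributing 90 assessments comfortably exceed this, although pooling assessments ignores within-subject clustering. The function below is illustrative only:

```python
import math

def n_per_group(p1: float, p2: float) -> float:
    """Observations per group to detect a difference between two proportions,
    normal approximation, two-sided alpha = 0.05, power = 0.80.
    The z-values are the corresponding standard normal quantiles."""
    z_alpha, z_beta = 1.95996, 0.84162
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Largest predicted gap in the study: students (60 %) vs consultants (75 %)
print(math.ceil(n_per_group(0.60, 0.75)))  # → 150
```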
Overall accuracy was expected to be similar in all three groups after training [8] [14] [15]. It was anticipated that, following training, there would be a small difference in accuracy between white light (80 %) and i-Scan/indigo carmine chromoendoscopy (85 %).
Comparisons between groups for mean accuracy, sensitivity, specificity, and negative predictive value (NPV) were performed by one-way ANOVA. In those comparisons where a significant difference was detected using ANOVA, post-hoc testing was performed to detect significant differences between pairs of groups using Hochberg’s GT2 test (when sample sizes differed) or Tukey’s test. Comparisons of mean accuracy by modality were also performed using one-way ANOVA and post-hoc Tukey’s test. Differences in performance pre- and post-training were compared using McNemar’s test. Differences in confidence levels between groups and modality were compared using the Pearson’s chi-squared test. Odds ratios were calculated for two level confidence results (low/medium vs high) if a significant chi-squared statistic was detected.
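Of the tests above, the pre- vs post-training comparison rests on McNemar's test, which uses only the discordant pairs of a paired binary comparison. A minimal exact version is sketched below; the discordant counts in the usage example are hypothetical, since the paper reports only aggregate accuracies:

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on the discordant pairs of a paired
    binary comparison: b = correct pre-training only, c = correct
    post-training only. Under H0 the b 'wins' follow Binomial(b + c, 0.5)."""
    n = b + c
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical discordant counts (not reported in the paper):
# 30 images newly correct after training, 12 newly incorrect.
print(mcnemar_exact_p(12, 30))
```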
Mean kappa values within groups pre- and post-training were also calculated, and the accuracy of high confidence predictions was assessed.
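Pairwise Cohen's kappa underlies these within-group mean values. A stdlib-only sketch for two binary raters follows; averaging kappa over all rater pairs within a group is an assumption, as the paper does not detail the calculation:

```python
def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two raters making binary calls
    (1 = adenomatous, 0 = hyperplastic) on the same polyp images:
    kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                     # each rater's 'adenoma' rate
    pe = pa * pb + (1 - pa) * (1 - pb)                  # agreement expected by chance
    return (po - pe) / (1 - pe)

# Toy example: two raters over four polyps, disagreeing on one
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```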
Results
Overall accuracy
Mean overall accuracy results in the pre-training test are shown in [Table 1]. No statistically significant differences were found between the three study groups pre-training (P > 0.5 for all three comparisons). The mean accuracy of the two experts was significantly higher than the three study groups (P < 0.01 for all three comparisons).
Table 1 Pre-training and post-training accuracy by subject group.

| Group | Accuracy pre-training: Correct/Total, % (95 %CI) | Accuracy post-training: Correct/Total, % (95 %CI) | P value pre- vs post-training |
|---|---|---|---|
| Students | 372/630, 59.1 % (50.6 – 67.5 %) | 436/630, 69.2 % (66.1 – 72.3 %) | < 0.001 |
| Registrars | 414/630, 65.7 % (60.8 – 70.7 %) | 448/630, 71.1 % (67.4 – 74.9 %) | 0.013 |
| Consultants | 393/630, 62.4 % (54.0 – 70.7 %) | 449/630, 71.3 % (64.7 – 77.9 %) | < 0.001 |
| Experts | 156/180, 86.7 % (58.4 – 100 %) | | |
In the post-training test, mean overall accuracy was significantly higher for all three groups compared to pre-training (P < 0.02 for all three comparisons). Again, there were no statistically significant differences in accuracy between the three study groups post-training (P > 0.5 for all three comparisons) but mean accuracy remained significantly higher for the two experts (P < 0.025 for all three comparisons). Pre- and post-training accuracy results for individual participants are shown in [Fig. 5].
Fig. 5 Individual participant accuracy scores pre- and post-training.
Sensitivity, specificity, and negative predictive value
In analysis of sensitivity, specificity, and negative predictive value (NPV) for adenomatous histology, no statistically significant differences were found between the three groups either pre- or post-training ([Tables 2 – 4]).
Table 2 Pre-training and post-training sensitivity by group with comparisons between groups.

| Group | Sensitivity pre-training: Correct/Total, % (95 %CI) | P value between groups | Sensitivity post-training: Correct/Total, % (95 %CI) | P value between groups | P value pre- vs post-training |
|---|---|---|---|---|---|
| Students | 175/315, 55.6 % (36.8 – 74.3 %) | 0.285 | 223/315, 70.8 % (63.1 – 78.5 %) | 0.520 | < 0.001 |
| Registrars | 218/315, 69.2 % (59.8 – 78.7 %) | | 222/315, 70.5 % (65.6 – 75.3 %) | | 0.789 |
| Consultants | 203/315, 64.4 % (50.3 – 78.6 %) | | 234/315, 74.3 % (68.4 – 80.2 %) | | 0.007 |
Table 3 Pre-training and post-training specificity by group with comparisons between groups.

| Group | Specificity pre-training: Correct/Total, % (95 %CI) | P value between groups | Specificity post-training: Correct/Total, % (95 %CI) | P value between groups | P value pre- vs post-training |
|---|---|---|---|---|---|
| Students | 197/315, 62.5 % (49.6 – 75.5 %) | 0.973 | 213/315, 67.6 % (55.9 – 79.4 %) | 0.825 | 0.211 |
| Registrars | 195/315, 61.9 % (49.8 – 74.0 %) | | 226/315, 71.7 % (64.7 – 78.8 %) | | 0.010 |
| Consultants | 190/315, 60.3 % (37.0 – 83.6 %) | | 214/315, 67.9 % (50.6 – 85.2 %) | | 0.045 |
Table 4 Pre-training and post-training negative predictive value (NPV) by group with comparisons between groups.

| Group | NPV pre-training: Mean % (95 %CI) | P value between groups | NPV post-training: Mean % (95 %CI) | P value between groups | P value pre- vs post-training |
|---|---|---|---|---|---|
| Students | 59.8 % (51.7 – 67.9 %) | 0.220 | 70.2 % (67.4 – 73.1 %) | 0.523 | 0.003 |
| Registrars | 67.2 % (61.4 – 73.0 %) | | 70.9 % (67.4 – 74.4 %) | | 0.294 |
| Consultants | 62.7 % (55.4 – 70.0 %) | | 72.3 % (69.2 – 75.4 %) | | 0.014 |
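Sensitivity, specificity, and NPV follow directly from the 2×2 counts. For illustration, the pooled post-training student counts from Tables 2 and 3 reproduce the reported sensitivity and specificity; note that the paper's NPV figures are means of per-subject values, so a pooled NPV will differ slightly from Table 4:

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and NPV for predicting adenomatous histology,
    treating adenoma as the 'positive' class."""
    return {
        "sensitivity": tp / (tp + fn),   # adenomas correctly called adenoma
        "specificity": tn / (tn + fp),   # hyperplastic correctly called hyperplastic
        "npv": tn / (tn + fn),           # 'hyperplastic' calls that were correct
    }

# Pooled post-training student counts from Tables 2-3:
# 223/315 adenomas and 213/315 hyperplastic polyps called correctly.
m = diagnostic_metrics(tp=223, fn=92, tn=213, fp=102)
print(round(m["sensitivity"] * 100, 1), round(m["specificity"] * 100, 1))  # → 70.8 67.6
```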
Interobserver agreement
Mean kappa values for each group pre- and post-training were calculated ([Table 5]). Agreement pre-training in the student group was slight, and fair in the registrar and consultant groups. There was a significant improvement in mean kappa for all three groups post-training. Post-training mean kappa was fair for the consultant group and moderate for the student and registrar groups.
Table 5 Interobserver agreement by group pre- and post-training.

| Group | Mean pre-training kappa (95 %CI) | Mean post-training kappa (95 %CI) | P value pre- vs post-training |
|---|---|---|---|
| Students | 0.106 (– 0.009 to 0.222) | 0.472 (0.417 – 0.528) | < 0.001 |
| Registrars | 0.298 (0.239 – 0.258) | 0.541 (0.502 – 0.581) | < 0.001 |
| Consultants | 0.216 (0.128 – 0.304) | 0.371 (0.321 – 0.422) | 0.004 |
Performance by modality
Accuracy rates achieved when assessing images with each of the three modalities (HDWL/i-Scan/indigo carmine chromoendoscopy) were compared pre- and post-training ([Table 6] and [Table 7]). No significant difference in mean accuracy between the three modalities was found pre-training. Post-training mean accuracy for HDWL and chromoendoscopy images was significantly higher than for i-Scan images.
Table 6 Mean accuracy pre-training by modality.

| Modality | Accuracy pre-training: Correct/Total, % (95 %CI) | P value |
|---|---|---|
| HDWL | 408/630, 64.8 % (59.1 – 70.4 %) | 0.317 |
| i-Scan | 378/630, 60.0 % (56.1 – 63.9 %) | |
| Chromoendoscopy | 392/630, 62.2 % (58.2 – 66.2 %) | |

HDWL, high definition white light.
Table 7 Mean accuracy post-training by modality.

| Modality | Accuracy post-training: Correct/Total, % (95 %CI) | Comparison vs HDWL, P value | Comparison vs i-Scan, P value |
|---|---|---|---|
| HDWL | 459/630, 72.9 % (70.2 – 75.5 %) | | |
| i-Scan | 410/630, 65.1 % (61.3 – 68.8 %) | 0.002 | |
| Chromoendoscopy | 464/630, 73.7 % (70.7 – 76.6 %) | 0.927 | < 0.001 |

HDWL, high definition white light.
Following training, accuracy for HDWL and chromoendoscopy images improved significantly (P = 0.002 and P < 0.001, respectively) compared to pre-training. However, accuracy with i-Scan images did not improve significantly post-training (P = 0.074).
Confidence ratings by subject group
Pre-training, there were significant differences between the three groups in the spread of confidence ratings. Students rated few of their predictions as high confidence (only 3.7 % overall). Registrars and consultants made more high confidence predictions, although the majority of their predictions were still rated as low or medium confidence ([Table 8] and [Table 9]).
Table 8 Prediction confidence ratings pre-training.

| Group | Low | Medium | High | P value |
|---|---|---|---|---|
| Students | 428 (67.9 %) | 179 (28.4 %) | 23 (3.7 %) | < 0.001 |
| Registrars | 214 (34.0 %) | 221 (35.1 %) | 195 (31.0 %) | |
| Consultants | 187 (29.7 %) | 176 (27.9 %) | 267 (42.4 %) | |
| All subjects | 829 (43.9 %) | 576 (30.5 %) | 485 (25.7 %) | |
| Experts | 19 (10.6 %) | 36 (20.0 %) | 125 (69.4 %) | |
Table 9 Prediction confidence ratings post-training.

| Group | Low | Medium | High | P value |
|---|---|---|---|---|
| Students | 191 (30.3 %) | 247 (39.2 %) | 192 (30.5 %) | < 0.001 |
| Registrars | 139 (22.1 %) | 207 (32.9 %) | 284 (45.1 %) | |
| Consultants | 145 (23.0 %) | 176 (27.9 %) | 309 (49.0 %) | |
| All subjects | 475 (25.1 %) | 630 (33.3 %) | 785 (41.5 %) | |
Post-training, there remained a significant difference between the three groups in confidence levels ([Table 9]). However, the proportion of predictions made with high confidence by students had risen from 3.7 % to 30.5 %, and with medium confidence from 28.4 % to 39.2 %. The proportion of high confidence predictions also increased significantly for the registrar and consultant groups but to a lesser degree ([Table 9] and [Table 10]). Registrars and consultants remained more likely to make a high confidence prediction than students but the difference was much less marked than seen pre-training ([Table 8] and [Table 9]).
Table 10 Odds ratio of high confidence prediction pre- vs post-training.

| Prediction confidence | Low/Medium | High | Odds ratio of high confidence prediction (95 % confidence interval) |
|---|---|---|---|
| Students, pre-training | 607 | 23 | 1.0 (reference) |
| Students, post-training | 438 | 192 | 11.57 (7.38 – 18.14) |
| Registrars, pre-training | 435 | 195 | 1.0 (reference) |
| Registrars, post-training | 346 | 284 | 1.83 (1.45 – 2.31) |
| Consultants, pre-training | 363 | 267 | 1.0 (reference) |
| Consultants, post-training | 321 | 309 | 1.31 (1.05 – 1.63) |
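These odds ratios can be reproduced from the table counts with the standard Woolf (log) interval; for the student group this yields the reported 11.57 (7.38 – 18.14):

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int):
    """Odds ratio with 95 % Woolf (log-scale) confidence interval for a
    2x2 table: a/b = high vs low/medium confidence post-training,
    c/d = high vs low/medium confidence pre-training."""
    or_ = (a * d) / (b * c)                          # (a/b) / (c/d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # SE of log odds ratio
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

# Student group, Table 10: 192/438 high confidence post- vs 23/607 pre-training
or_, lo, hi = odds_ratio_ci(192, 438, 23, 607)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # → 11.57 7.38 18.14
```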
Accuracy of high confidence predictions
When only high confidence predictions in the post-training test were analyzed, mean accuracy was significantly higher than overall accuracy for all three study groups (P < 0.01 for all three groups) ([Table 11]). There was no difference in the mean accuracy of high confidence predictions between the three study groups. Accuracy of high confidence predictions by the two experts was higher than their overall performance (93.6 % vs 86.7 %) but this did not quite reach statistical significance (P = 0.052).
Table 11 Mean accuracy of high confidence predictions.

| Group | High confidence accuracy post-training, % (95 %CI) | P value |
|---|---|---|
| Students | 82.4 % (74.3 – 90.5 %) | 0.785 |
| Registrars | 79.8 % (76.2 – 83.5 %) | |
| Consultants | 82.9 % (72.2 – 93.5 %) | |
| Experts | 93.6 % (53.6 – 100 %) | |
Discussion
This large study examined baseline in vivo characterization skills for assessing DCPs amongst three groups with widely varying experience of endoscopy and polypectomy, together with the impact of a novel web-based training module. Differences between three endoscopic modalities – HDWL, i-Scan, and indigo carmine chromoendoscopy – were also examined.
Perhaps the most striking results are the pre-training accuracy rates, which showed no statistically significant difference in accuracy between the three participant groups. Accuracy amongst medical students, who had observed at most a handful of colonoscopies, did not differ from that of experienced endoscopists. One would logically assume that the experience of performing several thousand colonoscopies, and hundreds or thousands of polypectomies during those procedures, would lead to the acquisition of in vivo diagnostic skills amongst consultants, but these results indicate that is not necessarily the case. Studies assessing the accuracy of colonic lesion assessment, including early colorectal cancers, have similarly found that experienced endoscopists performed no better than trainees or non-endoscopist nurses [16] [17].
Following the training module, performance improved significantly for all groups, as did agreement between subjects and confidence in predictions. However, accuracy remained significantly below that of expert endoscopists. Although not specifically addressed by this study, the ASGE PIVI criteria for optical diagnosis of DCPs are unlikely to have been met by any of the groups following training. To reach the levels of accuracy shown by experts is likely to require an ongoing period of practicing optical diagnostic skills in vivo with regular review and feedback of performance.
Previous studies have suggested that accurate characterization of DCPs using advanced endoscopic imaging can be learnt following picture-based training lasting 15 – 20 minutes [7] [8]. Similar video-based studies reported by Neumann et al. and Patel et al., including feedback during training, elicited improvements in diagnostic accuracy [9] [10]. This suggests that feedback on performance is a key component of learning optical diagnosis skills. Both of these studies gave feedback after each question in their test modules, and hence ongoing learning occurred throughout the testing phase. Feedback was not given in the current study and, if included, might have improved performance.
In contrast, a similar study by Coe et al. examined the impact of two 1-hour training sessions on the use of NBI for small colon polyp characterization. They assessed the real-time in vivo characterization accuracy of 15 endoscopists who were split into those receiving training, and a control group, who did not receive training [11]. No significant improvement in prediction of polyp histology, or surveillance intervals, occurred following the training sessions.
Studies of brief training interventions show varying results in the accuracy achieved by participants post-training. In the current study, post-training accuracy was around 70 % for all three groups, which may suggest that training endoscopists to achieve high accuracy rates is more difficult than other studies imply. In another study using i-Scan, participants achieved 94 % accuracy for the final set of study images [9]. Most other training studies have used NBI and reported post-training accuracy levels ranging from 80 % [11] [18] to around 90 % [7] [8] [10]. Several factors could influence post-training accuracy: the ability of participants to acquire new skills, the effectiveness of the training module, the imaging modality used, and the difficulty of the test module.
As demonstrated in [Fig. 5], there was notable variability between participants in the improvement in accuracy following training. Of the 21 participants, 17 improved their scores following training, three remained the same and one actually scored lower post-training. This variability may be explained by differences between subjects in ability and motivation to acquire new skills.
The training module devised for this study covered several features used for optical diagnosis – vascularity, surface patterns, and vascular patterns. NBI training modules typically cover only vascularity and vascular patterns, and hence may be simpler and less likely to confuse participants unfamiliar with advanced imaging techniques. The training module in this study also covered three modalities – HDWL, i-Scan, and indigo carmine chromoendoscopy – which may have added complexity and the potential to overload participants.
The test module used in this study may have been more challenging than those used in other studies. In a similar study performed by Ignjatovic et al., an expert group (one of whom also participated in this study) achieved 95 % accuracy [7], compared to 86.7 % in this study. We selected high quality images for this study but did not restrict selection to polyps which showed very obvious and typical features of either hyperplastic or adenomatous DCPs, and hence some of the polyps included may have proven to be more “difficult” than others. There may be a propensity in studies of this type for researchers to select images which demonstrate very apparent features of adenomatous and hyperplastic polyps, which may not represent the full spectrum of DCPs found in clinical practice. This study used still images for the test module whereas other studies have used short videos [18] [19], which may enable a more accurate assessment to be made.
Whereas most other studies have assessed training in a single endoscopic modality, this study compared three different endoscopic modalities, and participants’ ability to acquire optical diagnosis skills with each of them. Most published studies suggest that advanced imaging techniques or chromoendoscopy are superior to white light endoscopy in the assessment of SCPs/DCPs [20] [21] [22]. The results of this study suggest that this may not be the case for non-expert endoscopists and novices (medical students). Following the training module, accuracy with HDWL was not significantly different from that with chromoendoscopy and was actually significantly higher than accuracy with i-Scan. This may suggest that optical diagnosis skills with HDWL are easier to acquire than with i-Scan. Endoscopists may also have been unfamiliar with the i-Scan image and, despite undergoing the training module, found it more difficult to interpret than the white light images they are used to seeing in everyday practice.
Further studies comparing advanced imaging techniques with HDWL imaging are required to explore whether HDWL is a viable alternative to NBI/Fujinon intelligent chromoendoscopy (FICE)/i-Scan for optical diagnosis. Ongoing research is also required to determine what further training is needed for a non-expert endoscopist to achieve optical diagnostic accuracy that meets the ASGE PIVI criteria. This will undoubtedly require a period of monitored in vivo training and feedback, in addition to initial computer/web-based training. National and international endoscopy societies will need to set out a clear training and certification process for optical diagnosis for this to become widespread practice. At present, optical diagnosis skills do not form part of the curriculum for higher gastroenterology trainees in the UK (Joint Royal Colleges of Physicians Training Board. Specialty Training Curriculum for Gastroenterology August 2010 (Amendments August 2013). https://www.jrcptb.org.uk/sites/default/files/2010 %20Gastroenterology%20 %28amendment%202013 %29_0.pdf). Clearly, it would make sense to teach these skills alongside technical skills in endoscopy so that newly qualified consultants have acquired them before entering independent practice.