Endoscopy 2024; 56(S 02): S79
DOI: 10.1055/s-0044-1782860
Abstracts | ESGE Days 2024
Oral presentation
New Frontiers in Barrett's esophagus surveillance 26/04/2024, 11:30 – 12:30 Room 8

Image quality pitfalls in AI: Safeguarding Barrett's neoplasia detection with robust deep learning training strategies

M. Jong (1), T. Jaspers (2), C. Kusters (2), J. Jukema (1), K. Fockens (1), R. van Eijck van Heslinga (3), T. Boers (2), F. Van Der Sommen (2), P. De With (2), J. De Groof (1), J. Bergman (3)

1   Amsterdam UMC, locatie VUmc, Amsterdam, Netherlands
2   Eindhoven University of Technology, Eindhoven, Netherlands
3   VU University Medical Center, Amsterdam, Netherlands

    Aims Endoscopic artificial intelligence systems developed in expert centers with high-quality imaging may underperform in community hospitals, where image quality is more heterogeneous. This study aimed to quantify the performance degradation of a computer-aided detection (CADe) system for Barrett’s neoplasia when exposed to the heterogeneous imaging conditions of community hospitals. Subsequently, we evaluated several state-of-the-art training strategies to mitigate this performance loss.

    Methods We developed a CADe system using a high-quality, expert-acquired training set comprising 437 images from 173 neoplastic Barrett’s patients and 574 images from 200 non-dysplastic Barrett’s esophagus patients. We assessed its performance on high-, moderate-, and low-quality test sets, each containing 120 images derived from the same group of 65 neoplastic Barrett’s patients and 55 non-dysplastic Barrett’s patients. These test sets were completely independent of the training set and simulated the heterogeneous image quality of community hospitals. We then applied four robustness-enhancing strategies: diversified training data, domain-specific pretraining, targeted data augmentation, and architectural optimization.
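
Performance on each test set is reported as an AUC. As a reminder of what that metric measures, here is a minimal sketch in Python, not the authors' evaluation code: the AUC is the probability that a randomly chosen neoplastic image receives a higher score than a randomly chosen non-dysplastic image, with ties counted as half. The function name `auc` and the scores below are hypothetical and chosen only for illustration.

```python
def auc(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; ties contribute 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs, NOT data from the study.
neoplastic_scores = [0.9, 0.8, 0.35, 0.7]
non_dysplastic_scores = [0.1, 0.4, 0.35, 0.2]

example_auc = auc(neoplastic_scores, non_dysplastic_scores)
print(f"AUC = {example_auc:.3f}")
```

In the study's setting, each test set would contribute 65 neoplastic and 55 non-dysplastic cases to such a computation, one set of scores per image-quality level.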

    Results The CADe system, when trained exclusively on high-quality data, achieved an AUC of 82% on the high-quality test set. AUCs were significantly lower on the moderate-quality (79%; p<0.001) and low-quality (70%; p<0.001) test sets. Incorporating robustness-enhancing strategies significantly improved the AUC to 93% on the high-quality (p=0.020), 94% on the moderate-quality (p=0.006), and 84% on the low-quality test set (p=0.002). These strategies also significantly reduced the performance drop relative to the high-quality test set, both on the moderate-quality (+1% with vs -3% without the strategies; p<0.001) and low-quality test sets (-9% vs -12%; p=0.004).
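
The performance drops quoted above follow directly from the reported AUC values. A minimal Python sketch of that arithmetic, using only the figures in this abstract (the variable and function names are illustrative, not from the study):

```python
# AUC values in percent, as reported in the Results.
baseline = {"high": 82, "moderate": 79, "low": 70}  # trained on high-quality data only
robust = {"high": 93, "moderate": 94, "low": 84}    # with robustness-enhancing strategies


def change_vs_high(auc_by_quality):
    """Change in AUC (percentage points) relative to the high-quality test set."""
    return {
        quality: auc_by_quality[quality] - auc_by_quality["high"]
        for quality in ("moderate", "low")
    }


baseline_change = change_vs_high(baseline)
robust_change = change_vs_high(robust)

# Absolute improvement on each test set after adding the strategies.
absolute_gain = {quality: robust[quality] - baseline[quality] for quality in baseline}

print("baseline change vs high:", baseline_change)
print("robust change vs high:  ", robust_change)
print("absolute gain:          ", absolute_gain)
```

The computed changes (-3 and -12 points without the strategies, +1 and -9 points with them) match the values given in the text. The p-values cannot be reproduced from these summary figures alone.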

    Conclusions CADe systems trained solely on high-quality images may not perform well on the variable image quality found in routine clinical practice. However, this study shows that state-of-the-art robustness-enhancing strategies can significantly improve both the robustness and the absolute performance of such systems, increasing the likelihood of successful implementation of artificial intelligence systems in clinical practice.



    Conflicts of interest

    Authors do not have any conflict of interest to disclose.

    Publication History

    Article published online:
    15 April 2024

    © 2024. European Society of Gastrointestinal Endoscopy. All rights reserved.

    Georg Thieme Verlag KG
    Rüdigerstraße 14, 70469 Stuttgart, Germany