Introduction
Barrett’s-related early esophageal adenocarcinoma (T1 EAC), which is localized to the mucosa (T1a) or submucosa (T1b), constitutes approximately 20 % of all diagnosed EAC [1]. While esophagectomy has been the standard of care, achieving a 5-year survival of up to 90 % for T1a disease [2], it carries significant perioperative mortality (2 %) and morbidity (10 %), even in high-volume centers [3]. Moreover, many patients suffer from long-term irreversible digestive dysfunction, including medically refractory reflux (60 %), dumping syndromes (50 %), and dysphagia to solid food (25 %), which significantly impairs quality of life [4].
In recent years, endoscopic resection (ER) for Barrett’s neoplasia has yielded excellent long-term outcomes that are comparable to surgical resection [3] [5] [6] [7] [8]. Endoscopic mucosal resection (EMR) is firmly established as the treatment of choice for nodular high grade dysplasia (HGD) and T1a disease [8] [9]. Similarly, endoscopic submucosal dissection (ESD) and en bloc EMR are favored for T1b disease because an en bloc R0 excision may facilitate a cure due to the resection plane being at the level of the muscularis propria. In addition, en bloc excision enables the preservation of important biologic features in the resected specimen that predict lymph node metastasis, such as tumor differentiation and lymphovascular invasion [10] [11] [12] [13]. Furthermore, it does not preclude or compromise subsequent surgery should more advanced histopathology be detected. There is also emerging evidence to suggest that high risk T1b cases (deep submucosal [SM2+] and/or poorly differentiated and/or with lymphovascular invasion) may be suitable for close surveillance following an en bloc R0 excision without compromising the oncologic outcomes of further treatment such as surgery in the event of disease progression [14].
Thus, distinguishing between T1a and T1b disease is imperative under current treatment paradigms. Current assessment is limited to optical evaluation, as biopsies and endoscopic ultrasound are inaccurate for T staging in this setting [15] [16]. We therefore sought to ascertain whether expert Barrett’s endoscopists could distinguish between T1a and T1b EAC based on optical evaluation.
Methods
Study design and case selection
We retrospectively obtained high quality endoscopic images and pathology reports from patients who underwent either EMR or ESD for Barrett’s neoplasia over 36 months until October 2021 at Westmead Hospital in Sydney, Australia. All patients had previously provided informed consent for research purposes as part of a prospective registry. Approval for the current study was granted by the Human Research Ethics Committee at Westmead Hospital (2021/ETH01154). Exclusion criteria included age < 18 years. All ER procedures were performed using an Olympus HQ190 adult gastroscope with optical capabilities including high definition white light, narrow-band imaging (NBI), and near-focus magnification (Olympus Medical System Corp., Tokyo, Japan).
A total of 60 sets of endoscopic images of histologically confirmed HGD, T1a, and T1b disease (20 sets for each) were compiled from consecutive patients over the previously defined time period. Each set contained four images, standardized to include an overview, a close-up in high definition white light, an NBI image, and a near-focus magnification image. The Paris classification of each lesion and Prague classification of the extent of Barrett’s esophagus were provided in each case. Using this information, an online survey was created on Research Electronic Data Capture (REDCap), hosted by the University of Sydney [17] [18]. REDCap is a secure, web-based application designed to support data capture for research studies, providing: 1) an intuitive interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for data integration and interoperability with external sources.
International Working Group
A total of 24 expert Barrett’s endoscopists from around the world were invited to participate, with 19 completing the survey. Experts were defined as endoscopists who had performed Barrett’s endotherapy for > 5 years and had a high resection case volume (> 50 per year) in the field of ER for Barrett’s neoplasia. The initial invitation was sent to experts by email from the Principal Investigator through REDCap. A reminder was sent after 4 weeks in cases of non-completion. Experts were required to assess each of the 60 sets of endoscopic images using the REDCap-based survey (see Fig. 1s and Fig. 2s in the online-only Supplementary material). Each participant was asked to predict the histology of Barrett’s neoplasia for each lesion: HGD, intramucosal adenocarcinoma (T1a), or submucosal invasive adenocarcinoma (T1b). Participants were also asked to indicate their level of confidence in the diagnosis (high or low confidence).
Outcomes
The primary outcomes of the study were the sensitivity of optical evaluation in identifying T1b disease compared with T1a disease, and the interobserver agreement between experts. The secondary outcome was identification of potential endoscopist-related factors contributing to accurate optical evaluation.
Statistical analysis
Given that multiple experts rated the same endoscopic data sets, observations within each set may be correlated (clustered data). To account for this, variance adjustment using a variance inflation factor was performed [19]. The variance inflation factor was calculated by determining: 1) the overall cluster size; and 2) intraclass correlation coefficients, which represent the resemblance between any two observations within a cluster.
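As an illustration of this kind of adjustment, the following is a minimal sketch that assumes the commonly used design-effect form, 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intraclass correlation coefficient. The exact formula used in the study is not stated here, and the function names and the ICC value in the example are hypothetical.

```python
import math


def variance_inflation_factor(cluster_size: float, icc: float) -> float:
    """Design effect for clustered data: 1 + (m - 1) * ICC (assumed form)."""
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_proportion_ci(p: float, n_obs: int, cluster_size: float,
                           icc: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI for a proportion with inflated variance."""
    vif = variance_inflation_factor(cluster_size, icc)
    # The binomial variance p(1 - p)/n is multiplied by the inflation factor.
    se = math.sqrt(p * (1.0 - p) / n_obs * vif)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical example: 20 image sets, each rated by 19 experts (380 ratings),
# with an assumed ICC of 0.3 (illustrative value only, not from the study).
naive = adjusted_proportion_ci(0.438, 380, cluster_size=19, icc=0.0)
adjusted = adjusted_proportion_ci(0.438, 380, cluster_size=19, icc=0.3)
```

With an ICC of zero the factor is 1 and the interval reduces to the usual binomial interval; any positive ICC widens it, reflecting that 19 ratings of the same image set carry less information than 19 independent observations.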
Sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and accuracy were calculated, along with their variance-adjusted 95 %CIs. Positive and negative predictive values were not calculated as these are dependent upon disease prevalence, which was artificially determined in this study by selecting 20 cases each of HGD, T1a disease, and T1b disease. Chi-squared tests were used to test for pairwise association between categorical variables. Interobserver agreement between experts was calculated using Fleiss’ kappa statistics and their 95 %CIs, along with respective P values. A modified Likert scale developed by Landis and Koch was used to interpret kappa values (< 0.20 = poor; 0.21–0.40 = fair; 0.41–0.60 = moderate; 0.61–0.80 = substantial; 0.81–1.00 = very good) [20]. Statistical significance was defined as P < 0.05 using two-tailed P values. All statistical analyses were performed using SPSS software version 29 (IBM Corp., Armonk, New York, USA).
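For readers unfamiliar with the agreement statistic, the sketch below shows how Fleiss’ kappa can be computed from a table of rating counts and mapped onto the interpretation bands listed above. It is an illustrative pure-Python implementation of the standard formula, not the SPSS procedure used in the study; the function names and the toy data are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters placing case i in category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")
    n_categories = len(counts[0])
    # Overall proportion of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters)
           for j in range(n_categories)]
    # Per-case agreement: proportion of rater pairs that agree on that case.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases           # mean observed agreement
    p_e = sum(p * p for p in p_j)        # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa: float) -> str:
    """Bands as listed in the text; values up to 0.20 are treated as poor."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "very good"


# Toy data: 2 cases, 3 raters, 2 categories (not study data).
perfect = fleiss_kappa([[3, 0], [0, 3]])
mixed = fleiss_kappa([[2, 1], [1, 2]])
```

In the study setting, each row would correspond to one image set and the columns to the three response options (HGD, T1a, T1b), with each row summing to the number of raters.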
Results
A total of 19 expert Barrett’s endoscopists from 8 countries (Australia, USA, Italy, Netherlands, Germany, Canada, Belgium, and Portugal) participated. The majority had been practicing for more than 20 years (n = 12, 63.2 %), and actively reviewed over 100 cases of Barrett’s esophagus each year (n = 10, 52.6 %). The median annual case volume was 50 (interquartile range [IQR] 28–90) for Barrett’s radiofrequency ablation, 50 (IQR 18–75) for Barrett’s EMR, and 25 (IQR 10–45) for Barrett’s ESD. The overall quality of images was graded by experts as excellent (n = 6), good (n = 10), and average (n = 3); none considered the images to be of below average or poor quality. All experts rated each of the 60 sets of images. The pooled results of the responses provided by the experts are shown in Table 1s and Table 2s.
EAC (T1a/b) could be distinguished from HGD with a pooled sensitivity of 89.1 % (95 %CI 84.7–93.4) ([Fig. 1]). Pooled specificity was 48.4 % (95 %CI 35.3–61.5), PLR was 1.73 (95 %CI 1.31–2.42), NLR was 0.23 (95 %CI 0.11–0.43), and accuracy was 75.5 % (95 %CI 68.2–82.8). Responses to individual cases are presented in [Fig. 2]. When predicting the T stage for T1b vs. T1a disease, the pooled sensitivity was 43.8 % (95 %CI 29.9–57.7). Pooled specificity was 59.4 % (95 %CI 46.7–72.1), PLR was 1.08 (95 %CI 0.56–2.07), NLR was 0.95 (95 %CI 0.59–1.50), and accuracy was 51.6 % (95 %CI 38.3–64.9). Responses to individual cases are presented in [Fig. 3]. Examples of cases where T1a and T1b lesions were identified correctly and incorrectly by most experts are presented in [Fig. 4] and [Fig. 5].
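The likelihood ratios above follow directly from the pooled sensitivity and specificity. As a minimal sketch using the standard definitions (PLR = sensitivity / (1 − specificity); NLR = (1 − sensitivity) / specificity), the point estimates can be reproduced as follows. The function names are hypothetical, and the accuracy calculation assumes that every expert rated every case, as reported above.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def pooled_accuracy(sensitivity: float, specificity: float,
                    n_positive: int, n_negative: int) -> float:
    """Overall accuracy as the case-weighted mean of sensitivity and specificity."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)


# EAC (T1a/b) vs. HGD: 40 cancer cases and 20 HGD cases.
plr_eac, nlr_eac = likelihood_ratios(0.891, 0.484)   # about 1.73 and 0.23
acc_eac = pooled_accuracy(0.891, 0.484, n_positive=40, n_negative=20)  # about 0.755

# T1b vs. T1a.
plr_t1b, nlr_t1b = likelihood_ratios(0.438, 0.594)   # about 1.08 and 0.95
```

A PLR and NLR both close to 1, as in the T1b vs. T1a comparison, indicate that an expert’s prediction barely shifts the probability of submucosal invasion in either direction.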
Fig. 1 Sensitivity and specificity of optical evaluation for Barrett’s neoplasia and esophageal adenocarcinoma (19 expert Barrett’s endoscopists).
Fig. 2 Predicted histology (T1 cancer versus high grade dysplasia) by 19 expert Barrett’s endoscopists, for each case of confirmed T1 cancer, yielding a sensitivity of 89.1 %.
Fig. 3 Predicted histology (T1b versus T1a cancer) by 19 expert Barrett’s endoscopists, for each case of confirmed T1a/T1b cancer, yielding a sensitivity of 43.8 %.
Fig. 4 Two examples of T1a disease: a identified correctly by 13/19 experts; b identified correctly by only 4/19 experts.
Fig. 5 Two examples of T1b disease: a identified correctly by 14/19 experts; b identified correctly by only 1/19 experts.
Of the 20 T1b cases, 12 were SM1, 3 were SM2, and 5 were SM3. When stratifying disease as potentially low risk (T1a/T1b-SM1) vs. high risk (T1b-SM2/3) (Table 3s), predicting the T stage for high risk disease carried a pooled sensitivity of 48.2 % (95 %CI 24.1–72.4). Pooled specificity was 59.3 % (95 %CI 49.4–69.3), PLR was 1.19 (95 %CI 0.48–2.35), NLR was 0.87 (95 %CI 0.40–1.54), and accuracy was 57.0 % (95 %CI 44.3–69.9).
Overall interobserver agreement by Fleiss’ kappa was 0.326 (95 %CI 0.311–0.342; P < 0.001). When comparing T1a/T1b disease with HGD, Fleiss’ kappa was 0.394 (95 %CI 0.373–0.416; P < 0.001). When comparing T1b with T1a disease, Fleiss’ kappa was 0.421 (95 %CI 0.399–0.442; P < 0.001). There was no association between the sensitivity of optical evaluation and Barrett’s case volume (P = 0.39) or individual confidence (high or low) in predicting histology (P = 0.68).
Discussion
In our study, expert Barrett’s endoscopists reliably detected and distinguished early EAC (T1a/T1b disease) from HGD, with a pooled sensitivity of 89.1 %. However, determining the presence of submucosal invasion (i.e. distinguishing T1b from T1a adenocarcinoma) was challenging. Although there was fair-to-moderate interobserver agreement, pooled sensitivity was only 43.8 %. Thus, there remains a significant risk of understaging T1b disease based on optical evaluation. Given that current guidelines consider piecemeal EMR as an acceptable treatment modality for T1a disease but suggest that low risk T1b disease may be endoscopically cured following an en bloc R0 excision, our study has implications for clinical decision making and the selection of endoscopic treatment methods.
For early EAC, it is well reported that ER offers a disease-specific survival rate comparable to surgery, but with fewer adverse events, shorter hospital stays, fewer readmissions, and a lower 90-day mortality [21] [22]. However, the choice between EMR and ESD is less clear. The 2022 European Society of Gastrointestinal Endoscopy guidelines recommend ESD for suspected T1b lesions and for “malignant lesions” > 20 mm, and EMR for lesions ≤ 20 mm with a low probability of submucosal invasion [23]. Similarly, the 2023 American Society for Gastrointestinal Endoscopy guidelines recommend that for suspected nonulcerated T1 cases, ESD should be performed if the lesion is “bulky” or > 20 mm, and either EMR or ESD for lesions ≤ 20 mm [24]. These definitions rely heavily upon the endoscopist’s interpretation of a lesion. Furthermore, factors such as “bulkiness” (indicating a nodular Paris 0-Is component) and ulceration (indicating a possible Paris 0-IIc component) allude to the presence of submucosal invasion. Thus, at its core, choosing the optimal resection modality hinges upon prior knowledge of histology (i.e. T1a vs. T1b disease). However, as we have shown in the present study, even among expert Barrett’s endoscopists, optical evaluation cannot facilitate such a distinction, with 56.2 % of T1b cases predicted as T1a.
This study also confirms that pre-resection staging of Barrett’s-related lesions is complex. A few studies have previously assessed lesion morphology in relation to the risk of T1b disease. In a retrospective study of 293 consecutive ERs at a Dutch tertiary center, lesions classified as Paris 0-Is and 0-IIc morphology were more commonly associated with T1b than T1a disease (26 % and 25 %, respectively) [25]. Similarly, in a study of 141 pathologically confirmed cases of early EAC from the Cancer Institute Hospital in Tokyo, a complex-type morphology (Paris 0-Is + 0-IIa/IIc/IIb) was associated with a higher incidence of T1b disease than a simple morphology (0-Is, 0-IIa, 0-IIb, or 0-IIc) (59.6 % vs. 22.5 %) [26]. As these studies demonstrate, lesion morphology remains imperfect and inadequate for directing the ER treatment algorithm [27]. NBI is one of the most extensively studied tools for characterizing early neoplasia in patients undergoing surveillance for Barrett’s esophagus [28] [29]. Recently, the BING Working Group, comprising experts from Europe, the USA, and Japan, validated a consensus-driven NBI classification system for identifying dysplasia and cancer in Barrett’s esophagus, with high accuracy and specificity [30]. However, no validated NBI classification, or equivalent virtual chromoendoscopy classification on other endoscopy platforms, exists for distinguishing between T1a and T1b disease. It is therefore unsurprising that despite providing experts with NBI images and information on lesion morphology, our study yielded a low sensitivity for identifying T1b disease.
Data from old surgical series report the risk of lymph node metastasis in T1b cases as between 27 % and 44 % [31] [32] [33] [34]. In contrast, recent endoscopy studies have yielded rates of 2 %–4 % in low risk T1b disease (well-to-moderately differentiated, without lymphovascular invasion, and < 500 µm invasion into the submucosa) [35] [36]. Such low rates may obviate the need for subsequent chemotherapy, radiotherapy, or surgery in the majority of cases. Furthermore, there is emerging evidence from an ongoing international trial that the risk of lymph node metastasis following an R0 endoscopic excision of high risk T1b lesions (poorly differentiated, and/or lymphovascular invasion, and/or > 500 µm invasion into the submucosa) may be as low as 5 % at a median of 19 months [36]. These incidences are comparable to the mortality risk after esophagectomy (2 %) in expert centers [37]. Therefore, a less invasive and organ-preserving approach may be appropriate not only for the frail and elderly, but for many patients with low or high risk T1b disease. The caveat to these outcomes is that an en bloc R0 excision is mandated, rendering it crucial to carefully select between piecemeal and en bloc resection techniques. When stratifying disease as potentially low risk (T1a/T1b-SM1) vs. high risk (T1b-SM2/3), sensitivity of optical evaluation in our study only increased from 43.8 % to 48.2 %, again indicating that T staging in early EAC is challenging. Despite fair-to-moderate interobserver agreement among experts in identifying T1b disease, sensitivity was poor. Therefore, to prevent the inadvertent piecemeal resection of T1b disease that may otherwise have been cured or surveilled following an R0 excision, we suggest that an en bloc resection strategy (ESD or en bloc EMR) be considered for all suspected T1a and T1b lesions.
We found no association between the sensitivity of optical evaluation and the individual confidence (high or low) of experts in predicting histology (P = 0.68), suggesting that confidence may more closely reflect subjective perception than objective diagnostic ability. Moreover, there was no association between individual sensitivity of T1b detection and the annual case volume (P = 0.39). These findings suggest that additional advanced technologies may be required to improve endoscopic assessment. Short-wavelength endoscopy is a technology that is able to visualize mucosal architecture to a depth of 200 µm and has shown promise in dysplasia characterization, with higher sensitivity compared with high definition white light (88.1 % vs. 73.4 %) [38]. However, its ability to distinguish T1a from T1b disease remains unclear. Recently, Ebigbo et al. reported on the innovative application of artificial intelligence (AI) for differentiating between T1a and T1b Barrett’s-related cancers. The authors developed, trained, and tested a convolutional neural network to estimate the risk of submucosal invasion using high definition white light imaging. The AI system achieved a sensitivity, specificity, and accuracy of 77 %, 64 %, and 71 %, respectively. Although sensitivity appeared to be higher than in our study, when AI was compared with the optical performance of five international experts on the same dataset, there was no significant difference [39]. Nonetheless, AI is promising and has the potential to become a key tool for guiding the ER algorithm. Until such a time, it may be prudent to consider en bloc resection as the default strategy for all suspected T1 lesions.
We recognize that our study has some limitations. Optical evaluation relied on still images, which inherently lack the depth and detail obtained during live procedures. Although the quality of these images varied, they were graded by experts who deemed the majority to be good or excellent. Furthermore, the study did not include images obtained through retroflexion, nor could it account for dynamic changes that might occur during live endoscopy, such as those induced by suction. This is the primary reason why experts were not asked to suggest a resection modality for each case. In addition, such responses would have been biased by prior responses pertaining to predicting histology.
Optical evaluation remains a critical step in determining the appropriate ER strategy for Barrett’s-related neoplasia. In our study, international experts were proficient in differentiating cancers from HGD, yet predicting submucosal invasion remained a challenge. Based on these results, and the potential to obtain a cure with an R0 excision of T1b disease, en bloc ER (ESD or en bloc EMR) should be considered for any suspected T1a or T1b lesion. Future studies will be crucial in assessing the potential role of advanced technologies, such as AI, in submucosal invasion prediction. Additionally, new data on the outcomes of T1a and T1b resections will further enrich the post-ER management algorithm in the future.