Introduction
In the United States, incidence of esophageal squamous cell carcinoma has decreased
over the last few decades, whereas that of esophageal adenocarcinoma (EAC) has increased
[1]. Endoscopic surveillance aims to alter the natural history of the disease by identifying
esophageal neoplasia in its early stages, thus allowing curative endoscopic therapy
to be instituted [2].
In recent years, advanced imaging enhancement techniques such as confocal laser endomicroscopy
(CLE) and chromoscopy with narrow-band imaging (NBI) have been developed to improve
the detection of dysplasia and adenocarcinoma in Barrett’s esophagus (BE) [3]. However, current guidelines still advocate use of the classic endoscopic surveillance
protocol [2] [3], in which multiple biopsies are necessary. In cases of long BE in particular, that protocol
requires greater technical expertise and time. Therefore, endoscopists rarely adhere
to that guideline in such cases, leading to a considerable number of cases going underdiagnosed
[3].
In studies involving small patient samples, it has been demonstrated that image enhancement
techniques increase the detection of dysplasia in BE, although use of such techniques
is restricted to tertiary care centers, thus limiting generalization of the results
to other centers [4].
One such technique is CLE, which allows adequate evaluation and visualization of short
BE segments and of specific suspicious areas. However, using CLE to evaluate a long BE segment
requires longer endoscopy times, which makes the examination exhausting.
Optical coherence tomography (OCT) and volumetric laser endomicroscopy (VLE) are new
technologies that use infrared light, allowing acquisition of high-resolution microscopic
images, in real time, without need for contrast. OCT is an optical imaging technique
consisting of an infrared light catheter, which obtains cross-sectional images of
tissues evaluated in high resolution, analogous to ultrasound but using infrared light
rather than acoustic energy. Transverse images obtained have a resolution of 10 µm,
which is 10 times better than that of high-frequency ultrasound [5]. The evaluation consists of introducing the catheter through the working channel
of a conventional endoscope and positioning the catheter over the specific area of
interest to be analyzed.
The images are acquired by linear longitudinal scanning of the length and depth, the
scan dimensions varying according to the catheter used. Sequential image frames are
continuously obtained and updated at a rate of four frames per second (fps), as well
as being numbered consecutively during acquisition for reference and subsequent data
analysis [5] [6]. After acquisition of the images, the catheter is removed and biopsy forceps can
immediately be inserted through the working channel in order to biopsy any suspicious
area of mucosa observed during the procedure. OCT devices have evolved considerably from their
first incarnation to the current catheters. Improvements in axial and transverse resolution
allow evaluation of microvascular characteristics through OCT angiography (OCTA), which
improves detection of low- and high-grade dysplasia (LGD and HGD, respectively). Speed of
image acquisition and linear-scanning dimensions have also improved through incorporation of
micromotor catheters, which allow axial scanning at a speed 100 times greater (400 fps) [7]. Axial and transverse resolution of the catheters used in the studies ranged from
10 µm × 25 µm [6] to 8 µm × 20 µm [7] and 5 µm × 5 µm [8] [9], the last with five times better resolution, available in ultra-high resolution
OCT (UHR-OCT) and three-dimensional OCT (3D OCT) [9]. The linear-scanning dimensions (length × depth) are 3 mm × 2.5 mm [8], 5.5 mm × 2.5 mm [6], 10 mm × 16 mm [7], and 8 mm × 20 mm [9]; like the sequential acquisition frame rate (4 fps [6], 60 fps [9], or 400 fps [7]), they depend on the OCT catheter used. External diameters of the catheters are
1.8 mm, 2.0 mm, or 2.5 mm.
VLE is a second-generation, advanced form of OCT that uses near-infrared light and provides
high-resolution cross-sectional images in real time. The technology comprises balloon-centered
imaging probes, a console, and a monitor [10]. The probe is introduced through the working channel of a therapeutic endoscope
and centralized by a balloon, available in diameters of 14 mm, 17 mm, and 20 mm, with
a length of 6 cm. Imaging is performed by automatic helical pullback of the probe
from the distal to the proximal end of the balloon over a 90-second period. VLE images
have an axial resolution of 7 µm, have a transverse resolution of 30 to 40 µm, and
can reach a depth of up to 3 mm, allowing detailed visualization of the esophageal
mucosa and submucosa. A total of 1200 cross-sectional images are acquired over a 6-cm
VLE scan [10]. It is an interesting option because it allows larger BE segments to be evaluated
in a shorter time [10]. Reconstruction of the images is done in the console allowing real-time diagnosis
of esophageal mucosa abnormalities, as well as guiding endoscopic treatment. VLE with
laser marking (VLEL) has become available, and it is possible to apply VLE-guided
superficial cauterization marks on the esophageal mucosa, without the need to change
devices. Those temporary marks allow the endoscopist to refer directly to the tissue
for subsequent direct histological sampling or to delineate a lesion for subsequent
resection [11].
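The scan figures quoted above (1200 cross-sectional images, a 6-cm balloon segment, and a 90-second pullback) imply a frame spacing and an acquisition rate. The following minimal sketch works through that arithmetic; it assumes that frames are evenly spaced along the pullback, which the text does not state, and the variable names are illustrative only.

```python
# Figures quoted in the text for a single VLE scan.
SCAN_LENGTH_MM = 60.0      # 6-cm balloon segment
FRAMES_PER_SCAN = 1200     # cross-sectional images per scan
PULLBACK_SECONDS = 90.0    # automatic helical pullback time

# Derived quantities (illustrative arithmetic, not device specifications).
# Assumption: frames are evenly spaced along the pullback.
frame_spacing_um = SCAN_LENGTH_MM * 1000.0 / FRAMES_PER_SCAN
frames_per_second = FRAMES_PER_SCAN / PULLBACK_SECONDS
pullback_speed_mm_s = SCAN_LENGTH_MM / PULLBACK_SECONDS

print(f"frame spacing: {frame_spacing_um:.0f} um")
print(f"frame rate: {frames_per_second:.1f} frames/s")
print(f"pullback speed: {pullback_speed_mm_s:.2f} mm/s")
```

Under that assumption, consecutive frames are about 50 µm apart, acquired at roughly 13 frames per second.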
In a systematic review evaluating accuracy of OCT in the identification of dysplasia
and early-stage cancer, Kohli et al. [12] reported that the technique had a sensitivity of 68 % to 83 % and a specificity of
75 % to 82 %. To our knowledge, there have been no previous systematic reviews evaluating
accuracy of VLE in BE. There have also been few studies assessing accuracy of OCT and VLE.
This is the first systematic review that evaluates accuracy of VLE in identification
of dysplasia and neoplasia in BE.
Methods
Study protocol and registration
The study protocol, including the search strategies, inclusion criteria, and methods
of statistical analysis, was previously established and registered in the PROSPERO
database (http://www.crd.york.ac.uk/prospero) under no. CRD42018089362.
Eligibility criteria
We selected prospective and retrospective observational studies that employed OCT
and VLE in surveillance of patients with BE and provided sufficient data to calculate
sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio
(LR−), diagnostic odds ratio (DOR), and area under the summary receiver operating
characteristic curve (SROC AUC), either by patient or by lesion, regardless of the
primary outcome defined by the authors of the studies. Studies evaluating squamous
cell carcinoma or other types of esophageal neoplasms were excluded.
Types of subjects
Only studies involving individuals with a histological diagnosis of BE and under endoscopic
surveillance were included. We imposed no restrictions regarding the characteristics
of the subjects (gender, age, risk factors, comorbidities, time since BE diagnosis,
or surveillance after endoscopic eradication for dysplasia or early-stage cancer).
Diagnostic methods
We included studies on use of OCT or VLE for identification of intestinal metaplasia
(IM), LGD, HGD, and intramucosal carcinoma (IMC) in BE. The gold standard for comparison
of diagnostic methods was histopathological analysis of specimens obtained by biopsy,
endoscopic mucosal resection, or endoscopic submucosal dissection of suspicious and
apparently normal areas.
Types of outcome measures
We selected studies in which the primary outcome measure was diagnostic accuracy of
OCT and VLE in identifying IM and LGD, when feasible in the studies, as well as
in identifying HGD and IMC in BE.
Search strategies
We performed a search of the literature, up through mid-January 2019, via the following
indices: Medline (PubMed); Excerpta Medica; Cochrane Central Register of Controlled Trials (CENTRAL); Literatura Latinoamericana y del Caribe en Ciencias de la Salud (LILACS, Latin-American and Caribbean Health Sciences Literature); and Scopus. We
also conducted hand searches of the bibliographies of the studies selected. The search
strategies varied by database:
Medline (PubMed) – (esophagus OR esophageal) AND (neoplasms OR cancer OR adenocarcinoma
OR dysplasia OR dysplastic OR Barrett OR metaplasia) AND (narrow band imaging OR optical
imaging OR NBI OR chromoendoscopy OR chromoscopy OR indigo carmine OR acetic acid
OR methylene blue OR virtual imaging OR FICE OR flexible spectral imaging color enhancement
OR i-scan OR BLI OR blue laser imaging OR high definition OR confocal laser endomicroscopy
OR AFI OR autofluorescence imaging OR volumetric laser endomicroscopy OR VLE OR endoscopic
optical coherence tomography OR OCT OR endoscopy OR endoscopic).
Excerpta Medica – Barrett esophagus AND volumetric laser endomicroscopy AND optical coherence tomography.
CENTRAL and LILACS – Barrett esophagus AND optical coherence tomograph.
Scopus – Barrett esophagus AND optical coherence tomography AND volumetric laser endomicroscopy.
Study selection
Two independent reviewers evaluated the titles and abstracts of the articles initially
identified. Disagreements were resolved by consensus, in consultation with the other
authors.
Because OCT and VLE are new technologies that have not been widely studied, we did
not exclude studies that interpreted image datasets or used offline evaluation. We
also included conference abstracts, as long as they allowed extraction of all data
and those data were clearly presented, given that our objective was to generate a
meta-analysis and not just a systematic review. When more than one article reported
the same study, we selected the article that provided the most information. The analysis
included four types of results: detection of IM; detection of dysplasia in general
(LGD, HGD, or EAC), when it was possible to perform those types of analyses; identification
of HGD; and identification of IMC.
Data collection
Data were collected in the form of absolute values that were provided directly or
were inferred in the text. These data were extracted into 2 × 2 tables including true-positive,
false-positive, true-negative, and false-negative results, to perform the different
types of analyses and subgroup analyses, either by patient or by lesion. When the
studies provided sufficient data to perform the various types of subgroup analyses,
the data were included according to the positivity criterion for the analyzed group,
independent of the primary outcome of the study. Otherwise, subgroup analysis was
performed if feasible. When data were inconclusive or missing, we contacted the authors.
If a study included multiple outcomes from multiple evaluators, the result from the
best evaluator was used in the calculations; when the best evaluator was not identified,
the mean of the results was used. The entire process was conducted by two of the authors,
working independently, and was reviewed by all of the authors. Disagreements were
resolved by consensus.
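The extraction step described above can be pictured as filling one 2 × 2 table per study or per evaluator. The sketch below is a minimal illustration, not the extraction form used in the review: the class name `TwoByTwo`, the helper `mean_accuracy`, and all counts are hypothetical. Averaging sensitivity and specificity across evaluators is one possible reading of "the mean of the results"; the text does not say whether counts or accuracy measures were averaged.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of index-test results against histology (the gold standard)."""
    tp: int  # test positive, histology positive
    fp: int  # test positive, histology negative
    fn: int  # test negative, histology positive
    tn: int  # test negative, histology negative

    def __post_init__(self):
        if min(self.tp, self.fp, self.fn, self.tn) < 0:
            raise ValueError("counts must be non-negative")

    @property
    def sensitivity(self) -> float:
        # Undefined (ZeroDivisionError) if no histology-positive cases exist.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Undefined (ZeroDivisionError) if no histology-negative cases exist.
        return self.tn / (self.tn + self.fp)


def mean_accuracy(tables):
    """Average sensitivity and specificity across several evaluators."""
    return (mean(t.sensitivity for t in tables),
            mean(t.specificity for t in tables))


# Hypothetical counts for two evaluators reading the same 50 lesions.
reader_a = TwoByTwo(tp=18, fp=5, fn=2, tn=25)
reader_b = TwoByTwo(tp=16, fp=3, fn=4, tn=27)
mean_sens, mean_spec = mean_accuracy([reader_a, reader_b])
print(f"mean sensitivity {mean_sens:.2f}, mean specificity {mean_spec:.2f}")
```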
Criteria for positivity
Criteria applied in order to categorize a result as positive were established by the
authors of each study, and we honored those categories for extraction of the data,
provided that they were suitable for analysis. When interpreting the images obtained
with VLE in the studies selected, we used diagnostic criteria for positivity that
were based on diagnostic algorithms, scores, or simply the mention of suspicious findings,
in order to determine the type of diagnosis in non-neoplastic lesions (indefinite
for dysplasia or LGD) or neoplasia (HGD or IMC). Unfortunately, those diagnostic criteria
were not standardized across the studies. A positive result was defined according
to the criterion proposed by the author(s). Although the scores are not standardized,
there is considerable overlap in the criteria identified, because they are based on
the description of the suspicious findings found in the evaluations employing OCT
and VLE.
Among the OCT scores is the OCT dysplasia index, and we found that, when a cut-off
score ≥ 2 was applied, the index had a sensitivity of 83.3 % and a specificity of
75.0 % for diagnosis of IMC and HGD. In the OCT dysplasia index, the main findings
for HGD/IMC positivity are surface maturation (surface OCT signal stronger than subsurface = 2)
and gland architecture (moderate/severe irregularity, highly asymmetric dilated glands,
or debris within the gland lumen = 2). For diagnosis of IM, the suspicious OCT findings
are as follows: absence of the layered structure of normal squamous epithelium and
of the vertical “pit and crypt” morphology of normal gastric mucosa; disorganized
architecture with irregular mucosal surface; and submucosal glands of low reflectance
below the epithelial surface or invaginations through the epithelium.
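As a rough illustration of how a cut-off of this kind turns image findings into a positive or negative call, the sketch below sums two component scores and applies the ≥ 2 threshold quoted above. It rests on assumptions that go beyond the text: the text gives only the top level (2) of each component, so the 0 to 2 scale and the summation are assumptions of this sketch, and the function names are hypothetical.

```python
OCT_INDEX_CUTOFF = 2  # cut-off quoted in the text (score >= 2 is positive)


def oct_dysplasia_index(surface_maturation: int, gland_architecture: int) -> int:
    """Sum two component scores, each assumed to lie on a 0-2 scale.

    The text defines only the top level of each component (2); the lower
    levels (0 and 1) and the summation are assumptions of this sketch.
    """
    for score in (surface_maturation, gland_architecture):
        if score not in (0, 1, 2):
            raise ValueError("component scores must be 0, 1, or 2")
    return surface_maturation + gland_architecture


def is_positive(index: int, cutoff: int = OCT_INDEX_CUTOFF) -> bool:
    """Classify a lesion as suspicious for HGD/IMC at the given cut-off."""
    return index >= cutoff


# Hypothetical lesion: strong surface signal (2), regular glands (0).
print(is_positive(oct_dysplasia_index(2, 0)))
```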
In the studies employing VLE, the following are the main findings for HGD/IMC positivity:
effacement of the mucosal layer, defined as layering in < 50 % of the scan; signal
intensity distribution (surface signal > subsurface signal); and gland architecture
(> 5 irregular glands). Details of the scores and scoring are shown in [Fig. 1].
Fig. 1 Optical coherence tomography (OCT) and volumetric laser endomicroscopy (VLE) scores.
Risk of bias in individual studies
Two independent reviewers assessed quality of the studies included in the meta-analysis
on the basis of predefined criteria and discussions involving the remaining authors.
To facilitate that process, we used the Quality Assessment of Diagnostic Accuracy
Studies, version 2 (QUADAS-2) [13], the criteria for which were used to analyze risk of bias and applicability in the
patient selection process; how the OCT and VLE were conducted and interpreted; the
way in which the lesions were classified in the histopathological evaluation; and
the clinical significance of the findings.
Cross-sectional studies with adequate homogeneity between the groups were evaluated
with the technologies under study. Risk of bias in patient selection was considered
unknown when the patient selection process was not clearly defined. Applicability
of patient selection was considered low when the included patients were undergoing
BE surveillance or follow-up after endoscopic treatment for dysplasia or IMC.
We evaluated whether lesion classifications were standardized and whether an appropriate
criterion for positivity was used; if not, risk of bias was considered high. For the
gold standard (biopsy), blinding the pathologist to the endoscopic findings effectively
reduced risk of bias and increased applicability. For studies in which LGD was considered
a true-positive result and there were sufficient data to distinguish LGD from HGD
and adenocarcinoma, LGD findings were reclassified as true-positive results and included
in the subgroup analysis. If the final outcome was not assessed in all patients included
in the studies, the risk of bias was considered high.
Statistical analysis
For the meta-analysis, we used STATA IC/64 software, version 13.1 (Stata Corp., College
Station, Texas, United States), with the MIDAS and METANDI modules, and the Statistical
Analysis System, version 9.3 (SAS Institute Inc., Cary, North Carolina, United States)
with the METADAS macro. For each study, we determined the sensitivity (true-positive
rate); specificity (true-negative rate); LR+ and LR− (the ratio of the proportion of
positive or negative test results, respectively, in diseased subjects to that in
nondiseased subjects); and DOR (the LR+ divided by the LR−), with a 95 % confidence
interval. Those values were subsequently combined. We used the I² statistic to assess the heterogeneity of the studies included. Meta-regression was
used if there was high heterogeneity (I² > 50 %). We also constructed the SROC curve and calculated the respective AUC, which
serves as a global measure of test performance [14].
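The per-study measures defined in this section can be illustrated with a minimal sketch. It is not the pooled modelling performed by the Stata and SAS modules named above. It computes LR+, LR−, and DOR from one 2 × 2 table, an approximate confidence interval for the DOR on the log scale (Woolf method, chosen here for simplicity), and I² from Cochran's Q. All function names and counts are hypothetical, and tables with a zero cell would need a continuity correction that is omitted here.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-, DOR) from the cells of a 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec
    return lr_pos, lr_neg, lr_pos / lr_neg


def dor_confidence_interval(tp, fp, fn, tn, z=1.96):
    """Approximate 95% CI for the DOR on the log scale (Woolf method)."""
    log_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return math.exp(log_dor - z * se), math.exp(log_dor + z * se)


def i_squared(effects, variances):
    """I^2 (%) from per-study effects and their variances, via Cochran's Q."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


# Hypothetical study: 20 diseased and 30 nondiseased lesions.
lr_pos, lr_neg, dor = likelihood_ratios(tp=18, fp=5, fn=2, tn=25)
low, high = dor_confidence_interval(tp=18, fp=5, fn=2, tn=25)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, DOR {dor:.1f} ({low:.1f} to {high:.1f})")
```

For the hypothetical table above, the sketch gives an LR+ of 5.4, an LR− of 0.12, and a DOR of 45.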
Results
Articles selected
In the initial search, 10,464 relevant articles were identified. After the titles,
abstracts, and texts had been evaluated, 10,444 articles were excluded ([Fig. 2]). Of the 20 remaining articles, six were excluded. Three articles were excluded
because they did not provide the gold standard result required for construction of
the 2 × 2 table [15] [16] [17]. Two articles were excluded because they evaluated buried BE after endoscopic treatment,
one using OCT [18] and one using VLE [19], neither providing sufficient data for calculation of diagnostic accuracy. Another
article was excluded because it evaluated feasibility of laser marking with VLE without
allowing extraction of data for the calculation of diagnostic accuracy [20]. Therefore, the final sample comprised 14 articles.
Fig. 2 Study selection process.
Study characteristics
The 14 studies evaluated provided all of the necessary data to assess diagnostic accuracy
of OCT and VLE, either by patient or by lesion, in identification of IM, dysplasia
in general, HGD, and IMC in patients with BE. The studies evaluated a collective total
of 721 patients (404 in the studies employing OCT and 317 in those employing VLE)
and 1,565 areas of interest (984 in the studies employing OCT and 581 in those employing
VLE), with losses of 4 % of the lesions, which occurred only in the VLE evaluations ([Table 1]). All of the studies selected were cross-sectional [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34], with OCT or VLE performed sequentially after standard endoscopy. Eight studies
included patients undergoing BE surveillance [22] [24] [25] [26] [29] [30] [33] [34]; one included patients undergoing post-ablation surveillance [23]; two included patients undergoing surveillance after endoscopic eradication [21] [31]; and three included patients undergoing surveillance after endoscopic mucosal resection
[27] [28] [32]. Eight of the studies employed an established diagnostic algorithm, defining true
positives and true negatives [21] [25] [26] [27] [28] [31] [32] [34], whereas three studies reported suspicious findings [22] [23] [24] and three did not establish a clear criterion [29] [30] [33].
Table 1 Characteristics of the studies selected.

| Study | Country | Patients evaluated (n) | Lesions evaluated (n) | Gold standard | Type of biopsy | Real-time evaluation | Type of evaluation | Study design | Study inclusion criteria | Diagnostic criteria score | Test method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Leggett [27] | United States | 20 | ¹ | Histology | Seattle + TB | Yes | In vivo | Cross-sectional | Post-ablation/EMR surveillance | VLE-DA | VLE |
| Benjamin [23] | United States | 9 | ¹ | Histology | RB + TB | Yes | In vivo | Cross-sectional | Post-ablation surveillance | Suspicious findings | VLE |
| Trindade [22] | United States | 6 | ¹ | Histology | RB + TB | Yes | In vivo | Cross-sectional | BE surveillance | Suspicious findings | VLE |
| Han [29] | United States | 66 | 286 | Histology | RB + TB | Yes | In vivo | Cross-sectional | BE surveillance | ² | VLE |
| Konda [30] | United States | 141 | 145 | Histology | RB + TB | Yes | In vivo | Cross-sectional | BE surveillance | ² | VLE |
| Leggett [31] | United States | 27 | 50 | Histology | EMR | No | Image datasets | Cross-sectional | BE surveillance, EMR therapy | VLE-DA | VLE |
| Swager [28] | United States | 29 | 40 | Histology | EMR + RB | No | Image datasets | Cross-sectional | BE surveillance, EMR therapy | VLE prediction score | VLE |
| Swager [32] | United States | 19 | 60 | Histology | Histology database | No | Image datasets | Cross-sectional | EMR therapy | Computer algorithm | VLE |
| Isenberg [33] | United States | 33 | 314 | Histology | Seattle + TB | Yes | In vivo | Cross-sectional | BE surveillance | ² | OCT |
| Evans [34] | United States | 55 | 177 | Histology | Seattle + TB | Yes | In vivo | Cross-sectional | BE surveillance | OCT-SI | OCT |
| Y Chen [24] | United States | 50 | 194 | Histology | Target biopsy | Yes | In vivo | Cross-sectional | BE surveillance | Suspicious findings | UHR OCT |
| Hsiang-Chieh Lee [21] | United States | 32 | 54 | Histology | Seattle + EMR | No | Image datasets | Cross-sectional | BE surveillance, EET surveillance | OCTA criteria | OCTA |
| Evans [25] | United States | 113 | 123 | Histology | Target biopsy | Yes | In vivo | Cross-sectional | BE surveillance | OCT image criteria | OCT |
| Poneros [26] | United States | 121 | 122 | Histology | Target biopsy | Yes | In vivo | Cross-sectional | BE surveillance | OCT image criteria | OCT |
| Total | | 721 | 1,565 | | | | | | | | |

BE, Barrett’s esophagus; OCT, optical coherence tomography; EMR, endoscopic mucosal
resection; EET, endoscopic eradication therapy; VLE, volumetric laser endomicroscopy; RB, random biopsy; TB, target biopsy.
¹ Evaluation by patients.
² Without diagnostic criteria.
Positive results were evaluated as follows: HGD and IMC were merged into a single
diagnosis (neoplasia); LGD was considered an adjunct to HGD and IMC in the subgroup
analysis of overall accuracy in the detection of dysplasia; and IM was evaluated in
a separate subgroup analysis.
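The totals reported for each modality can be reproduced from the rows of Table 1. The short sketch below transcribes the patient and lesion counts from that table and sums them; the variable and function names are illustrative, and grouping UHR OCT and OCTA under OCT follows the totals given in the text.

```python
from collections import defaultdict

# (study, modality, patients, lesions); lesions is None where the study
# was evaluated per patient. Values transcribed from Table 1.
# UHR OCT and OCTA are grouped under OCT, as in the totals in the text.
STUDIES = [
    ("Leggett [27]", "VLE", 20, None),
    ("Benjamin [23]", "VLE", 9, None),
    ("Trindade [22]", "VLE", 6, None),
    ("Han [29]", "VLE", 66, 286),
    ("Konda [30]", "VLE", 141, 145),
    ("Leggett [31]", "VLE", 27, 50),
    ("Swager [28]", "VLE", 29, 40),
    ("Swager [32]", "VLE", 19, 60),
    ("Isenberg [33]", "OCT", 33, 314),
    ("Evans [34]", "OCT", 55, 177),
    ("Y Chen [24]", "OCT", 50, 194),
    ("Hsiang-Chieh Lee [21]", "OCT", 32, 54),
    ("Evans [25]", "OCT", 113, 123),
    ("Poneros [26]", "OCT", 121, 122),
]


def tally(studies):
    """Sum patients and lesions per modality."""
    totals = defaultdict(lambda: {"patients": 0, "lesions": 0})
    for _name, modality, patients, lesions in studies:
        totals[modality]["patients"] += patients
        totals[modality]["lesions"] += lesions or 0
    return dict(totals)


totals = tally(STUDIES)
for modality, counts in totals.items():
    print(modality, counts)
```

The sums agree with the text: 317 patients and 581 lesions for VLE, and 404 patients and 984 lesions for OCT.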
Bias risk and applicability
Among the eight VLE studies, risk of bias in patient selection was low in five studies
(62.5 %) and high in three (37.5 %). Risk of bias in the index test (OCT- or VLE-guided
targeted biopsy) was high in four studies (50.0 %) and low in two (25.0 %). Risk of
bias in the gold-standard test (random biopsy) was low in all eight studies. Risk
of bias in the QUADAS-2 flow and timing domain was low in six studies (75.0 %) and
high in two (25.0 %). Applicability of the patient selection, index test, and gold-standard
test was low in all eight studies ([Fig. 3]).
Fig. 3 Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2): risk
of bias in the VLE studies.
Among the six OCT studies, risk of bias in patient selection was low in five studies
(83.3 %) and high in one (16.7 %). Risk of bias in the index test was low in five
studies (83.3 %) and unclear in one (16.7 %). Risk of bias in the gold-standard test
was low in all six studies. Risk of bias in the QUADAS-2 flow and timing domain was
low in five studies (83.3 %) and high in one (16.7 %). Applicability of the patient
selection, index test, and gold-standard test was low in all six studies ([Fig. 4]).
Fig. 4 Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2): risk
of bias in the OCT studies.
Results of individual studies and data synthesis
Analyses of the VLE findings, including subgroup analyses of diagnostic accuracy for
detection of dysplasia in general (LGD, HGD, or IMC) and for detection of HGD/IMC,
were performed by patient and by lesion. Analyses of the OCT findings, including subgroup
analyses of diagnostic accuracy for detection of IM, dysplasia in general, and HGD/IMC,
were performed only by lesion.
OCT findings
HGD/IMC
Per-lesion analysis of diagnostic accuracy for detection of HGD/IMC was based on four
articles [21] [24] [33] [34]. As depicted in [Fig. 5], that analysis yielded a pooled sensitivity of 89 %, pooled specificity of 91 %,
pooled LR+ of 9.6 (95 % CI: 1.1 – 86.4), pooled LR− of 0.12 (95 % CI: 0.02 – 0.57),
DOR of 81 (95 % CI: 4 – 1702), and SROC AUC of 0.95 (95 % CI: 0.82 – 0.99). In addition,
the overall I² value was 83 (95 % CI: 64 – 100), indicating high heterogeneity. Therefore, we fitted
meta-regression models to identify possible sources of heterogeneity among the
estimates. To that end, the following were considered as predictor variables: use
of diagnostic algorithms; use of conventional OCT; real-time evaluation; and offline
evaluation. As can be seen in [Fig. 6], specificity was significantly higher in articles that employed real-time evaluation
than in those that employed offline evaluation (P = 0.020).
Fig. 5 Forest plot of optical coherence tomography (OCT) sensitivity and specificity for
detection of high-grade dysplasia/intramucosal carcinoma, by lesion and summary receiver
operating characteristic (SROC) curve and area under the curve (AUC).
Fig. 6 Sensitivity and specificity of optical coherence tomography (OCT), estimated by meta-regression.
The per-lesion analysis for real-time (in vivo) evaluation was based on three articles
[24] [33] [34]. That analysis yielded a pooled sensitivity of 79 % (95 % CI: 56 – 92 %), pooled
specificity of 94 % (95 % CI: 36 – 99 %), pooled LR+ of 15.6 (95 % CI: 0.49 – 490),
pooled LR− of 0.21 (95 % CI: 0.08 – 0.57), and DOR of 73.20 (95 % CI: 1.09 – 489).
The drop in sensitivity reflects the exclusion of one article [21], which, in isolation, had a sensitivity of 100 % due to better image quality.
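As a rough consistency check, and assuming the usual relations between these measures, the real-time estimates above can be compared with one another. The first two lines use the pooled sensitivity and specificity; the third uses the reported pooled likelihood ratios.

```latex
\[
\begin{aligned}
\mathrm{LR}^{+} &= \frac{\text{sensitivity}}{1-\text{specificity}}
                 = \frac{0.79}{1-0.94} \approx 13.2,\\
\mathrm{LR}^{-} &= \frac{1-\text{sensitivity}}{\text{specificity}}
                 = \frac{1-0.79}{0.94} \approx 0.22,\\
\mathrm{DOR}    &= \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}}
                 = \frac{15.6}{0.21} \approx 74.
\end{aligned}
\]
```

These values are close to the reported pooled estimates (15.6, 0.21, and 73.20) but do not match them exactly. Exact agreement is not expected, presumably because each pooled quantity is estimated by the meta-analytic model and the inputs used here are rounded.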
Detection of dysplasia in general
Analysis of diagnostic accuracy for detection of dysplasia in general was based on
three articles [21] [24] [33]. Due to the small number of cases evaluated in those three articles, it was feasible
to calculate only the main measures of accuracy or diagnostic performance (i.e.,
it was not possible to calculate an AUC). The analysis yielded a pooled
sensitivity of 89 % (95 % CI: 69 – 96 %), pooled specificity of 95 % (95 % CI: 48 – 99 %),
pooled LR+ of 19.85 (95 % CI: 0.93 – 422), pooled LR− of 0.11 (95 % CI: 0.033 – 0.38), and
DOR of 175.74 (95 % CI: 3.425 – 9015.73).
IM
Diagnosis of IM was analyzed based on two articles [25] [26]. Again, due to the small number of cases, only the main measures of accuracy were
evaluated. For identification of IM, OCT had a pooled sensitivity of 92 % (95 % CI:
66 – 98 %), a pooled specificity of 81 % (95 % CI: 56 – 93 %), pooled LR+ of 5.06 (95 %
CI: 3.09 – 15.60), pooled LR− of 0.091 (95 % CI: 0.01 – 0.59), and DOR of 55.58
(95 % CI: 3.09 – 999.49).
VLE findings
Findings by lesion
Per-lesion analysis of diagnostic accuracy for detection of HGD/IMC was based on five
articles [28] [29] [30] [31] [32]. That analysis yielded a pooled sensitivity of 85 % (95 % CI: 75 – 91 %), pooled
specificity of 73 % (95 % CI: 52 – 87 %), pooled LR+ of 3.2 (95 % CI: 1.6 – 6.4), pooled
LR− of 0.21 (95 % CI: 0.11 – 0.39), and DOR of 15 (95 % CI: 4 – 53). As shown in [Fig. 7], the SROC AUC was 0.87 (95 % CI: 0.66 – 0.96) and the I² was 53 (95 % CI: 0 – 100), indicating moderate heterogeneity. Meta-regression was
performed to identify possible sources of heterogeneity. Sample size, positivity criterion
established, age, real-time evaluation, and proportion of males were considered as
predictor variables. As can be seen in [Fig. 8], specificity was significantly lower in articles that employed real-time evaluation
than in those that employed offline evaluation (P = 0.010).
Fig. 7 Volumetric laser endomicroscopy (VLE) sensitivity and specificity for detection of
high-grade dysplasia/intramucosal carcinoma, by lesion and summary receiver operating
characteristic (SROC) curve and area under the curve (AUC).
Fig. 8 Sensitivity and specificity of volumetric laser endomicroscopy (VLE), by lesion,
estimated by meta-regression.
Findings by patient
Per-patient analysis of diagnostic accuracy for real-time (in vivo) detection of HGD/EAC
was based on three articles [22] [23] [27]. Once again, due to the small number of cases, only the main measures of accuracy
were evaluated. That analysis yielded a pooled sensitivity of 100 %, pooled specificity
of 55 % (95 % CI: 29 – 79 %), pooled LR+ of 2.27 (95 % CI: 1.22 – 4.19), pooled LR−
of 0.00, and DOR of 1.028.
Findings by patient and by lesion
Per-lesion and per-patient analyses of diagnostic accuracy for detection of HGD/IMC
were based on eight articles [22] [23] [27] [28] [29] [30] [31] [32]. As shown in [Fig. 9], those analyses yielded a pooled sensitivity of 87 % (95 % CI: 77 – 93 %), pooled specificity
of 68 % (95 % CI: 51 – 82 %), pooled LR+ of 2.7 (95 % CI: 1.6 – 4.5), pooled LR− of
0.20 (95 % CI: 0.10 – 0.37), DOR of 14 (95 % CI: 5 – 38), and an SROC AUC of 0.87
(95 % CI: 0.67 – 0.96). The I² was 54 (95 % CI: 0 – 100), indicating moderate heterogeneity, and we fitted
meta-regression models accordingly. Sample size, positivity criterion established,
age, real-time evaluation, and proportion of males were considered as predictor variables.
As can be seen in [Fig. 10], specificity was again significantly lower in articles that employed real-time evaluation
than in those that employed offline evaluation (P = 0.010).
Fig. 9 Forest plot of volumetric laser endomicroscopy (VLE) sensitivity and specificity
for detection of high-grade dysplasia/intramucosal carcinoma, by lesion and by patient
and summary receiver operating characteristic (SROC) curve and area under the curve
(AUC).
Fig. 10 Sensitivity and specificity of volumetric laser endomicroscopy (VLE), by lesion and
by patient, estimated by meta-regression.
Detection of dysplasia in general
Analysis of diagnostic accuracy for detection of dysplasia in general was based on
four articles [23] [27] [29] [30]. That analysis yielded a pooled sensitivity of 93 % (95 % CI: 80 – 98 %), pooled
specificity of 54 % (95 % CI: 37 – 70 %), pooled LR+ of 2.0 (95 % CI: 1.4 – 2.8), pooled
LR− of 0.12 (95 % CI: 0.04 – 0.35), and DOR of 16 (95 % CI: 6 – 46). As shown in [Fig. 11], the SROC AUC was 0.85 (95 % CI: 0.81 – 0.88) and the overall I² was 42 (95 % CI: 0 – 100), indicating mild heterogeneity.
Fig. 11 Forest plot of volumetric laser endomicroscopy (VLE) sensitivity and specificity
for dysplasia and summary receiver operating characteristic (SROC) curve and area
under the curve (AUC).
Discussion
Because of the importance of identifying esophageal cancer in its early stages, when
a cure is still possible, various advanced diagnostic imaging methods are being studied.
Lack of a recommendation for routine use of such methods in surveillance of patients
with BE is due in part to the fact that their use in daily practice has yet to be
validated in large studies or does not meet the threshold established for surveillance
of patients with BE in the American Society for Gastrointestinal Endoscopy Preservation
and Incorporation of Valuable Endoscopic Innovations (PIVI) initiative [3]. Neither OCT nor VLE has been validated.
Use of OCT and VLE not only facilitates identification and differentiation of lesions
by distinguishing between benign and malignant characteristics by direct microscopic
investigation but also allows evaluations to be performed simultaneously with the
usual endoscopic examination, playing the role of an “optical biopsy,” identifying
suspicious areas that can be biopsied or resected under guidance, thus reducing sampling
errors and improving overall diagnostic sensitivity [35].
Perhaps the greatest challenge in BE surveillance is identifying dysplasia and neoplasia
in long BE segments. If such abnormalities are diagnosed during surveillance, ablative
endoscopic therapy can be performed. In a recent study of patients with BE, Alshelleh
et al. [36] demonstrated a statistically significant difference between VLE or VLEL and
the Seattle protocol in detection of HGD (14 % vs. 1 %;
P = 0.001) and IMC (11 % vs. 1 %; P = 0.003), supporting use of VLE at teaching facilities. In a study conducted by Leggett
et al. [31], the use of VLE and the VLE diagnostic algorithm showed a sensitivity of 86 % and
a specificity of 88 % for detection of HGD/IMC. In our meta-analysis, we showed that
VLE by lesion had a sensitivity of 85 % and a specificity of 73 % for the detection
of HGD/IMC, with an SROC AUC of 0.87. Those values are similar to those published previously,
corroborating findings of the studies cited above [31] [36] and showing that VLE can differentiate neoplastic lesions from non-neoplastic lesions
in patients with BE. However, in the per-patient analysis of detection of HGD/IMC,
we found that VLE showed a higher sensitivity (100 %) and a lower specificity (55 %).
That is because, although most of the VLE studies analyzed reported high sensitivity,
two presented low specificity [22] [23]. With the exception of the Leggett et al. study [27], in which the authors showed a specificity of 76.9 %, none of the articles evaluated
employed a standardized diagnostic algorithm. That demonstrates the importance of
standardizing a diagnostic algorithm for true-positive results. That also serves to
explain results obtained in the analysis of detection of dysplasia in general, in
which VLE was found to have a sensitivity of 93 % and a specificity of 54 %. That
demonstrates that creation of scores improves detection of neoplastic lesions. However,
LGD is considered a difficult diagnosis to make on the basis of imaging findings as
well as on the basis of pathology findings [28], because presence of inflammation can confound diagnosis of dysplasia [33].
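For readers less familiar with the accuracy measures quoted above, the following minimal sketch shows how sensitivity, specificity, and negative predictive value follow from a 2 × 2 table of index-test results against histology. The counts are hypothetical and chosen only for illustration; they are not taken from any of the included studies, and the class name `TwoByTwo` is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test (e.g., VLE) versus the reference standard (histology)."""
    tp: int  # test positive, histology positive
    fp: int  # test positive, histology negative
    fn: int  # test negative, histology positive
    tn: int  # test negative, histology negative

    def sensitivity(self) -> float:
        # Proportion of histology-positive cases that the test flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of histology-negative cases that the test clears.
        return self.tn / (self.tn + self.fp)

    def npv(self) -> float:
        # Proportion of test-negative cases that are truly negative.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=18, fp=9, fn=2, tn=21)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
print(f"NPV         = {table.npv():.2f}")
```

In such a table, a large number of false positives lowers specificity without affecting sensitivity, which is the pattern described above for the VLE studies that lacked a standardized diagnostic algorithm.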
In a previous systematic review, Kholi et al. [12] reported that, for identification of IM, OCT has a sensitivity of 81 % to 97 % and
a specificity of 57 % to 92 %, whereas it has a sensitivity of 68 % to 83 % and a
specificity of 75 % to 82 % for identification of dysplasia and early-stage cancer.
In addition to studies employing first-generation OCT, we included studies employing
the latest generation OCT, which makes it possible to evaluate microvasculature and
differentiate more easily between LGD and neoplasia, as well as to extract data for
subgroup analyses.
As recommended in the most recent guideline [2], endoscopic therapy should be considered the gold-standard treatment modality for
patients with LGD. Complete elimination of IM is also recommended after successful
endoscopic therapy in patients with HGD or IMC. Therefore, we performed a subgroup
analysis to calculate accuracy of OCT for identification of IM, although such an analysis
was not possible for VLE. When identified in the VLE or OCT studies, LGD was considered
a positive result and was grouped with HGD/IMC for the analysis of the detection of
dysplasia in general.
For identification of HGD/IMC, OCT had a pooled sensitivity of 89 % and specificity
of 91 %, values close to those reported in the review article authored by Kholi et
al. [12], as well as an SROC AUC of 0.95. These results indicate that OCT has a strong ability
to differentiate neoplasms from non-neoplasms in patients with BE. For identification
of IM, OCT had a sensitivity and specificity of 92 % and 81 %, respectively, again
corroborating the values reported previously.
Our study has certain limitations, some related to technology and some related to
methodology. The OCT studies were not standardized in terms of the technology employed,
given that it is constantly evolving. The most recent devices have higher resolution
and therefore tend to have a higher rate of lesion detection. Two other factors that
affected diagnostic performance were non-standardization or absence of a diagnostic
algorithm for positivity in the studies and the manner in which the evaluations were
made (in real time or offline). Although these factors were present in both the
VLE and OCT studies, in the OCT studies the only one that was significant in relation to specificity
was real-time evaluation. In fact, lower accuracy in evaluation of image datasets
(offline evaluation) is likely due to the appearance of the tissues, which is different
in evaluation of images in real time [28]. The type of comparison made in our review allows us to confirm that the best accuracy
is obtained with real-time evaluation.
Among the OCT studies, there was only one that did not establish a diagnostic algorithm
as a criterion, that one study having little effect on the heterogeneity. That was
not so for the four VLE studies that did not establish a diagnostic algorithm or only
reported suspicious findings. All of those studies involved real-time (in vivo) interpretation,
which was found to be a significant, indirect predictor of heterogeneity, with a lower
specificity for detection of HGD/IMC, as previously stated. Algorithms for automated
analysis of VLE data can make a valuable contribution to their interpretation [32], given the large amount of data to be analyzed in real time. Van der Sommen et al.
[37] investigated the potential of algorithm-based computer-aided detection (CADe) for
identification of neoplasms. The authors found that the ex vivo use of VLE and CADe
had an AUC of 0.90 – 0.93, versus 0.81 for specialist physicians, showing that computer-aided
methods can achieve considerably better performance than do human observers.
Another feature that improves VLE performance is use of VLEL, because it allows the
lesions to be delimited and direct histological samples to be obtained with adequate
safety margins, thus improving detection and delineation of neoplastic lesions in
BE [20]. Alshelleh et al. [36] demonstrated that, in patients undergoing surveillance,
VLEL showed a significantly higher rate of detection of dysplasia in general than did
the Seattle protocol (33.7 % vs. 19.6 %; OR = 2.1; P = 0.03).
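As a rough consistency check, the odds ratio can be recomputed from the two detection rates alone. The sketch below assumes that the reported OR is the unadjusted ratio of the odds of detection in the two groups; the function name `odds_ratio` is illustrative.

```python
def odds_ratio(p_higher: float, p_lower: float) -> float:
    """Unadjusted odds ratio from two proportions, each strictly between 0 and 1."""
    odds_higher = p_higher / (1.0 - p_higher)
    odds_lower = p_lower / (1.0 - p_lower)
    return odds_higher / odds_lower


# Detection rates of dysplasia in general as quoted in the text:
# 33.7 % (higher rate) versus 19.6 % (lower rate).
or_estimate = odds_ratio(0.337, 0.196)
print(f"OR = {or_estimate:.1f}")
```

Under that assumption, the two percentages reproduce an odds ratio of approximately 2.1, in agreement with the value quoted.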
Following the diagnostic thresholds recommended in the PIVI initiative [35] for detection of HGD/IMC in patients undergoing BE surveillance, it would be necessary
to achieve a sensitivity ≥ 90 %, a negative predictive value ≥ 98 %, and a specificity ≥ 80 %
to replace use of random biopsies with that of targeted biopsies. In the current review,
OCT had values close to or above those targets, with a sensitivity of 89 % and
specificity of 91 %. However, the analyses were performed per lesion, owing to the lack of
per-patient studies. When the analysis was limited to studies employing real-time
evaluation, sensitivity and specificity were 79 % and 94 %, respectively. Therefore,
OCT does not meet the thresholds needed to replace the current surveillance protocol.
In the per-patient analysis of identification of HGD/IMC, VLE had a pooled sensitivity
of 100 % and a negative predictive value of 100 %, although it had a specificity of
only 55 %. Therefore, it is not yet possible to replace random biopsies with
targeted biopsies, given the current state of the technology.
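The comparison against the PIVI thresholds can be made explicit with a short sketch. The threshold values and pooled estimates are those quoted in the text; the function name `unmet_thresholds` and the handling of unreported values are assumptions made for illustration. In particular, a negative predictive value that is not reported is conservatively treated as unmet.

```python
from typing import List, Optional

# Thresholds quoted in the text for replacing random with targeted biopsies.
PIVI_THRESHOLDS = {"sensitivity": 0.90, "npv": 0.98, "specificity": 0.80}


def unmet_thresholds(sensitivity: float,
                     specificity: float,
                     npv: Optional[float] = None) -> List[str]:
    """Return the names of the thresholds that are not met.

    A value of None (not reported) cannot be assessed and is listed as
    unmet, which is a conservative assumption.
    """
    observed = {"sensitivity": sensitivity, "npv": npv, "specificity": specificity}
    return [name for name, minimum in PIVI_THRESHOLDS.items()
            if observed[name] is None or observed[name] < minimum]


# Pooled estimates as reported in this discussion.
results = {
    "OCT, per lesion": dict(sensitivity=0.89, specificity=0.91),
    "OCT, real-time only": dict(sensitivity=0.79, specificity=0.94),
    "VLE, per patient": dict(sensitivity=1.00, specificity=0.55, npv=1.00),
}

for label, values in results.items():
    print(label, "->", unmet_thresholds(**values) or "all thresholds met")
```

With these inputs, OCT falls short on sensitivity (and has no reported negative predictive value), whereas VLE falls short only on specificity, which mirrors the reasoning given above.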
It is too early to assess the in vivo diagnostic performance of VLE, given that there
are limited data available. Multicenter studies with adequate standardization of
screening criteria and diagnostic algorithms, as well as incorporation of VLEL, are
needed to lay the groundwork for its broader use in clinical contexts.
Conclusions
OCT and VLE are both effective methods for differentiating and detecting IM, dysplasia,
and neoplasia in patients with BE. Concomitant use of these technologies and the current
surveillance protocol could improve the rate of detection of dysplasia and neoplasia.