Keywords
Pancreatobiliary (ERCP/PTCD) - Strictures - ERC topics
Introduction
In parallel with recent advances in high-performance computing technologies, machine
learning algorithms have opened opportunities for reliable predictions of clinical
outcomes of diseases by modeling linear and nonlinear contributions of and interactions
between multiple parameters. Convolutional neural network (CNN)-based image recognition
has emerged as a platform of artificial intelligence, which is increasingly utilized
as a means of extracting abstract data from a series of medical images in an unbiased
fashion [1] [2] [3] [4] [5]. In the field of diagnostic endoscopy for pancreatobiliary diseases, emerging evidence
suggests the potential for a computer-assisted diagnosis system based on the CNN platform
for endoscopic ultrasound (EUS) images to accurately identify incidental pancreatic
cancer [6] or intraductal papillary mucinous neoplasm-derived carcinoma [7]. However, the utility of machine learning algorithms including CNN-based technologies
has not been explored in clinical research on interventional endoscopy for pancreatobiliary
diseases.
Endoscopic transpapillary deployment of a self-expandable metal stent (SEMS) has been
the mainstay of nonsurgical palliative management of distal malignant biliary obstructions
(MBOs) [8] [9] [10], given its longer functional duration compared with plastic stents [11].
Pancreatitis is a common adverse event (AE) of this procedure, with potentially
fatal outcomes [12] [13]. Therefore, a substantial number of studies have investigated risk factors for this
AE, including stent mechanical properties, tumors other than pancreatic cancer, and contrast
injection into the pancreatic duct during endoscopic retrograde cholangiopancreatography
(ERCP) [14] [15] [16] [17]. Given evidence linking baseline morphological features of the pancreas to risk
of pancreatitis following endoscopic SEMS placement [18], there may be ample opportunities for imaging information to
facilitate risk prediction. However, the utility of deep learning algorithms (e.g.,
CNN) for pre-procedure cross-sectional images has not been investigated in this context.
To better predict risk of pancreatitis following endoscopic SEMS placement for nonresectable
distal MBO, we developed machine learning-based models using not only clinical parameters
but also the probabilities predicted by a CNN model applied to pre-procedure computed
tomography (CT) images. We assessed the predictive performance of the models and examined
whether CNN-based metrics improve model performance for predicting post-ERCP pancreatitis.
Patients and methods
Study design
The current study aimed to construct a CT-based CNN model to predict post-ERCP pancreatitis
following endoscopic SEMS placement for nonresectable distal MBO and to evaluate the
additive effect of the CNN-based probabilities on machine learning models based on
clinical parameters ([Fig. 1]a). We compared the models using performance metrics
including the area under the curve (AUC) in receiver operating characteristic (ROC)
analysis, positive predictive value (PPV), accuracy, and specificity. Given the pilot nature
of this feasibility study based on pancreatic morphology, we analyzed a single type of SEMS
placed for nonresectable distal MBO (WallFlex; Boston Scientific, Massachusetts, United
States) to exclude the possibility of differential pancreatitis risks by stent mechanical
properties [17] [19]. This SEMS was widely used worldwide during the study period, and
its properties and clinical outcomes are well documented [20] [21] [22].
Fig. 1 Schematic overview of the current study. a Clinical scenario of endoscopic placement
of a SEMS for nonresectable distal MBO. b Simplified structure of the CNN model. The CNN model
based on pre-procedure CT images was constructed to calculate the predicted probabilities for
post-ERCP pancreatitis. c Machine learning models for the prediction of post-ERCP pancreatitis.
BN, batch normalization; CNN, convolutional neural network; Conv, convolutional layer; CT,
computed tomography; ERCP, endoscopic retrograde cholangiopancreatography; FC, fully connected
layer; MBO, malignant biliary obstruction; PD, pancreatic duct; RBF, radial basis function;
SEMS, self-expandable metal stent.
Informed consent for research use of the data was obtained from all participants on
an opt-out basis given the retrospective nature of the study. This study was designed
and conducted in accordance with the Declaration of Helsinki (2013 revision). The study
was approved by the Ethics Committee at The University of Tokyo (Tokyo, Japan; approval
number, #2058).
Study population
We screened consecutive patients who underwent initial SEMS placement for nonresectable
distal MBO at The University of Tokyo Hospital (Tokyo, Japan) between March 2009 and March
2021 ([Fig. 2]). We included only cases with available stent-naïve CT images; patients
with a history of biliary SEMS placement and patients without available CT images were
excluded. Distal MBO was defined as a malignant biliary stricture located ≥ 2 cm from the
confluence of the bilateral hepatic ducts. All SEMS placement procedures were conducted on
an inpatient basis, and patients were followed every 2 to 4 weeks after discharge on an
outpatient basis and through telephone interviews by study physicians until death or
June 30, 2023, whichever came first.
Fig. 2 Flow diagram of selection of patients receiving endoscopic placement of a biliary
metal stent for nonresectable distal malignant biliary obstruction. CT, computed tomography;
MBO, malignant biliary obstruction.
Assessment of post-ERCP pancreatitis
The primary study endpoint was post-ERCP pancreatitis following biliary SEMS placement,
which was defined as follows: 1) new or worsened abdominal pain; 2) prolonged hospitalization
for at least 2 days; and 3) an elevated serum amylase level (more than three times
the upper limit of normal), measured > 24 hours after ERCP [23] [24]. We routinely measured
serum amylase and pancreatic amylase 3 hours after ERCP and on the following day. When
pancreatitis was suspected based on clinical symptoms (e.g., abdominal pain, nausea) or
blood tests, abdominal CT was performed. We included pancreatitis events occurring within
14 days of the index ERCP procedure. Pancreatitis severity was graded according to the
American Society for Gastrointestinal Endoscopy lexicon [25].
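As an illustration only, the endpoint definition above can be written as a small rule. This minimal sketch assumes that all three criteria must be met, together with the 14-day window (the text lists the criteria but does not state the combination rule explicitly); the record type `PostErcpCourse` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PostErcpCourse:
    """Hypothetical per-patient record; field names are illustrative only."""
    new_or_worsened_pain: bool
    prolonged_stay_days: int         # extra hospital days attributed to the event
    amylase_x_uln: float             # serum amylase as a multiple of the upper limit of normal
    amylase_hours_after_ercp: float  # timing of that amylase measurement
    onset_day_after_ercp: int        # day of onset counted from the index ERCP


def meets_endpoint(c: PostErcpCourse) -> bool:
    """Apply the endpoint definition, assuming all three criteria are required."""
    pain = c.new_or_worsened_pain
    stay = c.prolonged_stay_days >= 2
    amylase = c.amylase_x_uln > 3 and c.amylase_hours_after_ercp > 24
    within_window = c.onset_day_after_ercp <= 14  # only events within 14 days count
    return pain and stay and amylase and within_window


# Usage with made-up values:
case = PostErcpCourse(True, 3, 4.2, 26, 1)
print(meets_endpoint(case))  # True
```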
Endoscopic procedures
SEMSs were deployed via ERCP in a standard manner [26]. Cholangiography along with intraductal
ultrasonography was performed to assess the biliary stricture and confirm the absence of
accessory bile ducts. Sphincterotomy was performed at the discretion of the endoscopist [27].
SEMSs were placed with the distal end protruding 5 to 10 mm into the duodenum. Prophylactic
nonsteroidal anti-inflammatory drugs were not used routinely because they had not been
approved for this indication in Japan during the study period. No patient received a
prophylactic pancreatic duct stent. Procedure time was defined as the duration from
insertion to withdrawal of the endoscope.
CNN model based on pre-procedure CT images
The deep CNN model was constructed using a series of pre-procedure CT images of the
pancreas as input data (the model structure is summarized in [Fig. 1]b). The study
population (n = 70) was randomly split into five groups (n = 14 each), with the number
of pancreatitis cases balanced across groups. We trained the CNN model on a combined
cohort of four groups and computed predicted probabilities of post-ERCP pancreatitis
for each case in the remaining group. This procedure was repeated using each of the five
groups as the validation set. A study radiologist (K.Y.) blinded to other clinical data
conducted the image processing and model construction. The CNN models were constructed
using Python (version 3.8; Python Software Foundation, Wilmington, Delaware, United
States) and the Chainer library. Image processing (i.e., cropping and augmentation) was
performed using the Pillow library.
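The fold-wise training scheme described above resembles stratified 5-fold cross-validation with out-of-fold prediction. The sketch below assumes scikit-learn's `StratifiedKFold` for the balanced split (the text does not say how the split was implemented) and uses a logistic regression on random placeholder features as a stand-in for the Chainer CNN; only the splitting and out-of-fold bookkeeping are meant to carry over.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def out_of_fold_probabilities(X, y, make_model, n_splits=5, seed=0):
    """Fit on n_splits - 1 folds, predict the held-out fold, and repeat so that
    every case receives exactly one out-of-fold probability."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    prob = np.empty(len(y), dtype=float)
    sizes, events = [], []
    for train_idx, test_idx in skf.split(X, y):
        model = make_model()  # fresh, untrained model for every fold
        model.fit(X[train_idx], y[train_idx])
        prob[test_idx] = model.predict_proba(X[test_idx])[:, 1]
        sizes.append(len(test_idx))
        events.append(int(y[test_idx].sum()))
    return prob, sizes, events


# Placeholder data: 70 cases with 21 events, as in the study; features are random.
rng = np.random.default_rng(0)
y = np.array([1] * 21 + [0] * 49)
X = rng.normal(size=(70, 10))
prob, sizes, events = out_of_fold_probabilities(
    X, y, lambda: LogisticRegression(max_iter=1000))
print(sizes, events)
```

With 70 cases and 21 events, each held-out fold contains about 14 cases and four or five events, so every case receives a probability from a model that never saw it during training.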
Contrast-enhanced (late arterial-phase) coronal-plane CT images (384 × 384 pixels)
were extracted in DICOM (Digital Imaging and Communications in Medicine) format.
For each case, a consecutive series of 16 images (2-mm thick) covering the pancreas
was selected, and a 256 × 256-pixel region well delineating the pancreatic parenchyma
and the pancreatic duct was cropped from each image. The input data for the CNN model
were therefore 256 × 256 × 16 pixels. The CNN model consisted of three convolutional
layers and three batch normalization layers, followed by three fully connected layers.
The convolutional layers had 8, 32, and 128 kernels, respectively. Each convolutional
layer used 3 × 3-pixel kernels with zero-padding and a stride of two pixels (first two
layers) or four pixels (last layer). The fully connected layers had 2,048, 2,048, and
2 units. Other hyperparameters of the CNN model were as follows: optimizer, AdaGrad;
loss function, softmax cross-entropy; batch size, 20; and number of epochs, 20. The
image data were augmented during training: parallel-shifted, rotated, flipped,
contrast-changed, or brightness-changed images were generated from each original image
(≥ 1,728 augmented images per case and ≥ 120,960 images in total).
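To make the layer description concrete, the arithmetic below tracks the input crop and feature-map sizes through the three convolutional layers. It is a sketch under assumptions not given in the text: the 16 slices are treated as input channels of a 2D network, zero-padding is taken as one pixel per side, and the crop offsets (`top`, `left`) are arbitrary placeholders for the region the radiologist selected.

```python
import numpy as np


def crop_slices(volume, top, left, size=256):
    """Crop the same size x size window from every slice of a (slices, H, W) array.
    top/left are placeholders for the manually chosen pancreatic region."""
    return volume[:, top:top + size, left:left + size]


def conv_output_size(n, kernel=3, stride=1, pad=1):
    # Standard output-size formula for a 2D convolution (floor division).
    return (n + 2 * pad - kernel) // stride + 1


def feature_map_shapes(channels=16, height=256, width=256,
                       kernels=(8, 32, 128), strides=(2, 2, 4)):
    """(channels, height, width) after each convolution; batch normalization
    does not change the shape."""
    shapes = []
    for out_channels, stride in zip(kernels, strides):
        height = conv_output_size(height, stride=stride)
        width = conv_output_size(width, stride=stride)
        channels = out_channels
        shapes.append((channels, height, width))
    return shapes


ct = np.zeros((16, 384, 384), dtype=np.float32)  # 16 coronal slices, placeholder values
x = crop_slices(ct, top=64, left=64)
shapes = feature_map_shapes(*x.shape)
flat = int(np.prod(shapes[-1]))          # values entering the first fully connected layer
fc_units = [flat, 2048, 2048, 2]         # fully connected widths from the text
print(x.shape, shapes, fc_units)
```

Under these assumptions the strides of 2, 2, and 4 reduce the 256 × 256 crop to 16 × 16 maps with 128 channels, i.e., 32,768 values feeding fully connected layers of 2,048, 2,048, and 2 units. A different padding or a 3D-convolution design would change these numbers.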
Statistical analyses including other machine learning models
We utilized five machine learning algorithms that have been extensively validated and
widely used for prediction of clinical outcomes [28] [29]: logistic regression, support
vector machine (SVM) with a linear or radial basis function (RBF) kernel, random forest
classifier, and gradient boosting classifier ([Fig. 1]c). For logistic regression and
the SVM with a linear kernel, continuous variables were standardized. We assessed the
importance of variables in a random forest classifier including the following variables:
age (continuous), sex (male vs. female), etiology of MBO (pancreatic cancer vs. bile duct
cancer vs. lymph node metastasis), body mass index (BMI, continuous), SEMS type (fully
covered vs. partially covered vs. uncovered), SEMS length (continuous), endoscopic
sphincterotomy (not performed vs. performed during a previous session vs. performed during
the index session), a plastic stent in situ (absent vs. present), number of guidewire
insertions into the pancreatic duct (continuous), contrast injection into the pancreatic
duct (absent vs. present), and procedure time (continuous), as well as CNN-based
probabilities. In addition to the CNN-based probabilities, we selected the four variables
with the highest feature importance for the final models: BMI, procedure time, age, and
MBO etiology (Supplementary Fig. 1). We confirmed that inclusion of the whole set of
covariates did not improve model performance (data not shown). Hyperparameters were
determined by grid search with 5-fold cross-validation (Supplementary Table 1). Using
leave-one-out cross-validation, we computed the predicted probability of post-ERCP
pancreatitis for each case in all models. Using the optimal probability threshold
determined by the Youden index, we calculated performance metrics for the models. In
secondary analyses, we assessed model performance stratified by BMI (the variable with the
highest feature importance) or other relevant parameters. All machine learning analyses
were conducted using Python 3.11 with the scikit-learn and xgboost libraries.
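The evaluation pipeline (leave-one-out predicted probabilities, a Youden-index threshold, then PPV, accuracy, and specificity) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the data are synthetic, logistic regression with a fixed `C` stands in for the grid-searched models, and fitting the standardization inside each fold via a pipeline is our choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loo_probabilities(X, y, C=1.0):
    """Leave-one-out predicted probabilities: each case is scored by a model
    fitted on the other n - 1 cases; the scaler is refitted in every fold."""
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    return cross_val_predict(model, X, y, cv=LeaveOneOut(),
                             method="predict_proba")[:, 1]


def youden_threshold(y, prob):
    """Probability cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y, prob)
    return thresholds[np.argmax(tpr - fpr)]


def performance(y, prob, threshold):
    pred = (prob >= threshold).astype(int)  # same ">=" convention as roc_curve
    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    return {"AUC": roc_auc_score(y, prob),
            "PPV": tp / (tp + fp),
            "accuracy": float(np.mean(pred == y)),
            "specificity": tn / (tn + fp)}


# Synthetic placeholder: 70 cases, 21 events, 5 hypothetical predictors
# (standing in for the CNN-based probability plus four clinical variables).
rng = np.random.default_rng(0)
y = np.array([1] * 21 + [0] * 49)
X = rng.normal(size=(70, 5))
X[:, 0] += 1.5 * y  # make one predictor informative
prob = loo_probabilities(X, y)
threshold = youden_threshold(y, prob)
m = performance(y, prob, threshold)
print(round(float(threshold), 3), {k: round(v, 2) for k, v in m.items()})
```

Fitting the scaler inside the pipeline keeps the held-out case out of the standardization step, and the threshold is chosen on the pooled out-of-fold probabilities.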
To compare clinical characteristics between patients with and without post-ERCP pancreatitis,
we used Student’s t-test for continuous variables and the chi-square or Fisher’s exact test, as appropriate,
for categorical variables. Two-sided P < 0.05 was considered statistically significant. Given multiple comparisons, the
results were interpreted cautiously. These statistical analyses were conducted using
SAS software (version 9.4; SAS Institute, Cary, North Carolina, United States).
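Several P values in this report can be approximately recomputed from the published summary statistics, which makes the tests concrete. The sketch below uses SciPy (the study itself used SAS): a Student's t-test from the BMI means and SDs in Table 1, and a chi-square test on the BMI-stratified pancreatitis counts given in Results (14/35 vs. 7/35). Whether a continuity correction was applied is not stated; the uncorrected test (`correction=False`) is our assumption.

```python
from scipy import stats

# BMI from Table 1 (mean ± SD): pancreatitis present (n = 21) vs absent (n = 49).
bmi = stats.ttest_ind_from_stats(mean1=23.0, std1=3.9, nobs1=21,
                                 mean2=20.9, std2=2.9, nobs2=49,
                                 equal_var=True)  # Student's (pooled-variance) t-test
print(f"BMI: t = {bmi.statistic:.2f}, P = {bmi.pvalue:.3f}")

# Pancreatitis by BMI stratum from Results: 14/35 (higher BMI) vs 7/35 (lower BMI).
table = [[14, 21],  # higher BMI: pancreatitis, no pancreatitis
         [7, 28]]   # lower BMI: pancreatitis, no pancreatitis
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f} (df = {dof}), P = {p_chi2:.3f}")

# Fisher's exact test, the usual alternative when expected counts are small.
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Fisher's exact P = {p_fisher:.3f}")
```

With these inputs the t-test gives P ≈ 0.015 and the uncorrected chi-square test P ≈ 0.068, close to the reported values; applying Yates' correction would give a larger P.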
Results
Among 70 patients with available coronal-plane images of pre-procedure CT, 21 patients
(30%) developed post-ERCP pancreatitis with mild, moderate, and severe grades in 14,
six, and one patient(s), respectively ([Fig. 2]). Supplementary Table 2 summarizes treatment and outcomes of patients with moderate- or severe-grade pancreatitis.
Compared with patients with no pancreatitis, patients with pancreatitis were more
likely to have higher BMI and non-pancreatic cancer as an etiology of distal MBO ([Table 1]).
Table 1 Demographic and procedure characteristics of patients receiving endoscopic placement
of a biliary metal stent for nonresectable distal malignant biliary obstruction, overall or
by the presence of post-procedure pancreatitis.
| Characteristics* | All cases (n = 70) | Post-ERCP pancreatitis absent (n = 49) | Post-ERCP pancreatitis present (n = 21) | P |
| Demographics | | | | |
| Mean age ± SD, years | 68.1 ± 11.1 | 69.0 ± 10.7 | 66.0 ± 11.8 | 0.30 |
| Sex | | | | 0.53 |
| | 36 (51%) | 24 (49%) | 12 (57%) | |
| | 34 (49%) | 25 (51%) | 9 (43%) | |
| Cause of biliary obstruction | | | | 0.011 |
| | 65 (93%) | 48 (98%) | 17 (81%) | |
| | 3 (4.3%) | 0 | 3 (14%) | |
| | 2 (2.9%) | 1 (2.0%) | 1 (4.8%) | |
| Mean body mass index ± SD, kg/m2 | 21.5 ± 3.4 | 20.9 ± 2.9 | 23.0 ± 3.9 | 0.015 |
| Procedure characteristics | | | | |
| Type of SEMS | | | | 0.61 |
| | 47 (67%) | 31 (63%) | 16 (76%) | |
| | 21 (30%) | 16 (33%) | 5 (24%) | |
| | 2 (2.9%) | 2 (4.1%) | 0 | |
| Length of SEMS | | | | 0.15 |
| | 4 (5.7%) | 1 (2.0%) | 3 (14%) | |
| | 35 (50%) | 26 (53%) | 9 (43%) | |
| | 31 (44%) | 22 (45%) | 9 (43%) | |
| Diameter of SEMS | | | | |
| | 70 (100%) | 49 (100%) | 21 (100%) | NA |
| Endoscopic sphincterotomy | | | | 0.82 |
| | 10 (14%) | 7 (14%) | 3 (14%) | |
| | 23 (33%) | 15 (31%) | 8 (38%) | |
| | 37 (53%) | 27 (55%) | 10 (48%) | |
| Plastic stent in situ | | | | 0.79 |
| | 35 (50%) | 25 (51%) | 10 (48%) | |
| | 35 (50%) | 24 (49%) | 11 (52%) | |
| Guidewire insertion to the PD | | | | 0.21 |
| | 48 (69%) | 36 (73%) | 12 (57%) | |
| | 17 (24%) | 11 (22%) | 6 (29%) | |
| | 5 (7.1%) | 2 (4.1%) | 3 (14%) | |
| PD injection | | | | 0.16 |
| | 58 (83%) | 43 (88%) | 15 (71%) | |
| | 12 (17%) | 6 (12%) | 6 (29%) | |
| Median procedure time (IQR), minutes | 33 (20–53) | 35 (25–50) | 26 (18–57) | 0.61 |
| Prophylactic rectal diclofenac | | | | 0.30 |
| | 69 (99%) | 49 (100%) | 20 (95%) | |
| | 1 (1.4%) | 0 | 1 (4.8%) | |
*Percentage indicates the proportion of cases with a specific characteristic in all
cases or each stratum of post-ERCP pancreatitis. Total percentages may not equal 100%
due to rounding.
ERCP, endoscopic retrograde cholangiopancreatography; IQR, interquartile range; NA,
not available; PD, pancreatic duct; SD, standard deviation; SEMS, self-expandable
metal stent.
In an analysis of the CNN-based probabilities alone, PPV, accuracy, and specificity were
0.45, 0.66, and 0.63, respectively ([Table 2]). In ROC analysis, the CNN-based probabilities
yielded an AUC of 0.67. When integrated into some of the machine learning models, the
CNN-based probabilities were associated with higher model performance ([Table 2]). For
example, in the logistic regression model, which yielded the largest AUC based on clinical
parameters alone, adding the CNN-based probabilities was associated with increases in AUC
(0.72 to 0.74), PPV (0.78 to 0.85), and accuracy (0.77 to 0.83), whereas specificity was
unchanged (0.96). The ROC curves are illustrated in Supplementary Fig. 2. In contrast, the
addition of the CNN-based probabilities was not consistently associated with higher
performance of the other machine learning models for prediction of post-ERCP
pancreatitis ([Table 2]).
Table 2 Performance metrics of the models for prediction of pancreatitis following endoscopic
placement of a biliary metal stent, with or without convolutional neural network (CNN)-based
probabilities.
| Model | AUC* | PPV | Accuracy | Specificity |
| CNN-based probabilities | 0.67 | 0.45 | 0.66 | 0.63 |
| Model without CNN-based probabilities | | | | |
| | 0.72 | 0.78 | 0.77 | 0.96 |
| | 0.72 | 0.78 | 0.77 | 0.96 |
| | 0.59 | 1.00 | 0.79 | 1.00 |
| | 0.64 | 0.55 | 0.73 | 0.82 |
| | 0.68 | 0.53 | 0.71 | 0.84 |
| Model with CNN-based probabilities | | | | |
| | 0.74 | 0.85 | 0.83 | 0.96 |
| | 0.71 | 0.82 | 0.80 | 0.96 |
| | 0.59 | 1.00 | 0.79 | 1.00 |
| | 0.65 | 0.53 | 0.71 | 0.84 |
| | 0.67 | 0.56 | 0.73 | 0.84 |
*The AUC values were calculated using the receiver operating characteristic analysis.
AUC, area under the curve; CNN, convolutional neural network; PPV, positive predictive
value; RBF, radial basis function.
We conducted secondary analyses stratified by parameters that might affect risk of
post-ERCP pancreatitis. Compared with patients with lower BMI, patients with higher BMI were
more likely to experience post-ERCP pancreatitis (40% vs. 20%, respectively; P = 0.068). In
an analysis stratified by BMI ([Table 3]), the logistic regression model yielded the highest
AUC in ROC analysis in both strata (0.70 and 0.68 for low and high BMI, respectively). In
the logistic regression model, all performance metrics examined were higher in the low-BMI
group than in the high-BMI group. In analyses stratified by age (Supplementary Table 3) or
presence of a plastic stent in situ (Supplementary Table 4), the logistic regression model
yielded the highest AUC in generally high-risk patients (i.e., younger patients and patients
without a plastic stent) but not in low-risk patients. In addition, an analysis limited to
covered SEMSs yielded consistent results, with the logistic regression model yielding the
highest AUC (Supplementary Table 5).
Table 3 Performance metrics of the models with convolutional neural network-based probabilities
for prediction of pancreatitis following endoscopic placement of a biliary metal stent,
stratified by body mass index.
| Model | AUC* | PPV | Accuracy | Specificity |
| Body mass index, low (n = 35)† | | | | |
| | 0.70 | 1.00 | 0.86 | 1.00 |
| | 0.59 | 0.67 | 0.83 | 0.96 |
| | 0.13 | 1.00 | 0.86 | 1.00 |
| | 0.45 | 0 | 0.74 | 0.93 |
| | 0.51 | 0 | 0.77 | 0.96 |
| Body mass index, high (n = 35)† | | | | |
| | 0.68 | 0.58 | 0.66 | 0.76 |
| | 0.62 | 0.55 | 0.63 | 0.76 |
| | 0.31 | 0.80 | 0.69 | 0.95 |
| | 0.67 | 0.54 | 0.63 | 0.71 |
| | 0.62 | 0.47 | 0.57 | 0.62 |
*The AUC values were calculated using the receiver operating characteristic analysis.
†Body mass index was dichotomized at the median value.
AUC, area under the curve; PPV, positive predictive value; RBF, radial basis function.
Discussion
In a consecutive series of patients with nonresectable distal MBO managed via endoscopic
SEMS placement, CNN-based profiling of pre-procedure CT images provided modest predictive
ability for post-ERCP pancreatitis. Adding the CNN-based probabilities to some machine
learning models, most notably logistic regression, was associated with increases in
performance metrics (e.g., PPV and accuracy) without a decrease in specificity. The findings
appeared to be robust in subpopulations defined by BMI, the most influential factor. Our data
support the potential for unbiased information extraction via deep learning to facilitate
prediction of post-ERCP pancreatitis in this patient population.
Utilizing pre-procedure CT images of pancreatic morphology within the deep CNN framework,
we attempted to better predict risk of post-ERCP pancreatitis associated with endoscopic
SEMS placement for distal MBO. In a previous investigation using pre-procedure CT images [18],
the estimated volume of the pancreatic parenchyma was well correlated with risk of
post-ERCP pancreatitis in this setting. Using axial-plane CT images, the researchers
manually measured the thickness of the pancreatic parenchyma (excluding the main pancreatic
duct) at three prespecified pancreatic regions and found a positive association between
the sum of the thickness values and pancreatitis risk. Although simple, this approach was
prone to bias because the measurement was manual and thus operator-dependent. A CNN can
capture morphological information about the pancreatic ductal system, parenchymal texture,
and pancreatic nodularity in a comprehensive and less operator-dependent fashion.
Contrast-enhanced CT is routinely performed before biliary SEMS placement to evaluate the
location and resectability of MBO and to exclude an accessory hepatic duct; CT images are
therefore readily available in clinical practice. Accordingly, predictive models applying
deep learning to CT images may characterize risk of post-ERCP pancreatitis at no additional
material cost, in contrast to biomarker-based prediction approaches. Our data suggest that
CNN-based probabilities per se may provide only modest predictive ability for post-ERCP
pancreatitis. However, in some of the widely used machine learning models, CNN-based
probabilities contributed to reasonable model performance, working synergistically with
relevant clinical parameters. Taken together, there may be ample opportunities for
CNN-based pattern recognition to facilitate prediction of post-ERCP pancreatitis.
Post-ERCP pancreatitis has been associated with substantial morbidity and mortality
among patients receiving endoscopic biliary interventions [12] [13]; therefore, a large
number of studies have investigated clinical parameters as potential risk factors, as well
as preventive measures, for this procedure-related AE [30]. Risk stratification based on
factors related to patients, procedures, and operators would help personalize peri-procedure
care for patients receiving the treatment. Preventive measures (e.g., prophylactic pancreatic
stent, nonsteroidal anti-inflammatory drugs, and hydration [31]) can be undertaken for
patients who are predicted to be at high risk of developing post-ERCP pancreatitis. In
randomized controlled trials of patients with nonresectable distal MBO [32] [33], researchers
have demonstrated the feasibility of EUS-guided choledochoduodenostomy using a lumen-apposing
metal stent, with a higher technical success rate and comparable stent patency compared with
ERCP-based SEMS placement. EUS-guided drainage may minimize the mechanical burden on the
pancreatic ductal system and the resultant pancreatic inflammation. Given the risks of
specific AEs associated with EUS-guided drainage (e.g., bile leak), further research is
warranted to determine the most appropriate drainage strategy for patients with distal MBO
overall. Nonetheless, EUS-guided drainage may be a reasonable alternative to endoscopic
transpapillary drainage in patients predicted by the CNN-based pipeline to be at high risk
of post-ERCP pancreatitis, and drainage strategies for distal MBO tailored to CNN-based risk
prediction merit investigation. A much larger number of patients receive ERCP for various
indications, and preventing post-ERCP pancreatitis in this population may substantially
reduce the burden on patients and the health care system. Therefore, the utility of
CNN-based stratification of post-ERCP pancreatitis risk should be investigated in patients
receiving ERCP overall. In addition, scoring systems based on sets of clinical parameters
different from those in the current study have been reported for prediction of post-ERCP
pancreatitis [34] [35]. Future studies should therefore investigate whether integrating deep
learning-based image recognition with these scoring systems improves predictive ability.
We acknowledge the potential limitations of the current investigation. First, we analyzed
a single type of SEMS with a single diameter. This inclusion criterion eliminated the
influence of differential risks of post-ERCP pancreatitis associated with mechanical
properties of SEMSs [17] [19] while allowing us to focus on clinical parameters and
morphological signatures of the pancreas in risk stratification of post-ERCP pancreatitis.
Nonetheless, external
validation including various types of SEMSs is required to ensure the generalizability
of our conclusions. Second, despite our extensive efforts in data augmentation during
construction of the CNN model, the small sample size of the total study population
might limit the robustness of our findings. In the machine learning models (e.g.,
SVM), the small sample size of cases and events compared with the number of variables
might result in model overfitting, thereby reducing the predictive abilities in the
validation sets. Therefore, further investigation of the machine learning models using
multicenter data with an external test set is desirable.
Conclusions
In conclusion, the deep CNN model based on pre-procedure CT images appeared to synergize
with clinical parameters in risk stratification for pancreatitis following endoscopic
SEMS placement for nonresectable distal MBO. There may be opportunities for the deep
learning platform to contribute to refinement of prognostic models in therapeutic
endoscopy for pancreatobiliary diseases.