DOI: 10.1055/a-0855-3532
A deep neural network improves endoscopic detection of early gastric cancer without blind spots
Publication History
submitted 10 April 2018
accepted after revision 14 September 2018
Publication Date: 12 March 2019 (online)
- Abstract
- Introduction
- Methods
- Results
- Discussion
- Appendix e1 Supplementary methods
- References
Abstract
Background Gastric cancer is the third most lethal malignancy worldwide. Deep convolutional neural networks (DCNNs) have recently been developed to perform visual tasks. The aim of this study was to build a DCNN-based system to detect early gastric cancer (EGC) without blind spots during esophagogastroduodenoscopy (EGD).
Methods 3170 gastric cancer and 5981 benign images were collected to train the DCNN to detect EGC. A total of 24 549 images from different parts of the stomach were collected to train the DCNN to monitor blind spots. Class activation maps were developed to automatically highlight suspicious cancerous regions. A grid model for the stomach was used to indicate the existence of blind spots in unprocessed EGD videos.
Results The DCNN distinguished EGC from non-malignancy with an accuracy of 92.5 %, a sensitivity of 94.0 %, a specificity of 91.0 %, a positive predictive value of 91.3 %, and a negative predictive value of 93.8 %, outperforming all levels of endoscopists. In the task of classifying gastric locations into 10 or 26 parts, the DCNN achieved an accuracy of 90.0 % or 65.9 %, respectively, on a par with the performance of experts. In real-time unprocessed EGD videos, the DCNN automatically detected EGC and monitored blind spots.
Conclusions We developed a system based on a DCNN that detects EGC and recognizes gastric locations with accuracy better than that of endoscopists, and that proactively tracks suspicious cancerous lesions and monitors blind spots during EGD.
Introduction
Gastric cancer is the third most lethal and the fifth most common malignancy from a global perspective [1]. It is estimated that about 1 million new gastric cancer cases were diagnosed and about 700 000 people died of gastric cancer in 2012, which represents up to 10 % of the cancer-related deaths worldwide [1]. The 5-year survival rate for gastric cancer is 5 % – 25 % in its advanced stages, but reaches 90 % in the early stages [2] [3]. Early detection is therefore a key strategy to improve patient survival.
In recent decades, endoscopic technology has seen remarkable advances, and endoscopy has been widely used as a screening test for early gastric cancer (EGC) [4]. In one series, however, 7.2 % of patients with gastric cancer had been misdiagnosed at an endoscopy performed within the previous year, and 73 % of these cases arose from endoscopist errors [5].
The performance quality of EGD varies significantly because of cognitive and technical factors [6]. In the cognitive domain, EGC lesions are difficult to recognize because the mucosa often shows only subtle changes, which requires endoscopists to be well trained and armed with thorough knowledge [4] [7]. In addition, endoscopists can be affected by their subjective state during endoscopy, which substantially limits the detection of EGC [8]. In the technical domain, guidelines to map the entire stomach exist but are often not well followed, especially in developing countries [9] [10]. It is therefore important to develop a feasible and reliable method to alert endoscopists to possible EGC lesions and blind spots.
A potential solution to mitigate these skill variations is to apply artificial intelligence (AI) to EGD examinations. The past decades have seen an explosion of interest in the application of AI in medicine [11]. More recently, a method of AI known as the deep convolutional neural network (DCNN), which transforms the representation at one level into a more abstract level to make predictions, has opened the door to elaborate image analysis [12]. Recent studies have successfully used DCNNs in the field of endoscopy. Chen et al. achieved accurate classification of diminutive colorectal polyps based on colonoscopy images [13], and Byrne et al. achieved real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps using colonoscopy videos [14]. However, the application of DCNNs to real-time EGC detection combined with blind spot monitoring has not yet been studied.
In this work, we first developed a novel system using a DCNN to analyze EGD images for EGC and gastric locations. Furthermore, we used class activation maps to proactively track suspicious cancerous regions and built a grid model for the stomach to indicate the existence of blind spots in unprocessed EGD videos.
Methods
Datasets, data preparation, and sample distribution
The flowchart of the data preparation and training/testing procedure of the DCNN is shown in [Fig. 1]. Networks playing different roles were independently trained, and their functions, inclusion criteria, exclusion criteria, image views, data sources, and data preparation are described in [Table e1] ([Fig. e2]; [15] [16]), available online in Supplementary materials. The sample distribution is presented in [Fig. e3]. Images of the same lesion from multiple viewpoints, and similar lesions from the same person, were included. Particular care was taken to ensure that images from the same person were not split between the training, validation, and test sets.
Table e1 Functions, inclusion and exclusion criteria, image views, data sources, and data preparation of the three networks.

| | Function | Inclusion criteria | Exclusion criteria | Image views | Data sources | Data preparation |
| --- | --- | --- | --- | --- | --- | --- |
| Network 1 | Filter unqualified images | Patients undergoing EGD examination | Age < 18 years and residual stomach | NBI, BLI, and white light | Stored images from patients between 10 February 2016 and 10 March 2018 in Renmin Hospital of Wuhan University | Two doctoral students classified images as blurry or clear |
| Network 2 | Identify EGC | Patients with gastric cancer, superficial gastritis, and mild erosive gastritis | Poor gastric preparation, age < 18 years, and residual stomach | NBI, BLI, and white light | Stored images from patients between 10 February 2016 and 10 March 2018 in Renmin Hospital of Wuhan University, images in the published EGC atlas of the Olympus company, and open-access EGC repositories | Two doctoral students classified images as malignant or non-malignant based on pathology evidence; two endoscopists with more than 10 years of EGD experience reviewed these images |
| Network 3 | Classify gastric locations | Patients undergoing EGD examination | Poor gastric preparation, age < 18 years, and residual stomach | White light | Stored images from patients between 10 February 2018 and 10 March 2018 in Renmin Hospital of Wuhan University | Two endoscopists with more than 10 years of EGD experience classified images into their corresponding locations according to the European ESGE [15] and Japanese SSS [16] guidelines ([Fig. e2]) |

EGD, esophagogastroduodenoscopy; NBI, narrow-band imaging; BLI, blue-laser imaging; EGC, early gastric cancer; ESGE, European Society of Gastrointestinal Endoscopy; SSS, systematic screening protocol for the stomach.
The videos used came from stored data at Renmin Hospital of Wuhan University. The instruments used included gastroscopes with an optical magnification function (CVL-290SL, Olympus Optical Co. Ltd., Tokyo, Japan; VP-4450HD, Fujifilm Co., Kanagawa, Japan).
The number of enrolled images was based on data availability; as a result, malignant images were relatively rare compared with non-malignant images, and the number of images from different locations varied widely. The standards of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) were used to justify these numbers [17].
Training algorithm
VGG-16 [18] and ResNet-50 [19], two state-of-the-art DCNN architectures pretrained with 1.28 million images from 1000 object classes, were used to train our system. Using transfer learning [20], we replaced the final classification layer with a new fully connected layer, retrained it on our datasets, and fine-tuned the parameters of all layers. Images were resized to 224 × 224 pixels to match the input dimensions of the models. Google's TensorFlow [21] deep learning framework was used to train, validate, and test our system. Confusion matrices, learning curves, and the methods used to avoid overfitting are described in Appendix e1 ([Table e2]; [Figs. e4 – e6] [22] [23] [24] [25] [26]), available online in Supplementary materials.
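As a minimal sketch of this transfer-learning setup, assuming the Keras API bundled with TensorFlow, the head replacement and fine-tuning might look as follows; the pooling layer, head width, optimizer, and learning rate are illustrative assumptions rather than the exact configuration used in the study.

```python
# Minimal transfer-learning sketch (illustrative; hyperparameters are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG-16 pretrained on ImageNet (1.28 million images, 1000 classes),
# dropping its original 1000-class classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Replace the final classification layer with a new fully connected head;
# here, a binary malignant/non-malignant output for network 2.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(base.input, out)

# Fine-tune the parameters of all layers, as described above.
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```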
Fig. e6 The process of building a synthesized model. a Flowchart illustrating the synthesis of the VGG-16 and Resnet-50 models. In the task of identifying early gastric cancer, after standard k-fold cross-validation, candidate VGG-16 and Resnet-50 models were combined in different ways (5 VGG; 4 VGG + 1 Resnet; 3 VGG + 2 Resnet; 2 VGG + 3 Resnet; 1 VGG + 4 Resnet; and 5 Resnet) and tested on the independent test set. The best-performing combination was selected for the subsequent experiments. b The performance of the different combined models. The accuracy of the models synthesized from 5 VGG, 4 VGG + 1 Resnet, 3 VGG + 2 Resnet, 2 VGG + 3 Resnet, 1 VGG + 4 Resnet, and 5 Resnet was 90.5 %, 92.0 %, 92.5 %, 90.5 %, 90.0 %, and 89.0 %, respectively. The combination of 3 VGG-16 and 2 Resnet-50 achieved the highest accuracy.
Comparison between DCNN and endoscopists
To evaluate the DCNN's diagnostic ability for EGC, 200 images independent from the training/validation sets were selected as the test set. The performance of the DCNN was compared with that of six expert endoscopists, eight seniors, and seven novices. Lesions that are easily missed, including EGC types 0-I, 0-IIa, 0-IIb, 0-IIc, and 0-mixed, were selected ([Table 3]; [Fig. 7]). Two endoscopists with more than 10 years of EGD experience reviewed these images. In the test, endoscopists were asked whether a suspicious malignant lesion was shown in each image. The calculation formulas of the comparison metrics are described in Appendix e1, available online in Supplementary materials.
To evaluate the DCNN's ability to classify gastric locations, we compared the accuracy of the DCNN with that of 10 experts, 16 seniors, and 9 novices. The test dataset consisted of 170 images, independent from the training/validation sets, randomly selected from each gastric location. In the test, endoscopists were asked which gastric location each image showed. Tests were administered using Document Star, a Chinese online test service. A description of the endoscopists participating in the experiment is presented in Appendix e1, available online in Supplementary materials.
Class activation maps
Class activation maps (CAMs) indicating suspicious cancerous regions were established, as described previously [27]. In brief, before the final output layer of Resnet-50, a global average pooling was performed on the convolutional feature maps and these were used as the features for a fully-connected layer that produced the desired output. The color depth of CAMs is positively correlated with the confidence of the prediction.
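As a rough sketch, assuming a Keras Resnet-50 whose final dense layer directly follows the global average pooling, a CAM can be computed as the dense-layer-weighted sum of the last convolutional feature maps; the layer name and normalization below are illustrative assumptions.

```python
# Illustrative CAM computation following Zhou et al. [27]; layer name is an assumption.
import numpy as np
import tensorflow as tf

def class_activation_map(model, image, conv_layer_name="conv5_block3_out",
                         class_idx=0):
    """Weighted sum of the last convolutional feature maps, using the
    weights of the final dense layer that follows global average pooling."""
    conv_model = tf.keras.Model(model.input,
                                model.get_layer(conv_layer_name).output)
    fmaps = conv_model.predict(image[np.newaxis])[0]            # shape (h, w, k)
    weights = model.layers[-1].get_weights()[0][:, class_idx]   # shape (k,)
    cam = fmaps @ weights                                       # shape (h, w)
    cam = np.maximum(cam, 0)
    # Normalize so that color depth tracks prediction confidence.
    return cam / (cam.max() + 1e-8)
```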
A grid model for the stomach
In order to automatically remind endoscopists of blind spots during EGD, a grid model for the stomach was developed to present the covered parts. The onion-skin display method was used to build the grid model, as previously described [28]. The characteristics of each part of the stomach were extracted and assembled to generate a virtual stomach model. The model was set to be fully transparent before EGD. As soon as the scope was inserted into the stomach, the DCNN began to capture images and fill them into the corresponding parts of the model, coloring each part as it was observed.
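The coverage logic can be summarized with a toy data structure such as the following; the 26-site indexing and the confidence threshold are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the blind-spot monitor built on network 3's location predictions.
from dataclasses import dataclass, field

@dataclass
class StomachGrid:
    # All 26 SSS sites start transparent (unobserved).
    observed: dict = field(default_factory=lambda: {i: False for i in range(26)})

    def record_frame(self, site: int, confidence: float,
                     threshold: float = 0.8) -> None:
        """Color a part of the grid model once a frame is confidently
        assigned to it (the threshold is an assumed parameter)."""
        if confidence >= threshold:
            self.observed[site] = True

    def blind_spots(self) -> list:
        """Transparent parts, i.e. sites not yet observed during the EGD."""
        return [s for s, seen in self.observed.items() if not seen]
```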
Running the DCNN on videos
Frame-wise prediction was applied to unprocessed videos using client-server interaction ([Fig. e8a]). Images were captured at 2 frames per second (fps). Prediction noise was smoothed using a random forest classifier [29] and the rule of outputting a result only when three of five consecutive frames showed the same result ([Fig. e8b]). The time taken to output a prediction per frame in the videos in the clinical setting includes time consumed in the client (image capture, image resizing, and rendering images based on predicted results), in network communication, and in the server (reading and loading images, running the three networks, and saving images).
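The "three of five consecutive frames" rule can be expressed as a sliding-window majority vote over the per-frame predictions; a minimal sketch, with the window size and agreement count taken from the text:

```python
# Sketch of the temporal smoothing rule applied to per-frame predictions.
from collections import Counter, deque

class TemporalSmoother:
    def __init__(self, window: int = 5, min_agree: int = 3):
        self.frames = deque(maxlen=window)   # last five frame-level labels
        self.min_agree = min_agree

    def update(self, frame_label: str):
        """Return a label only when at least three of the five most recent
        frames agree; otherwise suppress the (possibly noisy) output."""
        self.frames.append(frame_label)
        label, count = Counter(self.frames).most_common(1)[0]
        return label if count >= self.min_agree else None

# At 2 fps, each captured frame's prediction is fed to the smoother.
smoother = TemporalSmoother()
for label in ["EGC", "benign", "EGC", "EGC", "benign"]:
    print(smoother.update(label))  # None, None, None, EGC, EGC
```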
The speed of the DCNN in the clinical setting was evaluated by 926 independent tests, calculating the total time used to output a prediction per frame in the endoscopy center of Renmin Hospital of Wuhan University.
Human subjects
Endoscopists who participated in our tests provided informed consent. This study was approved by the Ethics Committee of Renmin Hospital of Wuhan University and was registered as trial number ChiCTR1800014809 in the Primary Registries of the WHO Registry Network.
Statistical analysis
A two-tailed unpaired Student's t test with a significance level of 0.05 was used to compare differences in the accuracy, sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively) of the DCNN and endoscopists. Interobserver and intraobserver agreement of the endoscopists and intraobserver agreement of the DCNN were evaluated using Cohen’s kappa coefficient. All calculations were performed using SPSS 20 (IBM, Chicago, Illinois, USA).
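For reference, Cohen's kappa used here is the standard chance-corrected agreement measure:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where $p_o$ is the observed proportion of agreement between the two readings and $p_e$ is the agreement expected by chance.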
Results
The performance of DCNN on identification of EGC
Comparison between the performance of DCNN and endoscopists
[Table 4] shows the predictions of the DCNN and endoscopists in identifying EGC. Among 200 gastroscopy images, with or without malignant lesions, the DCNN diagnosed malignancy with an accuracy of 92.5 %, a sensitivity of 94.0 %, a specificity of 91.0 %, a PPV of 91.3 %, and an NPV of 93.8 %. The six experts, eight seniors, and seven novices attained a mean accuracy of 89.7 % (standard deviation [SD] 2.2 %), 86.7 % (SD 5.6 %), and 81.2 % (SD 5.7 %), respectively. The accuracy of the DCNN was significantly higher than that of all endoscopists. [Fig. 9] shows representative images predicted by the model in the test set. The CAMs highlighted the cancerous regions after the images were evaluated by the model.
Table 4 Performance (%) of the DCNN and endoscopists in identifying early gastric cancer. Values for endoscopists are mean (standard deviation).

| | DCNN | Experts (n = 6) | Seniors (n = 8) | Novices (n = 7) |
| --- | --- | --- | --- | --- |
| Accuracy | 92.50 | 89.73 (2.15)[1] | 86.68 (5.58)[2] | 81.16 (5.72)[1] |
| Sensitivity | 94.00 | 93.86 (7.65) | 90.00 (6.05) | 75.33 (6.31)[1] |
| Specificity | 91.00 | 87.33 (7.43) | 85.05 (16.18) | 88.83 (6.03) |
| PPV | 91.26 | 91.75 (4.15) | 90.91 (5.69) | 80.47 (8.75)[1] |
| NPV | 93.81 | 92.52 (5.76) | 88.01 (6.55)[2] | 82.32 (11.46) |

PPV, positive predictive value; NPV, negative predictive value.
1 P < 0.01
2 P < 0.05
Comparison between the stability of DCNN and endoscopists
To evaluate the stability of the DCNN and endoscopists in identifying EGC, we shuffled all of the test images and randomly selected six endoscopists (two experts, two seniors, and two novices) to repeat the test. As shown in [Table 5], the experts had substantial interobserver agreement (kappa 0.80), and the seniors and novices achieved moderate interobserver agreement (kappa 0.49 and 0.42, respectively). The intraobserver agreement of experts and nonexperts was moderate or better (kappa 0.84 in the expert group and 0.54 – 0.77 in the nonexpert group). The DCNN achieved perfect intraobserver agreement (kappa 1.0).
The performance of DCNN on classification of gastric locations
Comparison between the performance of DCNN and endoscopists
[Table 6] shows the predictions of the DCNN and endoscopists in classifying gastric locations. A group of 10 experts, 16 seniors, and 9 novices classified EGD images into 10 stomach parts with an accuracy of 90.2 % (SD 5.1 %), 86.8 % (5.2 %), and 83.3 % (10.3 %), respectively, and into 26 sites with an accuracy of 63.8 % (6.9 %), 59.3 % (6.4 %), and 46.5 % (7.2 %), respectively. The DCNN classified EGD images into 10 parts with an accuracy of 90.0 % and into 26 parts with an accuracy of 65.9 %, with no significant difference from endoscopists at any level. [Fig. 7] and [Fig. 10] show representative test set images predicted by the DCNN in the tasks of classifying gastric locations into 26 parts and 10 parts, respectively.
Table 6 Accuracy (%) of the DCNN and endoscopists in classifying gastric locations into 10 or 26 parts. Values for endoscopists are mean (standard deviation).

| | 10 parts | 26 parts |
| --- | --- | --- |
| DCNN | 90.00 | 65.88 |
| Experts (n = 10) | 90.22 (5.09) | 63.76 (6.87) |
| Seniors (n = 16) | 86.81 (5.19) | 59.26 (6.36) |
| Novices (n = 9) | 83.30 (10.27) | 46.47 (7.23)[*] |
* P < 0.01
Comparison between the stability of DCNN and endoscopists
In the task of classifying gastric locations into 10 parts, all endoscopists achieved substantial interobserver or intraobserver agreement (kappa 0.75 – 0.96). In the 26-part classification, all endoscopists achieved moderate interobserver or intraobserver agreement (kappa 0.50 – 0.68) ([Table 7]). The DCNN achieved perfect intraobserver agreement (kappa 1.0).
Testing of the DCNN in unprocessed gastroscope videos
To explore the ability of the DCNN to detect EGC and monitor blind spots in a real-time clinical setting, we tested the model on two unprocessed gastroscopy videos. In [Video 1], which contained no cancerous lesion, the DCNN accurately presented the covered parts synchronized with the process of EGD to verify that the entire stomach was mapped.
Video 1 Testing of the deep convolutional neural network (DCNN) in an unprocessed esophagogastroduodenoscopy (EGD) video from case 1. The DCNN accurately presented the covered parts synchronized with the process of EGD to verify that the entire stomach had been mapped. In the grid model for the stomach, any transparent area indicated that this part had not been observed during the EGD. No cancerous lesion was detected.
In [Video 2], which contained cancerous lesions, the DCNN flagged blind spots in synchrony with the EGD and automatically indicated the suspicious EGC regions with CAMs. All lesions were successfully detected; however, a false-positive error occurred when the mucosa was covered by unwashed foam.
Video 2 Testing of the deep convolutional neural network (DCNN) model in an unprocessed esophagogastroduodenoscopy (EGD) video from case 2. The DCNN indicated suspicious gastric cancer regions and presented the covered parts synchronized with the process of EGD. Suspicious cancerous regions were indicated by class activation maps (CAMs), and the color depth of the CAMs was positively correlated with the confidence of the DCNN prediction. In the grid model for the stomach, any transparent area indicated that this part had not been observed during the EGD.
To test the speed of the DCNN, 926 independent tests were conducted in a clinical setting. The total time to output a prediction using all three networks for each frame was 230 milliseconds (SD 60; range 180 – 350). In the test of identifying EGC, the six experts, eight seniors, and seven novices took a mean of 3.29 seconds per image (SD 0.42), 3.96 seconds (0.80), and 6.19 seconds (1.92), respectively. In the test of classifying gastric locations into 10 parts, the 10 experts, 16 seniors, and 9 novices required 4.51 seconds per image (SD 2.07), 4.52 seconds (0.65), and 4.76 seconds (0.67), respectively; for classification into 26 parts, they took 14.23 seconds per image (SD 2.41), 19.33 seconds (9.34), and 24.15 seconds (6.93), respectively. The prediction time of the DCNN in the clinical setting was considerably shorter than that of the endoscopists.
Discussion
Endoscopy plays a pivotal role in the diagnosis of gastric cancer, the third leading cause of cancer death worldwide [1] [4]. Unfortunately, endoscopic diagnosis of gastric cancer at an early stage is difficult, requiring endoscopists to have thorough knowledge and good technique [7]. Training a qualified endoscopist is time-consuming and costly. In many regions, especially Western Europe and China, endoscopists familiar with the diagnosis of EGC are in short supply relative to demand, which greatly limits the effectiveness of endoscopy in the diagnosis and prevention of gastric cancer [7] [8] [9] [10].
The DCNN is one of the most important deep learning methods for computer vision and image classification [12] [13] [14]. Chen et al. [13] achieved accurate classification of diminutive colorectal polyps based on images captured during colonoscopy using a DCNN, and Byrne et al. [14] achieved real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps on colonoscopy videos. The most recent study [30] used a DCNN to detect EGC with an overall sensitivity of 92.2 % and a PPV of 30.6 % on its dataset. Here, we developed a DCNN system that detects EGC, with a sensitivity of 94.0 % and a PPV of 91.3 %, and distinguishes gastric locations on a par with expert endoscopists. Furthermore, we compared the competence of the DCNN with that of endoscopists and applied the DCNN to unprocessed EGD videos to proactively track suspicious cancerous lesions without blind spots.
Observing the whole stomach is a basic prerequisite for the diagnosis of gastric cancer at an early stage [7] [16]. In order to avoid blind spots, standardized procedures have been established to map the entire stomach during gastroscopy. The European Society of Gastrointestinal Endoscopy (ESGE) published a protocol including 10 images of the stomach in 2016 [15]. Japanese researchers published a minimum required "systematic screening protocol for the stomach" (SSS) including 22 images of the stomach so as not to miss suspicious cancerous lesions [16]. However, these protocols are often not well followed, and endoscopists may ignore some parts of the stomach because of subjective factors or limited operative skills, which can lead to the misdiagnosis of EGC [7] [8] [10].
In the present study, using 24 549 images from EGDs, we developed a DCNN model that accurately and stably recognizes each part of the stomach on a par with the expert level, automatically captures images during endoscopy, and maps these onto a grid model of the stomach to prompt the operator about blind spots. This real-time assistance system will improve the quality of EGD and ensure that the whole stomach is observed during endoscopy, thereby providing an important prerequisite for the detection of EGC.
White-light imaging is the standard endoscopic examination for the identification of gastric cancer lesions, although accurate detection of EGC with it is difficult [31] [32]. It has been reported that the sensitivity of white-light imaging in the diagnosis of superficial EGC ranges from 33 % to 75 % [33]. Many new technologies have been developed to improve diagnostic abilities for EGC, including image-enhanced endoscopy and magnifying endoscopy [34]. Substantial time and money have been invested in training endoscopists to recognize the characteristics of early cancer lesions under different views [7]. However, current technologies remain weak for some inconspicuous lesions, such as type 0-IIb EGC [34], and manual diagnosis depends greatly on the experience and subjective state of the operator performing the endoscopy [8].
This subjective dependence on the operator decreases the accuracy and stability of EGC diagnosis [8]. To overcome this limitation, the DCNN, with its strong learning ability and good reproducibility, has gained attention as a clinical tool for endoscopists. In the present study, 3170 gastric cancer and 6541 normal control images from EGD examinations were collected to train a DCNN model with reliable and stable diagnostic ability for EGC. An independent test was conducted to evaluate the diagnostic ability of the DCNN and endoscopists. In our study, the DCNN achieved an accuracy of 92.5 %, a sensitivity of 94.0 %, a specificity of 91.0 %, a PPV of 91.3 %, and an NPV of 93.8 %, outperforming all levels of endoscopists.
In terms of stability, the nonexperts achieved moderate interobserver agreement (kappa 0.42 – 0.49), and the experts achieved substantial interobserver agreement (kappa 0.80). Because of the subjective interpretation of EGC characteristics, there is a human learning curve in the diagnosis of EGC, and an objective diagnosis is therefore desirable [5] [8]. In our study, we used up-to-date neural network models to develop a DCNN system. It achieved perfect intraobserver agreement (kappa 1.0), whereas endoscopists had variable intraobserver agreement. Our results indicate that this DCNN-based gastric cancer screening system has adequate and consistent diagnostic performance, removes some of the diagnostic subjectivity, and could be a powerful tool to assist endoscopists, especially nonexperts, in detecting EGC.
The diagnostic ability and stability of the DCNN seem to outperform those of experienced endoscopists. In addition, the diagnostic time of the DCNN was considerably shorter than that of the endoscopists. The shorter screening time and the absence of fatigue with the DCNN may make it possible to provide quick predictions of EGC following an endoscopic examination. Importantly, the diagnosis of EGC by the DCNN can be made completely automatically and online, which may contribute to the development of telemedicine, thereby alleviating the shortage in numbers and experience of doctors in remote regions.
Another strength of this study is that the DCNN is armed with CAMs and a grid model for the stomach to highlight suspicious cancerous regions and indicate the existence of potential blind spots. The CAMs are a weighted linear sum of the presence of visual patterns with different characteristics, by which the discriminative regions of the target classification are highlighted [27]. As soon as the scope is inserted into the stomach, the DCNN with CAMs can determine whether and where EGC is present in the background mucosa. In addition, the observed areas are instantaneously recorded and colored in the grid model for the stomach, indicating whether blind spots exist during the EGD. Through these two auxiliary tools, the DCNN can proactively track suspicious cancerous lesions without blind spots, reducing the pressure and workload on endoscopists during real-time EGD.
There are some limitations to the present study. First, the detection of EGC was based only on images in white-light, narrow-band imaging (NBI), and blue-laser imaging (BLI) views. With images from more modalities, such as chromoendoscopy using indigo carmine [7], i-scan optical enhancement [34], and even optical coherence tomography [35], it may be possible to design a more universal EGC detection system.
Second, in the control group of the gastric cancer dataset, only normal, superficial gastritis, and mild erosive gastritis mucosa were enrolled. Other benign conditions, such as atrophic gastritis, gastritis verrucosa, and typical benign ulcer, could be enrolled in the control group later. In this way, the practicability of the DCNN in the detection of EGC would be further improved.
Third, when the DCNN was applied to unprocessed EGD videos, false-positive errors occurred when the mucosa was not washed clean. We plan to train the DCNN to recognize poorly prepared mucosa, so as to avoid these mistakes and convert such false-positive errors into a suggestion to the operator to clean the mucosa.
Fourth, although the DCNN presented satisfactory results in detecting EGC and monitoring EGD quality in real-time unprocessed videos, its competence was quantitatively evaluated only on still images, not on videos. We will continue collecting data to assess its ability on unprocessed videos and to provide accurate effectiveness data for the DCNN in a real clinical setting in the near future.
In summary, a computer-aided system based on a DCNN provides automated, accurate, and consistent diagnostic performance in the detection of EGC and the recognition of gastric locations. The CAMs and the grid model for the stomach enable the DCNN to proactively track suspicious cancerous lesions without blind spots during EGD. The DCNN is a promising technique in computer-based recognition and is not inferior to experts. It may be a powerful tool to assist endoscopists in detecting EGC without blind spots. More research should be conducted, and clinical applications should be tested, to further verify and improve this system's effectiveness.
Appendix e1 Supplementary methods
The methods of avoiding overfitting
To minimize the overfitting risk, we used four methods: data augmentation [22], k-fold cross-validation [23], early stopping [24], and the synthesis of different models. Images were randomly transformed using methods including height/width shift, shear, zoom, horizontal flip, and fill mode. The parameters of the augmentation methods used in the three independent networks are shown in [Table e2]. Through data augmentation, images could be presented differently in every round of training. The k-fold cross-validation procedure was implemented with k = 5, dividing the dataset into five subsets and validating on each subset individually, with the remainder used for training ([Fig. e4]). Furthermore, early stopping was used to watch the validation curve during training and to stop updating the weights when the validation error had not decreased for four consecutive epochs (patience = 4); a sketch of this setup is shown below. Early stopping epochs are indicated in [Fig. e5].
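Assuming the Keras preprocessing and callback APIs, the augmentation and early-stopping configuration might be sketched as follows; the specific shift, shear, and zoom ranges are illustrative placeholders for the values in [Table e2].

```python
# Sketch of the augmentation and early-stopping setup (parameter values assumed).
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    width_shift_range=0.1,   # height/width shift
    height_shift_range=0.1,
    shear_range=0.1,         # shear
    zoom_range=0.1,          # zoom
    horizontal_flip=True,    # horizontal flip
    fill_mode="nearest")     # fill mode for pixels exposed by the transforms

# Stop updating the weights when the validation error has not decreased
# for four consecutive epochs (patience = 4).
early_stop = EarlyStopping(monitor="val_loss", patience=4)

# Illustrative training call for one cross-validation fold:
# model.fit(augmenter.flow(x_train, y_train, batch_size=32),
#           validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```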
It has been reported that synthesizing different DCNN models [25] and averaging the effect of the optimal K candidates [26] can reduce the variance of the resulting estimates. In the tasks of identifying blurry images and classifying gastric locations, the performance of VGG-16 was much better than that of Resnet-50, whereas in the task of identifying EGC the performance of the two was comparable. Therefore, we used VGG-16 alone for identifying blurry images and classifying gastric locations, and synthesized VGG-16 and Resnet-50 for EGC identification; the final prediction was based on the average confidence for each category estimated by each model. Candidate VGG-16 and Resnet-50 models were combined in different ways and tested on the independent test set, as illustrated in the Methods section of the main text ([Fig. e6a]). The combination of 3 VGG-16 and 2 Resnet-50 achieved the highest accuracy ([Fig. e6b]) and was therefore chosen for the subsequent experiments.
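The synthesis rule, averaging per-class confidences over the five candidate models (3 VGG-16 + 2 Resnet-50) and taking the most confident class, reduces to a few lines; a minimal sketch, assuming each model outputs per-class softmax confidences:

```python
# Sketch of the model-synthesis rule: average per-class confidences, then
# take the most confident class.
import numpy as np

def ensemble_predict(models, image_batch):
    """`models` would hold the 3 fine-tuned VGG-16 and 2 Resnet-50 candidates."""
    probs = np.mean([m.predict(image_batch) for m in models], axis=0)
    return np.argmax(probs, axis=-1), probs
```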
Confusion matrices and learning curves
The confusion matrices and learning curves of our method for the independent neural networks are presented to demonstrate the accuracy of the DCNN ([Fig. e5]). In each matrix, element (x, y) represents the number of images of true class (y) predicted as class (x). The learning curves present the loss and accuracy of the DCNN on the training datasets over successive stages of training. The loss value was calculated by binary cross-entropy in binary classification (networks 1 and 2) and by categorical cross-entropy in multiclass classification (network 3).
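For reference, these are the standard cross-entropy losses; for a batch of $N$ images with labels $y$ and predicted confidences $p$:

```latex
\mathcal{L}_{\text{binary}} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\bigr],
\qquad
\mathcal{L}_{\text{categorical}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c} y_{i,c}\log p_{i,c}
```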
The calculation formulas of comparison metrics in the comparison between DCNN and endoscopists
The comparison metrics were accuracy, sensitivity, specificity, PPV, and NPV, where accuracy = true predictions / total number of cases, sensitivity = true positives / positives, specificity = true negatives / negatives, PPV = true positives / (true positives + false positives), and NPV = true negatives / (true negatives + false negatives). Here, "true positives" is the number of malignant images correctly predicted as malignant, "false positives" is the number of non-malignant images mistakenly predicted as malignant, "positives" is the number of images showing malignancy, "true negatives" is the number of non-malignant images correctly predicted as non-malignant, "false negatives" is the number of malignant images mistakenly predicted as non-malignant, and "negatives" is the number of non-malignant images.
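These definitions translate directly into code. In the sketch below, the counts are inferred from the reported percentages under the assumption of an even 100/100 malignant/non-malignant split of the 200-image test set; the split itself is an assumption, albeit one consistent with the published figures.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the comparison metrics exactly as defined above."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positives / positives
        "specificity": tn / (tn + fp),   # true negatives / negatives
        "ppv":         tp / (tp + fp),
        "npv":         tn / (tn + fn),
    }

# Assumed counts (tp=94, fp=9, tn=91, fn=6) reproduce the DCNN's reported
# 92.5 % accuracy, 94.0 % sensitivity, 91.0 % specificity, 91.26 % PPV,
# and 93.81 % NPV on the 200-image test set.
print(diagnostic_metrics(tp=94, fp=9, tn=91, fn=6))
```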
Description of endoscopists participating in the experiment
Endoscopists were blinded to the histologic data and the study design. Experts were staff members of the Gastroenterology Department of Renmin Hospital, Wuhan University, with more than 5 years of EGD experience and an average annual EGD volume higher than 1000. Seniors were staff members of the same department with between 1 and 3 years of EGD experience and an average annual EGD volume higher than 800. Novices were fellows in the same department with less than 1 year of EGD experience and an average annual EGD volume of 50 – 200.
Competing interests
None
Acknowledgments
This work was partly supported by grants from the Research Funds for Key Laboratory of Hubei Province (no. 2016CFA066), the National Natural Science Foundation of China (grant no. 81672387 [to Yu Honggang]), and the China Youth Development Foundation (grant no. 81401959 [to Zhou Wei] and grant no. 81703030 [to Ding Qianshan]).
References
- 1 Torre LA, Bray F, Siegel RL. et al. Global cancer statistics, 2012. CA Cancer J Clin 2015; 65: 87-108
- 2 Soetikno R, Kaltenbach T, Yeh R. et al. Endoscopic mucosal resection for early cancers of the upper gastrointestinal tract. J Clin Oncol 2005; 23: 4490-4498
- 3 Laks S, Meyers MO, Kim HJ. Surveillance for gastric cancer. Surg Clin 2017; 97: 317-331
- 4 Pasechnikov V, Chukov S, Fedorov E. et al. Gastric cancer: prevention, screening and early diagnosis. World J Gastroenterol 2014; 20: 13842-13862
- 5 Yalamarthi S, Witherspoon P, McCole D. et al. Missed diagnoses in patients with upper gastrointestinal cancers. Endoscopy 2004; 36: 874-879
- 6 Rutter MD, Senore C, Bisschops R. et al. The European Society of Gastrointestinal Endoscopy quality improvement initiative: developing performance measures. United European Gastroenterol J 2016; 4: 30-41
- 7 Yao K, Uedo N, Muto M. et al. Development of an e-learning system for teaching endoscopists how to diagnose early gastric cancer: basic principles for improving early detection. Gastric Cancer 2017; 20: S28-S38
- 8 Scaffidi MA, Grover SC, Carnahan H. et al. Impact of experience on self-assessment accuracy of clinical colonoscopy competence. Gastrointest Endosc 2018; 87: 827-836.e2
- 9 Kim GH, Bang SJ, Ende AR. et al. Is screening and surveillance for early detection of gastric cancer needed in Korean Americans?. Korean J Int Med 2015; 30: 747
- 10 O'Mahony S, Naylor G, Axon A. Quality assurance in gastrointestinal endoscopy. Endoscopy 2000; 32: 483-488
- 11 Torkamani A, Andersen KG, Steinhubl SR. et al. High-definition medicine. Cell 2017; 170: 828-843
- 12 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521: 436-444
- 13 Chen PJ, Lin MC, Lai MJ. et al. Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 2018; 154: 568-575
- 14 Byrne MF, Chapados N, Soudan F. et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut 2018;
- 15 Bisschops R, Areia M, Coron E. et al. Performance measures for upper gastrointestinal endoscopy: a European Society of Gastrointestinal Endoscopy (ESGE) quality improvement initiative. Endoscopy 2016; 48: 843-864
- 16 Yao K. The endoscopic diagnosis of early gastric cancer. Ann Gastroenterol 2013; 26: 11-22
- 17 Russakovsky O, Deng J, Su H. et al. Imagenet large scale visual recognition challenge. Int J Comput Vision 2015; 115: 211-252
- 18 Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR 2014; arXiv:1409.1556 https://arxiv.org/abs/1409.1556
- 19 He K, Zhang X, Ren S. et al. Deep residual learning for image recognition. Proc IEEE Conf Comput Vision Pattern Recogn 2016; 770-778
- 20 Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010; 22: 1345-1359
- 21 Abadi M, Agarwal A, Barham P. et al. Tensorflow: A system for large-scale machine learning. 12th Symposium on Operating Systems Design and Implementation. 2016 265 – 283 https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
- 22 Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. J Am Stat Assoc 1987; 82: 528-540
- 23 Wen Z, Li B, Ramamohanarao K. et al. Improving efficiency of SVM k-fold cross-validation by alpha seeding. AAAI 2017; 2768-2774
- 24 Prechelt L. Automatic early stopping using cross validation: quantifying the criteria. Neural Netw 1998; 11: 761-767
- 25 Li S, Liu G, Tang X. et al. An ensemble deep convolutional neural network model with improved DS evidence fusion for bearing fault diagnosis. Sensors 2017; 17: 1729
- 26 Jung Y, Hu J. A K-fold averaging cross-validation procedure. J Nonparametr Stat 2015; 27: 167-179
- 27 Zhou B, Khosla A, Lapedriza A. et al. Learning deep features for discriminative localization. Proc IEEE Conf Comput Vision Pattern Recogn 2016; 2921-2929
- 28 O'Hailey T. Hybrid animation: integrating 2D and 3D assets. Abingdon, Oxon: Taylor & Francis; 2010
- 29 Liaw A, Wiener M. Classification and regression by randomForest. R news 2002; 2: 18-22
- 30 Hirasawa T, Aoyama K, Tanimoto T. et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 2018; 21: 653-660
- 31 Teh JL, Hartman M, Lau L. et al. Mo1579 duration of endoscopic examination significantly impacts detection rates of neoplastic lesions during diagnostic upper endoscopy. Gastrointest Endosc 2011; 73: AB393
- 32 Zhang Q, Wang F, Chen ZY. et al. Comparison of the diagnostic efficacy of white light endoscopy and magnifying endoscopy with narrow band imaging for early gastric cancer: a meta-analysis. Gastric Cancer 2016; 19: 543-552
- 33 Ezoe Y, Muto M, Uedo N. et al. Magnifying narrowband imaging is more accurate than conventional white-light imaging in diagnosis of gastric mucosal cancer. Gastroenterology 2011; 141: 2017-2025
- 34 Song M, Ang TL. Early detection of early gastric cancer using image-enhanced endoscopy: Current trends. Gastrointest Intervent 2014; 3: 1-7
- 35 Tsai TH, Leggett CL, Trindade AJ. et al. Optical coherence tomography in gastroenterology: a review and future outlook. J Biomed Opt 2017; 22: 1-17