Introduction
Technological progress and the development of complex mathematical models that allow
the analysis of large and partially unstructured datasets have led to the rapid advance
of artificial intelligence (AI) since the 2010s [1]. New AI applications, such as the chatbot ChatGPT, regularly attract a
great deal of media attention with headlines ranging from euphoric to critical. As
a result, the population developed specific expectations of the benefits, but also
concerns about the potential risks of AI. In a survey published in 2023 by the digital
association Bitkom, 73 % of the 1007 people surveyed saw AI as an opportunity [2]. Two thirds wanted AI to be used when it would bring specific benefits, for example
in medicine or transportation. 14 % and 10 % of respondents saw AI rather or exclusively
as a risk, respectively. The majority of respondents assumed that AI would noticeably
change our society in the coming years. These survey results strikingly illustrate how
much is already expected of AI. And indeed, AI accompanies us, consciously or unconsciously,
in many everyday situations. There are also several examples in clinical medicine,
e.g., the automated evaluation of ECGs, differential blood counts, etc. [1]. The possible use of AI in medical
imaging has been the subject of intensive research since its beginnings over 80 years ago. In addition to approaches to image
optimization, this research primarily includes automated diagnosis for disease detection and
classification as well as therapeutic monitoring. Enormous progress has been made
in the field of imaging in recent years through the use of deep learning (DL) technologies.
In contrast to classic forms of machine learning (ML), DL is based on neural networks
in which several network layers are linked together [3]. Convolutional neural networks (CNNs) are frequently used in the field of image
recognition. They are characterized by hierarchical recognition of image patterns
across the network layers [3]: while the first layers detect basic structures such as corners, edges, or simple shapes,
the combination of these simple structures in deeper layers enables the classification
of complex structures such as malignancies in clinical imaging. In addition to better
predictive performance, CNNs are more flexible than traditional ML approaches. Furthermore,
the time-consuming manual extraction of diagnostically relevant image information
("feature extraction") required with classic ML is no longer necessary, as image
features are learned automatically by CNNs.
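To make this hierarchical principle concrete, the following is a minimal sketch of a CNN for the binary classification of ultrasound image patches (e.g., lesion vs. no lesion). It is written in PyTorch; all layer sizes, names, and the input resolution are illustrative assumptions and do not reproduce any architecture from the cited studies.

```python
import torch
import torch.nn as nn

class TinyUltrasoundCNN(nn.Module):
    """Minimal illustrative CNN: early layers respond to simple patterns
    such as edges and corners; deeper layers combine them into more
    complex structures used for classification."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # First layer: low-level patterns (edges, corners, simple shapes)
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            # Deeper layers: combinations of the simple structures above
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # hierarchical feature extraction
        return self.classifier(x.flatten(1))

# One grayscale 128x128 ultrasound patch (random data for illustration)
logits = TinyUltrasoundCNN()(torch.randn(1, 1, 128, 128))
```

In practice, such networks are considerably deeper and are trained on many thousands of annotated images, as in the studies discussed below.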
Particularly in radiological imaging, including computed tomography (CT) and magnetic resonance
imaging (MRI), CNNs have achieved outstanding results in clinical diagnostics and therapeutic
monitoring. These advances have already led to various commercially available and approved
AI applications, among others in the field of oncological imaging [4] ([Fig. 1]).
Fig. 1 Use of artificial intelligence (AI) in oncological imaging. Previous studies have
investigated the possible use of AI in the detection, characterization, and therapeutic
monitoring of oncological diseases. Illustration modified from El Naqa et
al., Br J Radiol, 2020 [5].
Compared to radiological imaging, the use of AI in sonographic imaging involves particular
challenges [6]. Because sonographic image acquisition is examiner-dependent, the image
material available for training AI generally shows higher variability. The depiction of
identical findings in different scanning planes can, for example, lead to considerable
differences in image interpretation and thus make correct classification by the AI
more difficult. Similar effects can be caused by differences in the devices or transducers
used.
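One common strategy for making a model more robust to this variability is data augmentation, i.e., randomly perturbing each training image so that the network sees plausible variations of the scanning conditions. The following sketch uses torchvision; the chosen perturbations and their parameters are illustrative assumptions rather than a validated protocol for ultrasound data.

```python
import torch
from torchvision import transforms

# Illustrative augmentations approximating sources of variability in
# ultrasound acquisition (assumed parameters, not a validated protocol):
train_transforms = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05),
                            scale=(0.9, 1.1)),    # probe angle and position
    transforms.ColorJitter(brightness=0.2,
                           contrast=0.2),         # gain and preset differences
    transforms.GaussianBlur(kernel_size=5,
                            sigma=(0.1, 1.5)),    # transducer and focus effects
    transforms.RandomHorizontalFlip(),            # mirrored scanning planes
])

# Apply to a grayscale image tensor (channels x height x width)
augmented = train_transforms(torch.rand(1, 128, 128))
```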
In the following, using the example of the sonographic diagnosis of hepatocellular
carcinoma (HCC), we discuss the extent to which these effects impact the use of AI
in sonography, where AI in this field currently stands, what limitations are to be
expected, and what next steps are needed to enable the successful translation of AI
into clinical sonography.
AI for early screening and characterization of focal liver lesions in sonographic
imaging
The incidence of malignant liver tumors has increased continuously in recent decades.
Hepatocellular carcinoma (HCC) is the most common primary malignant liver tumor worldwide;
in the Western world, this is mainly attributable to liver cirrhosis related to chronic
HCV infection or alcohol as well as the increasing prevalence of metabolic
dysfunction-associated steatotic liver disease (MASLD) with advanced fibrosis or
cirrhosis of the liver [7] [8]. For this reason, the various professional societies
recommend that patients at increased risk of HCC participate in an early detection
program based on ultrasound examinations of the liver every six months [8] [9].
The sensitivity of B-mode ultrasonography for the early detection of HCC is reported
to be between 47% and 84% [10]. Reasons for this great variability include the examiner's
experience, inadequate acoustic conditions in severely obese patients, and the inhomogeneous
echotexture of the cirrhotic liver. While sonography has a detection rate comparable
to CT and MRI for larger lesions, its sensitivity is significantly lower than that
of MRI for small lesions (<2 cm). The question therefore arises as to whether the
use of AI-supported procedures in sonography can lead to improved detection of early
forms of HCC.
Studies published to date have primarily investigated the possibility of using AI
to detect focal liver lesions in sonographic image data. For example, Tiyarattanachai
et al. trained a CNN with more than 20,000 individual B-mode ultrasound images from
almost 3500 patients [11]. The authors achieved a sensitivity of 83.9% for the detection
of focal liver lesions in an internal validation dataset and 84.9% in an external
validation dataset. Yang et al. achieved a sensitivity of 86.5% and a specificity
of 85.5% with a CNN that was trained using over 20,000 ultrasound images from more
than 2000 patients in conjunction with clinical information (including age, sex, and
AFP level) [12]. In this study, the CNN was superior to experienced examiners and
achieved results similar to contrast-enhanced computed tomography.
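For orientation, sensitivity and specificity are simple ratios of a model's predictions against the reference standard. The short sketch below computes both for hypothetical, made-up predictions; the numbers are not taken from the cited studies.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels: 1 = lesion present, 0 = no lesion."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: reference standard vs. AI output for 10 cases
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"Sensitivity: {sens:.0%}, Specificity: {spec:.0%}")  # 80% and 80%
```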
If a suspected HCC lesion is detected during surveillance, it should be characterized
using a contrast-enhanced procedure in accordance with the guideline recommendations
[8]. In addition to contrast-enhanced MRI and CT, contrast-enhanced ultrasound (CEUS)
can also be used for HCC diagnosis [13]. As the data of the prospective DEGUM multicenter
study show, a typical perfusion pattern in CEUS (arterial hypervascularization and
washout in the portal venous and venous phase) allows the diagnosis of HCC with a
sensitivity of 94% and a specificity of 65% (or 79% when using standardized CEUS
algorithms) [8] [14] [15]. However, achieving similar results in everyday clinical
practice requires experienced examiners.
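Expressed as a simple decision rule, the perfusion pattern described above can be summarized as follows; the function below is a toy illustration of the stated criterion, not a clinical or diagnostic algorithm.

```python
def perfusion_pattern_suggests_hcc(arterial_hypervascularization: bool,
                                   venous_washout: bool) -> bool:
    """Toy rule reflecting the typical CEUS pattern described above:
    arterial hypervascularization followed by washout in the portal
    venous/venous phase. Illustrative only, not a diagnostic algorithm."""
    return arterial_hypervascularization and venous_washout

# A lesion showing both features matches the typical HCC perfusion pattern
print(perfusion_pattern_suggests_hcc(True, True))  # True
```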
It is therefore not surprising that the use of AI to characterize focal liver lesions
and to diagnose HCC has also been investigated in various studies. According to a
recently published review, the AI-supported characterization of focal liver lesions
in the studies published to date was based in some cases only on B-mode data and in
other cases on CEUS data [16]. The diagnostic accuracy of B-mode-trained AI was between
69% and 98.6%, while that of CEUS-trained AI was between 64% and 98.3%. Only a small
proportion of the studies published to date have compared the results of AI-assisted
sonography with medical assessment [16]. Overall, however, the studies to date indicate
that AI-based classification is comparable to that of experienced examiners and can
achieve better results than inexperienced examiners. Despite this positive assessment,
the data to date must be critically scrutinized.
Limitations of the clinical use of AI in HCC sonography
A systematic assessment of the scientific quality of a total of 52 studies on the
characterization of focal liver lesions in sonographic datasets using the QUADAS-2
criteria showed that the transferability of the results of many studies is limited
[16]. This is primarily because many studies did not use an independent dataset (a
so-called test dataset) for the final evaluation of the fully trained CNN. In addition,
some studies did not include all of the most common types of focal liver lesions,
which further limits the applicability of the resulting AI algorithms in everyday
clinical practice.
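The requirement for an independent test dataset can be illustrated with a simple data split. Crucially, in imaging studies the split should be made at the patient level rather than the image level; otherwise, images of the same patient can end up in both the training and the test data and inflate the apparent performance. The sketch below shows one way to do this with scikit-learn's GroupShuffleSplit; the function name, variable names, and proportions are illustrative assumptions.

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(images, labels, patient_ids,
                        test_size=0.2, seed=42):
    """Split a dataset so that no patient contributes images to both
    the training and the test data (avoids patient-level leakage)."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(images, labels,
                                              groups=patient_ids))
    train = [(images[i], labels[i]) for i in train_idx]
    test = [(images[i], labels[i]) for i in test_idx]
    return train, test
```

A comparable split at the level of participating centers can approximate the external validation called for below.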
A known risk when training neural networks is "bias", i.e., data distortion, e.g.,
due to the lack of inclusion of certain patient groups. In the aforementioned QUADAS-2
analysis, no statement on the risk of bias could be made for many studies because
relevant information was missing from the description of the methodology [16]. Assessing
the bias of AI methods would require an understanding of the criteria underlying the
image classification. Since the decision-making of CNNs generally cannot be inspected
due to the high complexity of these systems, careful planning of the data used during
training is essential to avoid bias. This requires appropriate expertise not only
among those conducting AI-based studies, but also among reviewers and readers, so
that the significance and quality of such studies can be adequately assessed. Kuang
et al. summarized various quality criteria for the use of AI in sonography, which
are also intended to help readers inexperienced with AI to evaluate the relevant
studies [17]. Here, too, the urgent need for independent test datasets (ideally
external datasets) was pointed out. Among other points, care should be taken to ensure
that the AI algorithms are freely accessible and thus transparent, that the performance
of the AI is compared with the results of experienced examiners, and that the results
are compared with data from comparable published studies.
Conclusion
Even if the majority of the data published to date on the use of AI in the sonographic
diagnosis of HCC is based on retrospective analyses of previously acquired image data
and cannot easily be transferred to real-time assessment of the liver in everyday
clinical practice, the results to date are already promising.
As with the ultrasound-based diagnosis of HCC, data on the use of AI in sonography
are already available for numerous other applications, albeit often with limitations
similar to those of the studies discussed here. A frequent problem is transferability
to the very heterogeneous conditions of everyday clinical diagnostics (e.g., the
possible influence of different devices, presets, transducers, examiners, etc.). Due
to the large number of possible influencing factors in sonography, correspondingly
larger datasets from different centers are required to counteract the heterogeneity
of the image data. However, a targeted use of AI that is critically questioned with
regard to its added value, together with careful planning and the multicenter collection
of training, validation, and test datasets, could provide valuable support in the
future, especially for inexperienced examiners.