Endoscopy
DOI: 10.1055/a-2324-3743
Editorial

We are not in a Woody Allen film yet

Referring to Ghersin I et al. doi: 10.1055/a-2289-5732
1   Department of Medical Sciences and Surgery, University of Bologna, Bologna, Italy
2   Gastroenterology Unit, IRCCS-Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy

It would be very nice, every time we are unsure how to manage a patient or what to suggest, to be able to repeat the famous scene from Woody Allen’s Annie Hall (1977), in which the main character, in the middle of a heated argument about Marshall McLuhan’s theories with a particularly annoying neighbor, produces Marshall McLuhan himself from behind a display board to clarify exactly how things are [1].

Unfortunately, reality is very far from Woody Allen’s film. It is true, one might argue, that there are always guidelines available to come to our aid. Nevertheless, guidelines contain recommendations or suggestions based on published evidence and do not necessarily fit the heterogeneous clinical cases that come to our attention on a daily basis. For this reason, we should always have one or more expert colleagues with whom to confer (i.e. Marshall McLuhan coming out from behind the display board), but this is not always feasible, and the same expert may, in turn, need to confer with someone else.

Hope seems to come from ChatGPT. It is time this name became familiar to us, because in the coming years more and more studies will be published on this new technology. What does it consist of? ChatGPT is an artificial intelligence (AI) chatbot that uses natural language processing to create human-like conversational dialogue. GPT stands for “Generative Pretrained Transformer,” which refers to how ChatGPT processes requests and formulates responses.

“… at this moment ChatGPT offers the same fickleness as humans, but is as credible as colleagues who are, more or less, expert on the topic.”

In this issue of Endoscopy, Ghersin et al. aimed to assess the accuracy of ChatGPT in producing recommendations for colorectal dysplasia screening, surveillance, and endoscopic management in 30 clinical scenarios of patients with inflammatory bowel disease (IBD) according to the European Crohn’s and Colitis Organization (ECCO) guidelines [2]. The authors then compared the recommendations produced by ChatGPT with those provided by eight gastroenterologists, of whom four were experts and four were nonexperts in the field of IBD. Finally, two additional IBD specialists were asked to assess all of the responses provided and to judge their accuracy against the ECCO guidelines.

The results were as follows: 1) correct response rates ranged from 85.8% to 89.2%; 2) there was no significant difference between ChatGPT and gastroenterologists, whether experts or not; 3) there was no significant difference between IBD experts and nonexperts.

Therefore, based on this preliminary and very limited experience, we can conclude that ChatGPT may represent a useful aid in the decision-making process. However, we must remember that, at this stage of the technology’s development, ChatGPT’s responses are fickle: asked the same question twice, it may give slightly different answers because of the stochastic nature of its output generation. Furthermore, the most important word to keep in mind in ChatGPT is “pretrained”: if the model was trained to recognize apples, it will not recognize oranges. This simple truth has often been ignored when setting expectations for AI, for example in the recognition of colonic tumor lesions: a system trained to recognize adenomas will not recognize serrated lesions or carcinomas. It is therefore very important, before using any such chatbot, to know exactly how it was trained, so that we can set our expectations and view its answers in the correct context. Nevertheless, if you are looking for the comfort of a comparison with someone reliable, at this moment ChatGPT offers the same fickleness as humans, but is as credible as colleagues who are, more or less, expert on the topic.

The absence of any difference between IBD experts and nonexperts in the article by Ghersin et al. could have several explanations: the extreme clarity of the ECCO guidelines; the use of clinical scenarios that were not so uncommon or difficult that experience and knowledge would have made a difference; and, finally, the Hawthorne effect.

What are the take-home messages from this innovative study? Technology is moving quickly, the future is just around the corner, and we must be prepared to understand it. In the meantime, we should continue to find time to read the guidelines carefully and, most importantly, be very kind to our more expert colleagues, because, sadly, no Marshall McLuhan is waiting behind the display board (yet).



Publication History

Article published online:
11 June 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany