Endoscopy 2025; 57(03): 262-268
DOI: 10.1055/a-2388-6084
Innovations and brief communications

The role of generative language systems in increasing patient awareness of colon cancer screening

1   Department of Medicine and Surgery, University of Enna “Kore,” Enna, Italy
,
Daryl Ramai
2   Division of Gastroenterology and Hepatology, University of Utah Health, Salt Lake City, Utah, USA
,
3   Clinical Effectiveness Research Group, University of Oslo, Oslo, Norway
4   Digestive Disease Center, Showa University Northern Yokohama Hospital, Yokohama, Japan
,
Mário Dinis-Ribeiro
5   Porto Comprehensive Cancer Center & RISE@CI-IPO, University of Porto, Porto, Portugal
6   Gastroenterology Department, Portuguese Institute of Oncology of Porto, Porto, Portugal
,
7   Gastroenterology Unit, Department of Medical Sciences, University of Foggia, Foggia, Italy
,
Cesare Hassan
8   Endoscopy Unit, Humanitas Clinical and Research Hospital, IRCCS, Rozzano, Italy
9   Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
,
and the AI-CORE (Artificial Intelligence COlorectal cancer Research) Working Group



Abstract

Background This study aimed to evaluate the effectiveness of ChatGPT (Chat Generative Pretrained Transformer) in answering patients' questions about colorectal cancer (CRC) screening, with the ultimate goal of enhancing patients' awareness of and adherence to national screening programs.

Methods Fifteen questions on CRC screening were posed to ChatGPT-4. The answers were rated by 20 gastroenterology experts and 20 nonexperts in three domains (accuracy, completeness, and comprehensibility), and by 100 patients in three dichotomous domains (completeness, comprehensibility, and trustability).

Results According to expert rating, the mean (SD) accuracy score was 4.8 (1.1), on a scale ranging from 1 to 6. The mean (SD) scores for completeness and comprehensibility were 2.1 (0.7) and 2.8 (0.4), respectively, on scales ranging from 1 to 3. Overall, the mean (SD) accuracy (4.8 [1.1] vs. 5.6 [0.7]; P < 0.001) and completeness scores (2.1 [0.7] vs. 2.7 [0.4]; P < 0.001) were significantly lower for the experts than for the nonexperts, while comprehensibility was comparable between the two groups (2.8 [0.4] vs. 2.8 [0.3]; P = 0.55). Patients rated all questions as complete, comprehensible, and trustable in between 97 % and 100 % of cases.

Conclusions ChatGPT shows good performance, with the potential to enhance awareness about CRC and improve screening outcomes. Generative language systems may be further improved after proper training in accordance with scientific evidence and current guidelines.

Supplementary Material



Publication History

Received: 14 March 2024

Accepted after revision: 14 August 2024

Accepted Manuscript online:
14 August 2024

Article published online:
23 October 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany