Endoscopy
DOI: 10.1055/a-2388-6084
Innovations and brief communications

The role of generative language systems in increasing patient awareness of colon cancer screening

Marcello Maida
1   Department of Medicine and Surgery, University of Enna “Kore,” Enna, Italy
,
Daryl Ramai
2   Division of Gastroenterology and Hepatology, University of Utah Health, Salt Lake City, Utah, USA
,
Yuichi Mori
3   Clinical Effectiveness Research Group, University of Oslo, Oslo, Norway
4   Digestive Disease Center, Showa University Northern Yokohama Hospital, Yokohama, Japan
,
Mário Dinis-Ribeiro
5   Porto Comprehensive Cancer Center & RISE@CI-IPO, University of Porto, Porto, Portugal
6   Gastroenterology Department, Portuguese Institute of Oncology of Porto, Porto, Portugal
,
Antonio Facciorusso
7   Gastroenterology Unit, Department of Medical Sciences, University of Foggia, Foggia, Italy
,
Cesare Hassan
8   Endoscopy Unit, Humanitas Clinical and Research Hospital, IRCCS, Rozzano, Italy
9   Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
,
and the AI-CORE (Artificial Intelligence COlorectal cancer Research) Working Group

Abstract

Background This study aimed to evaluate the effectiveness of ChatGPT (Chat Generative Pretrained Transformer) in answering patientsʼ questions about colorectal cancer (CRC) screening, with the ultimate goal of enhancing patients' awareness and adherence to national screening programs.

Methods Fifteen questions on CRC screening were posed to ChatGPT-4. The answers were rated by 20 gastroenterology experts and 20 nonexperts in three domains (accuracy, completeness, and comprehensibility), and by 100 patients in three dichotomous domains (completeness, comprehensibility, and trustability).

Results According to expert rating, the mean (SD) accuracy score was 4.8 (1.1), on a scale ranging from 1 to 6. The mean (SD) scores for completeness and comprehensibility were 2.1 (0.7) and 2.8 (0.4), respectively, on scales ranging from 1 to 3. Overall, the mean (SD) accuracy (4.8 [1.1] vs. 5.6 [0.7]; P < 0.001) and completeness scores (2.1 [0.7] vs. 2.7 [0.4]; P < 0.001) were significantly lower for the experts than for the nonexperts, while comprehensibility was comparable between the two groups (2.8 [0.4] vs. 2.8 [0.3]; P = 0.55). Patients rated the answers to all questions as complete, comprehensible, and trustable in between 97 % and 100 % of cases.

Conclusions ChatGPT shows good performance, with the potential to enhance awareness about CRC and improve screening outcomes. Generative language systems may be further improved after proper training in accordance with scientific evidence and current guidelines.



Introduction

Colorectal cancer (CRC) remains a paramount healthcare concern globally, ranking as the third most prevalent cancer in men and the second most prevalent in women; it is the fourth leading cause of cancer fatalities worldwide [1]. Early detection and the removal of precancerous and cancerous lesions through CRC screening may significantly reduce disease incidence and mortality [2] [3] [4]. To date, numerous interventions have been proposed to increase screening uptake [5]. Among these, an electronic patient portal has been shown to increase adherence to CRC screening and shorten the time to screening completion [6] [7]. Despite these efforts, adherence to CRC screening remains suboptimal, ranging from 30 % to 80 % [8] [9] [10]. This may be because people have limited knowledge of the effectiveness of screening and of the different screening techniques, or because they fear undergoing invasive tests such as colonoscopy.

Artificial intelligence (AI)-assisted chatbots such as ChatGPT (Chat Generative Pretrained Transformer; OpenAI) are emerging as revolutionary tools with the potential to provide educational support to patients through an accessible question–answer system [11]. As such tools could serve as a complementary resource in healthcare, they are currently being evaluated in various areas of medicine, including gastroenterology.

A recent study assessed the performance of ChatGPT in answering questions from patients with nonalcoholic fatty liver disease, showing high accuracy, reliability, and comprehensibility [12]. Similar studies evaluating ChatGPT in gastroenterology have been performed in the settings of acute pancreatitis [13] and Helicobacter pylori infection [14], and for patient questions about colonoscopy [15]. In line with this research, the present study aimed to evaluate the effectiveness of ChatGPT in answering patients' questions about CRC screening, with the ultimate goal of enhancing patients' awareness of and adherence to national screening programs worldwide.



Methods

The study took place from February to June 2024, across Europe and the USA. As neither patient-identifiable data nor intervention approaches were used, institutional review board approval was not required.

A working group composed of three authors (M.M., D.R., and C.H.) created a list of 15 questions on CRC screening and its diagnostic and therapeutic implications ([Table 1]). The questions focused on general aspects of screening (Q1–Q6 and Q11), endoscopic procedures (Q7–Q10), and therapeutic measures (Q12–Q15). Upon reaching a consensus on the list of questions, the working group entered the queries into ChatGPT (version GPT-4.0) and recorded the corresponding answers (Appendix 1 s, see online-only Supplementary material).
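The queries were submitted through the ChatGPT web interface. For readers who wish to reproduce a similar workflow programmatically, a minimal sketch using the OpenAI Python client is shown below; this is an illustrative assumption rather than the procedure used in the study, and the model name, question list, and script structure are hypothetical.

```python
# Hypothetical sketch: the study used the ChatGPT web interface (GPT-4.0);
# an equivalent programmatic workflow via the OpenAI Python client (>= 1.0)
# could look like this. The model name and script layout are assumptions.
from openai import OpenAI

QUESTIONS = [
    "Why should I undergo colon cancer screening?",
    "At what age should I get colon cancer screening?",
    # ... remaining questions from Table 1
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str) -> str:
    """Submit one screening question and return the model's answer."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for i, question in enumerate(QUESTIONS, start=1):
        print(f"Q{i}: {question}\nA{i}: {ask(question)}\n")
```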

Table 1 List of questions posed to ChatGPT.

Q1. Why should I undergo colon cancer screening?
Q2. At what age should I get colon cancer screening?
Q3. How is colon cancer screening carried out?
Q4. If the fecal occult blood test is negative, when will I need to repeat it?
Q5. If the fecal occult blood test is positive, what is my risk of having colon cancer?
Q6. If the fecal occult blood test is positive, what other tests should I undergo and within how long?
Q7. What symptoms or discomfort can I experience during a colonoscopy?
Q8. What are the potential risks of colonoscopy?
Q9. What diet should I follow before the colonoscopy?
Q10. What bowel preparation solution should I take before the colonoscopy?
Q11. If the colonoscopy turns out negative, which screening test should I repeat and how long after?
Q12. If a precancerous lesion is identified during the colonoscopy, can it be removed endoscopically?
Q13. In which cases of colon cancer is surgery necessary?
Q14. In which cases of colon cancer is chemotherapy necessary?
Q15. If I receive a diagnosis of colon cancer, at what age must my relatives undergo screening?

Subsequently, the responses were reviewed in parallel by 20 experts and 20 nonexperts. Experts were internationally recognized gastroenterologists with extensive clinical and scientific experience in gastroenterology and CRC screening. Nonexperts were physicians who were not board-certified in gastroenterology and who lacked specific expertise in CRC screening.

Both experts and nonexperts scored each response according to three domains – accuracy, completeness, and comprehensibility – using Likert scales that ranged between 1 and 6 for accuracy, and between 1 and 3 for completeness and comprehensibility. The use of these domains to evaluate the answers of ChatGPT, as well as their definitions, was derived from a previous similar study [12].

The ChatGPT responses were also given to 100 consecutive patients, aged 50–69 years, who were participating in the Italian national CRC screening program; these patients rated each response dichotomously (yes/no) as being understandable, complete, and trustable. For this purpose, the responses were first translated into Italian by a bilingual Italian–English speaker to keep the translation as close as possible to the original.

The results were analyzed, with mean (SD) reported for continuous variables and frequency and percentage for categorical variables. Comparisons of variables were made by t test or chi-squared test, as appropriate. A P value of < 0.05 was considered statistically significant.
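As an illustration of the comparisons described above (not the authors' analysis code, which used a commercial statistical package), a minimal Python/SciPy sketch on made-up rating data might look like the following; the score arrays and contingency counts are hypothetical.

```python
# Illustrative sketch of the group comparisons described in the Methods,
# run on hypothetical example data (not the study dataset).
import numpy as np
from scipy import stats

# Hypothetical per-rater mean accuracy scores (20 experts, 20 nonexperts)
experts = np.array([4.9, 4.6, 5.0, 4.4, 5.1, 4.8, 4.5, 5.2, 4.7, 4.9,
                    4.6, 5.0, 4.8, 4.3, 5.1, 4.7, 4.9, 4.6, 5.0, 4.8])
nonexperts = np.array([5.6, 5.7, 5.5, 5.8, 5.4, 5.6, 5.7, 5.5, 5.6, 5.8,
                       5.5, 5.7, 5.6, 5.4, 5.8, 5.6, 5.5, 5.7, 5.6, 5.5])

# Continuous scores: two-sample t test
t_stat, p_val = stats.ttest_ind(experts, nonexperts)
print(f"t = {t_stat:.2f}, P = {p_val:.4f}")

# Categorical ratings: chi-squared test on a 2x2 contingency table,
# e.g. counts of "accurate" (>= 4 points) vs. "not accurate" per group
table = np.array([[17, 3],    # experts (hypothetical counts)
                  [20, 0]])   # nonexperts (hypothetical counts)
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```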

The internal consistency of the scale was assessed using the Cronbach alpha coefficient, with cutoff points of < 60 %, 61 %–70 %, 71 %–80 %, 81 %–90 %, and > 90 % considered to indicate poor, questionable, acceptable, good, and excellent reliability, respectively [16].
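For reference, the Cronbach alpha can be computed from a raters-by-questions score matrix as in the minimal sketch below; the example ratings are simulated, and the thresholds simply mirror the cutoffs stated above.

```python
# Minimal sketch of Cronbach's alpha [16] for a raters x items score matrix.
# The rating data are simulated; real input would be the 20 x 15 score matrices.
import numpy as np


def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: 2D array, rows = raters, columns = items (questions)."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


def interpret(alpha: float) -> str:
    # Thresholds mirroring the cutoffs stated in the Methods
    if alpha > 0.90: return "excellent"
    if alpha > 0.80: return "good"
    if alpha > 0.70: return "acceptable"
    if alpha > 0.60: return "questionable"
    return "poor"


# Example: 20 raters x 15 questions, accuracy scores on a 1-6 scale (simulated)
rng = np.random.default_rng(0)
ratings = rng.integers(4, 7, size=(20, 15)).astype(float)
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f} ({interpret(alpha)})")
```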

All statistical analyses were performed using SPSS v. 29.0 for Macintosh (SPSS Inc., Chicago, Illinois, USA).



Results

Expert assessment

According to the expert assessment, the mean (SD) accuracy score, on a scale ranging from 1 to 6 points, was 4.8 (1.1), with 10/15 questions (Q1, Q2, Q7–Q12, Q14, and Q15) receiving a mean rating ≥ 5 (“nearly all correct” judgment) ([Fig. 1a]; [Table 2]). Q2, Q12, and Q14 received the highest scores (5.2), while Q6 had the lowest score (3.9).

Fig. 1 Box plots showing the rating of ChatGPT answers by experts in terms of: a accuracy; b completeness; c comprehensibility.
Table 2 Assessment of the ChatGPT answers by the 20 experts.

Question | Accuracy, mean (SD)1 | Accuracy ≥ 4 points, % | Completeness, mean (SD)2 | Completeness ≥ 2 points, % | Comprehensibility, mean (SD)3 | Comprehensibility = 3 points, %
Q1 | 5.1 (1.0) | 90 | 2.5 (0.6) | 95 | 2.9 (0.3) | 90
Q2 | 5.2 (1.0) | 95 | 2.2 (0.9) | 70 | 2.8 (0.4) | 75
Q3 | 4.2 (1.1) | 70 | 1.9 (0.7) | 65 | 2.6 (0.5) | 60
Q4 | 4.5 (1.4) | 75 | 2.3 (0.6) | 90 | 2.6 (0.6) | 65
Q5 | 4.9 (1.0) | 85 | 2.1 (0.8) | 75 | 2.8 (0.4) | 80
Q6 | 3.9 (1.3) | 55 | 1.9 (0.7) | 70 | 2.7 (0.5) | 65
Q7 | 5.0 (0.9) | 90 | 2.3 (0.7) | 90 | 2.9 (0.4) | 85
Q8 | 5.1 (0.9) | 95 | 2.3 (0.7) | 90 | 2.9 (0.4) | 85
Q9 | 5.1 (0.9) | 95 | 2.1 (0.7) | 80 | 3.0 (0.2) | 95
Q10 | 5.0 (1.3) | 90 | 1.7 (0.7) | 55 | 2.8 (0.4) | 75
Q11 | 5.0 (1.1) | 90 | 2.0 (0.7) | 75 | 2.9 (0.4) | 85
Q12 | 5.2 (0.8) | 90 | 2.2 (0.7) | 80 | 3.0 (0.2) | 95
Q13 | 4.3 (1.3) | 80 | 2.0 (0.6) | 80 | 2.6 (0.5) | 60
Q14 | 5.2 (0.8) | 95 | 2.6 (0.6) | 95 | 2.8 (0.6) | 80
Q15 | 5.1 (0.9) | 95 | 2.3 (0.7) | 90 | 2.8 (0.4) | 80

1 6-point Likert scale: 1, completely incorrect; 2, more incorrect than correct; 3, approximately equally correct and incorrect; 4, more correct than incorrect; 5, nearly all correct; 6, correct; ≥ 4 points was considered accurate.

2 3-point Likert scale: 1, incomplete (addresses some aspects of the question, but significant parts are missing or incomplete); 2, adequate (addresses all aspects of the question and provides the minimum amount of information required to be considered complete); 3, comprehensive (addresses all aspects of the question and provides additional information or context beyond what was expected); ≥ 2 points was considered complete.

3 3-point Likert scale: 1, difficult to understand; 2, partially difficult to understand; 3, easy to understand; 3 points was considered comprehensible.

The mean (SD) completeness score, on a scale ranging from 1 to 3 points, was 2.1 (0.7), with all questions except Q3, Q6, and Q10 receiving a mean rating ≥ 2 (“adequate” judgment) ([Fig. 1b]). Q1 and Q14 received the highest scores (2.5 and 2.6, respectively), while Q10 received the lowest score (1.7).

Concerning comprehensibility, the mean (SD) score, on a scale ranging from 1 to 3, was 2.8 (0.4) ([Fig. 1c]). All questions received a rating ≥ 2.5 points (where 3 = “easy to understand”). Among these, Q3, Q4, and Q13 received the lowest scores (2.6).

The internal consistency of responses using Cronbach alpha coefficient was excellent for accuracy (0.92) and completeness (0.91), and acceptable for comprehensibility (0.77).



Nonexpert assessment

Nonexpert physicians were internal medicine doctors (55 %), general practitioners (20 %), and pulmonologists, geriatricians, nephrologists, or radiologists (25 % combined).

The mean (SD) accuracy score, on a scale of 1 to 6, was 5.6 (0.7), with all questions receiving a mean rating ≥ 5 (“nearly all correct” judgment) ([Table 3]). The mean (SD) completeness score, on a scale of 1 to 3, was 2.7 (0.4), with all questions receiving a mean rating ≥ 2 (“adequate” judgment). The mean (SD) comprehensibility score, on a scale of 1 to 3, was 2.8 (0.3), with all questions scored ≥ 2.5 points (where 3 = “easy to understand”).

Table 3 Assessment of the ChatGPT answers by the 20 nonexperts.

Question | Accuracy, mean (SD)1 | Accuracy ≥ 4 points, % | Completeness, mean (SD)2 | Completeness ≥ 2 points, % | Comprehensibility, mean (SD)3 | Comprehensibility = 3 points, %
Q1 | 5.6 (0.7) | 100 | 2.8 (0.4) | 100 | 2.9 (0.4) | 100
Q2 | 5.6 (0.7) | 100 | 2.8 (0.4) | 100 | 2.9 (0.3) | 100
Q3 | 5.8 (0.4) | 100 | 2.7 (0.5) | 100 | 2.8 (0.4) | 100
Q4 | 5.3 (1.2) | 95 | 2.7 (0.5) | 100 | 2.8 (0.4) | 100
Q5 | 5.8 (0.4) | 100 | 2.7 (0.5) | 100 | 2.8 (0.4) | 100
Q6 | 5.7 (0.6) | 100 | 2.7 (0.5) | 100 | 2.9 (0.4) | 100
Q7 | 5.7 (0.7) | 95 | 2.9 (0.4) | 100 | 2.9 (0.3) | 100
Q8 | 5.6 (0.6) | 100 | 2.8 (0.4) | 100 | 2.9 (0.4) | 100
Q9 | 5.5 (0.7) | 100 | 2.6 (0.5) | 100 | 2.9 (0.3) | 100
Q10 | 5.5 (0.7) | 100 | 2.8 (0.4) | 100 | 2.8 (0.4) | 100
Q11 | 5.7 (0.6) | 100 | 2.8 (0.6) | 95 | 2.9 (0.3) | 100
Q12 | 5.5 (0.8) | 95 | 2.9 (0.4) | 100 | 2.8 (0.4) | 100
Q13 | 5.5 (0.9) | 95 | 2.9 (0.4) | 100 | 2.9 (0.3) | 95
Q14 | 5.7 (0.6) | 100 | 2.9 (0.4) | 100 | 2.8 (0.6) | 100
Q15 | 5.7 (0.6) | 100 | 2.7 (0.5) | 100 | 2.8 (0.4) | 100

1 6-point Likert scale: 1, completely incorrect; 2, more incorrect than correct; 3, approximately equally correct and incorrect; 4, more correct than incorrect; 5, nearly all correct; 6, correct; ≥ 4 points was considered accurate.

2 3-point Likert scale: 1, incomplete (addresses some aspects of the question, but significant parts are missing or incomplete); 2, adequate (addresses all aspects of the question and provides the minimum amount of information required to be considered complete); 3, comprehensive (addresses all aspects of the question and provides additional information or context beyond what was expected); ≥ 2 points was considered complete.

3 3-point Likert scale: 1, difficult to understand; 2, partially difficult to understand; 3, easy to understand; 3 points was considered comprehensible.

The internal consistency of responses using Cronbach alpha coefficient was excellent for accuracy (0.97), and good for completeness (0.85) and comprehensibility (0.89).

Overall, in comparison with the nonexperts, the experts had significantly lower mean (SD) scores for accuracy (4.8 [1.1] vs. 5.6 [0.7]; P < 0.001) and completeness (2.1 [0.7] vs. 2.7 [0.4]; P < 0.001). In contrast, the mean (SD) comprehensibility scores were comparable for the two groups (2.8 [0.4] vs. 2.8 [0.3]; P = 0.55). The full comparisons between the experts' and nonexpertsʼ ratings for each question are shown in Table 1 s.



Patients’ assessment

The ChatGPT responses were given to 100 consecutive patients, aged between 50 and 69 years, admitted for CRC screening at the University Hospital of Enna “Kore.”

On the whole, each response received a completeness rating between 99 % and 100 %, with 10/15 questions being deemed complete by 100 % of the patients ([Table 4]). Likewise, the responses were considered comprehensible in 97 %–100 % of cases, with 12/15 questions being deemed comprehensible by 100 % of the patients. Finally, the responses were deemed trustworthy in 99 %–100 % of cases, with 14/15 questions being deemed trustworthy by 100 % of the patients.

Table 4 Assessment of the ChatGPT answers by the 100 patients.

Question | Complete, yes % | Comprehensible, yes % | Trustable, yes %
Q1 | 99 | 100 | 100
Q2 | 100 | 100 | 100
Q3 | 100 | 100 | 100
Q4 | 99 | 100 | 99
Q5 | 100 | 99 | 100
Q6 | 99 | 100 | 100
Q7 | 100 | 100 | 100
Q8 | 99 | 100 | 100
Q9 | 100 | 97 | 100
Q10 | 100 | 100 | 100
Q11 | 100 | 100 | 100
Q12 | 99 | 100 | 100
Q13 | 100 | 100 | 100
Q14 | 100 | 100 | 100
Q15 | 100 | 99 | 100



Discussion

The results from this study show that ChatGPT performs well in providing patients with screening information, which has the potential to enhance awareness about CRC. The expert assessment showed high levels of accuracy and completeness, and an excellent level of comprehensibility. Although experts rated the accuracy as only fair for two questions (Q3, which describes the screening process, and Q6, which explains the recall strategy after a positive fecal immunochemical test [FIT]), the responses were still correct and provided a clear hierarchy and timing of the different exams. Similarly, completeness was rated as fair for three questions: Q3 and Q6, mentioned above, and Q10, which explains how to perform bowel preparation before colonoscopy. The answer to this last question was short and lacked detail on a topic that is an important quality indicator for colonoscopy and the detection of lesions.

As expected, the scores given by the nonexperts were significantly higher in terms of accuracy and completeness, because specialists are generally more critical about topics within their area of expertise. The comprehensibility ratings were, however, comparable, as both experts and nonexperts are familiar with medical language.

Patients do not have the skills to judge the accuracy of responses, so they were not asked about this domain; however, when asked to rate the completeness, comprehensibility, and trustability of the answers, they gave very high ratings. This confirms that, despite containing medical terminology, all the answers were found to be complete and easily understandable from the patients' perspective. Moreover, almost all patients expressed a high level of trust in the accuracy of the answers and, consequently, in the tool itself. This is relevant because generative language systems are usually built on statistical models that are not designed to present accurate information, but rather to create the impression of doing so by mimicking human speech or writing, and they do not always succeed.

While the results of this study demonstrate the good performance of ChatGPT, it is important to note that this tool is not intended to replace medical consultations. In many instances, patients require a medical interview to address concerns and receive appropriate clarifications. Furthermore, engaging in discussion with a physician is essential to address complex questions related to concurrent medical conditions and medications, and to provide personalized care to patients.

To the best of our knowledge, this is the first study assessing the performance of a pretrained generative language system in responding to CRC screening-related inquiries that includes an evaluation by both doctors and patients. The study had several strengths. The evaluation of the responses was conducted by a large number of CRC experts from both Europe and the USA, ensuring the high reliability of the gold standard used in the study. Additionally, the assessment was carried out in parallel by non-gastroenterologists, to compare the ratings of experts and nonexperts, and by patients, to gauge user perception. Moreover, all assessments by healthcare personnel were obtained using quantitative scales, allowing homogeneous comparison of the ratings.

This study also has some limitations. First, the assessment of ChatGPT is limited to the study setting, so it cannot be generalized beyond these 15 questions and the evaluations of these raters. Second, the responses generated by pretrained generative language programs change depending on the input, resulting in poor reproducibility. Third, the questions in this study were formulated by physicians, which represents a potential bias; patient-generated questions may be less focused, affecting the quality of the responses provided by the chatbot. Finally, the questionnaire was administered only to Italian patients, limiting the external validity of the results. For this purpose, the form was translated into Italian from the original English version, and the two versions were very similar, though not identical. Nevertheless, given the high rates of positive ratings, it is likely that the language had little influence on patients' assessments.

In conclusion, this study shows that ChatGPT performs well in responding to patients' questions about CRC screening, with the potential to enhance awareness about CRC and improve screening outcomes. Nonetheless, these results must be interpreted with caution, as they refer to a specific setting and cannot be generalized to overall ChatGPT performance. In the future, generative language systems will need further improvement to provide medical-specific versions that are trained in accordance with up-to-date scientific evidence and current guidelines.



Conflicts of Interest

Y. Mori has received consulting and speaking fees, and an equipment loan, from Olympus, and royalties from Cybernet System Corp. M. Maida, D. Ramai, M. Dinis-Ribeiro, A. Facciorusso, and C. Hassan declare that they have no conflicts of interest.

Supplementary Material

  • References

  • 1 Fitzmaurice C, Dicker D, Pain A. et al. Global burden of disease cancer collaboration. The global burden of cancer 2013. JAMA Oncol 2015; 1: 505-527
  • 2 Løberg M, Kalager M, Holme Ø. et al. Long-term colorectal-cancer mortality after adenoma removal. NEJM 2014; 371: 799-807
  • 3 Bretthauer M, Løberg M, Wieszczy P. et al. Effect of colonoscopy screening on risks of colorectal cancer and related death. NEJM 2022; 387: 1547-1556
  • 4 Hewitson P, Glasziou P, Irwig L. et al. Screening for colorectal cancer using the faecal occult blood test, Hemoccult. Cochrane Database Syst Rev 2007; 2007: CD001216
  • 5 Tsipa A, O'Connor DB, Branley-Bell D. et al. Promoting colorectal cancer screening: a systematic review and meta-analysis of randomised controlled trials of interventions to increase uptake. Health Psychol Rev 2021; 15: 371-394
  • 6 Hahn EE, Baecker A, Shen E. et al. A patient portal-based commitment device to improve adherence with screening for colorectal cancer: a retrospective observational study. J Gen Intern Med 2021; 36: 952-960
  • 7 Goshgarian G, Sorourdi C, May FP. et al. Effect of patient portal messaging before mailing fecal immunochemical test kit on colorectal cancer screening rates: a randomized clinical trial. JAMA Netw Open 2022; 5: e2146863
  • 8 Klabunde C, Blom J, Bulliard JL. et al. Participation rates for organized colorectal cancer screening programmes: an international comparison. J Med Screen 2015; 22: 119-126
  • 9 McNamara D, Leen R, Seng-Lee C. et al. Sustained participation, colonoscopy uptake and adenoma detection rates over two rounds of the Tallaght-Trinity College colorectal cancer screening programme with the faecal immunological test. Eur J Gastroenterol Hepatol 2014; 26: 1415-1421
  • 10 Kapidzic A, Grobbee EJ, Hol L. et al. Attendance and yield over three rounds of population-based fecal immunochemical test screening. Am J Gastroenterol 2014; 109: 1257-1264
  • 11 OpenAI. ChatGPT (Mar 14 version). 2023. Available at: https://chat.openai.com (Accessed 5 February 2024)
  • 12 Pugliese N, Wai-Sun Wong V, Schattenberg JM. et al. Accuracy, reliability, and comprehensibility of ChatGPT-generated medical responses for patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol 2024; 22: 886-889.e5
  • 13 Du RC, Liu X, Lai YK. et al. Exploring the performance of ChatGPT on acute pancreatitis-related questions. J Transl Med 2024; 22: 527
  • 14 Lai Y, Liao F, Zhao J. et al. Exploring the capacities of ChatGPT: a comprehensive evaluation of its accuracy and repeatability in addressing Helicobacter pylori-related queries. Helicobacter 2024; 29: e13078
  • 15 Lee TC, Staller K, Botoman V. et al. ChatGPT answers common patient questions about colonoscopy. Gastroenterology 2023; 165: 509-511.e7
  • 16 Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297-334

Corresponding author

Marcello Maida, MD
Department of Medicine and Surgery
University of Enna “Kore”
Enna
Italy   

Publication History

Received: 14 March 2024

Accepted after revision: 14 August 2024

Accepted Manuscript online: 14 August 2024

Article published online: 23 October 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

