Open Access
CC BY-NC-ND 4.0 · Indian J Radiol Imaging 2024; 34(04): 653-660
DOI: 10.1055/s-0044-1787974
Original Article

Radiologic Decision-Making for Imaging in Pulmonary Embolism: Accuracy and Reliability of Large Language Models—Bing, Claude, ChatGPT, and Perplexity

1 Department of Radiodiagnosis, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
Suvrankar Datta
2 Department of Radiodiagnosis, All India Institute of Medical Sciences New Delhi, New Delhi, India
M. Sarthak Swarup
3 Department of Radiodiagnosis, Vardhman Mahavir Medical College and Safdarjung Hospital New Delhi, New Delhi, India
4 Department of Otorhinolaryngology and Head and Neck Surgery, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
5 Department of Computer Science and Engineering, SOET, Centurion University of Technology and Management, Bhubaneswar, Odisha, India
Archana Malik
6 Department of Pulmonary Medicine, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
6 Department of Pulmonary Medicine, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
7 Department of Physiology, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India

Abstract

Background Artificial intelligence (AI) chatbots have shown potential to enhance clinical decision-making and streamline health care workflows, which may ease administrative burdens. However, their contribution to radiologic decision-making in clinical scenarios remains insufficiently explored. This study evaluates the accuracy and reliability of four prominent large language models (LLMs), Microsoft Bing, Claude, ChatGPT 3.5, and Perplexity, in providing clinical decision support for initial imaging in suspected pulmonary embolism (PE).

Methods Open-ended (OE) and select-all-that-apply (SATA) questions were crafted to cover four variants of PE case scenarios, in line with the American College of Radiology Appropriateness Criteria. Three radiologists from different geographical regions and practice settings presented these questions to the LLMs. Responses were scored against established criteria, with a maximum of 2 points for each OE response and 1 point for each correct answer in SATA questions. To allow comparison, scores were normalized by dividing each score by the maximum achievable score.
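The normalization step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function name `normalize` and the example scores are hypothetical, and the sketch assumes a SATA question's maximum equals its number of correct options, with no penalty for incorrect selections (the abstract does not state how wrong selections were handled).

```python
# Maximum achievable score for an open-ended (OE) response, as stated in the Methods.
OE_MAX_SCORE = 2


def normalize(score: float, max_score: float) -> float:
    """Return score / max_score, i.e. a value between 0 and 1."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return score / max_score


# Hypothetical OE response awarded 1 of 2 points.
oe_example = normalize(1, OE_MAX_SCORE)

# Hypothetical SATA question with 4 correct options, 3 of them selected.
# Assumption: the maximum is the number of correct options.
sata_example = normalize(3, 4)
```

Dividing by the maximum puts OE and SATA scores on the same 0 to 1 scale, which is what makes the two question formats comparable.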

Results In OE questions, Perplexity achieved the highest accuracy (0.83) and Claude the lowest (0.58), with Bing and ChatGPT each scoring 0.75. For SATA questions, Bing led with an accuracy of 0.96, Perplexity was lowest at 0.56, and Claude and ChatGPT each scored 0.6. Overall, scores were higher for OE questions (0.73) than for SATA questions (0.68). Agreement among the radiologists' scores was poor for OE questions (intraclass correlation coefficient [ICC] = −0.067, p = 0.54) but strong for SATA questions (ICC = 0.875, p < 0.001).
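Two pieces of arithmetic behind these figures can be sketched in Python. First, the overall OE and SATA values are consistent with an unweighted mean of the four per-model accuracies; that reading is an assumption, since the abstract does not say how the overall figures were computed. Second, a one-way ICC(1,1) on toy data shows how an ICC can turn negative when raters disagree within items more than items differ from each other. The abstract does not state which ICC form was used, so `icc_one_way` and its toy ratings are purely illustrative.

```python
from statistics import mean

# Per-model accuracies as reported in the abstract.
oe_accuracy = {"Perplexity": 0.83, "Claude": 0.58, "Bing": 0.75, "ChatGPT": 0.75}
sata_accuracy = {"Bing": 0.96, "Perplexity": 0.56, "Claude": 0.6, "ChatGPT": 0.6}

# Assumption: the overall score is the unweighted mean across the four models.
oe_overall = round(mean(list(oe_accuracy.values())), 2)
sata_overall = round(mean(list(sata_accuracy.values())), 2)


def icc_one_way(ratings):
    """ICC(1,1) from an items x raters table, using one-way ANOVA mean squares.

    Undefined (division by zero) if every rating in the table is identical.
    """
    n = len(ratings)      # number of rated items
    k = len(ratings[0])   # number of raters per item
    grand_mean = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data (hypothetical): three raters agree perfectly on three items.
icc_agree = icc_one_way([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [0.0, 0.0, 0.0]])

# Toy data (hypothetical): two raters give opposite scores on each item.
icc_disagree = icc_one_way([[0.0, 2.0], [2.0, 0.0]])
```

Under the unweighted-mean assumption the recomputed values match the reported 0.73 and 0.68. In the toy example, perfect agreement gives an ICC of 1, while opposite ratings give a negative ICC because all of the variance lies within items.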

Conclusion Accuracy varied across LLMs for both OE and SATA questions. Perplexity performed best on OE questions, while Bing excelled on SATA questions, and OE queries yielded better overall results. These inconsistencies indicate that further refinement, including additional LLM fine-tuning, is needed before these tools can be reliably integrated into clinical practice, and that radiologists should select among them judiciously to obtain consistent and reliable decision support.



Publication History

Article published online:
04 July 2024

© 2024. Indian Radiological Association. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India