Klin Monbl Augenheilkd 2024; 241(05): 675-681
DOI: 10.1055/a-2149-0447
Experimental Study

Assessment of ChatGPT in the Prehospital Management of Ophthalmological Emergencies – An Analysis of 10 Fictional Case Vignettes

ChatGPT in the Prehospital Care of Ophthalmological Emergencies – A Study of 10 Fictional Case Vignettes (translated German title)
Dominik Knebel, Siegfried Priglinger, Nicolas Scherer, Julian Klaas, Benedikt Schworm
Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany

Abstract

Background The artificial intelligence (AI)-based platform ChatGPT (Chat Generative Pre-Trained Transformer, OpenAI LP, San Francisco, CA, USA) has gained impressive popularity in recent months. Its performance on case vignettes of general medical (non-ophthalmological) emergencies has been assessed, with very encouraging results. The purpose of this study was to assess the performance of ChatGPT on ophthalmological emergency case vignettes on three main outcome measures: triage accuracy, appropriateness of recommended prehospital measures, and overall potential to inflict harm on the user/patient.

Methods We wrote ten short, fictional case vignettes describing different acute ophthalmological symptoms. Each vignette was entered into ChatGPT five times with the same wording and following a standardized interaction pathway. The answers were analyzed following a systematic approach.

Results We observed a triage accuracy of 93.6%. Most answers contained only appropriate recommendations for prehospital measures. However, an overall potential to inflict harm to users/patients was present in 32% of answers.
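The counts behind these percentages can be sketched as follows. This is only an illustration under two assumptions that the abstract does not state explicitly: that each of the five entries per vignette produced exactly one analyzed answer, and that the 32% figure refers to all of those answers. All variable names are hypothetical.

```python
# Study design as described in Methods: 10 vignettes, each entered 5 times.
N_VIGNETTES = 10
REPETITIONS = 5

# Assumption: one analyzed answer per vignette/repetition pair.
total_answers = N_VIGNETTES * REPETITIONS

# Assumption: the reported 32% is a share of all analyzed answers.
harmful_share = 0.32
harmful_answers = round(harmful_share * total_answers)

# The reported triage accuracy of 93.6% does not map to a whole number
# of answers out of this total, so it presumably uses a different
# denominator that the abstract does not specify.
triage_share = 0.936
triage_equivalent = triage_share * total_answers
triage_is_whole = abs(triage_equivalent - round(triage_equivalent)) < 1e-9

print(total_answers, harmful_answers, triage_is_whole)  # prints: 50 16 False
```

Under these assumptions, 16 of 50 answers were rated as potentially harmful.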

Conclusion ChatGPT should presently not be used as a stand-alone primary source of information about acute ophthalmological symptoms. As AI continues to evolve, its safety and efficacy in the prehospital management of ophthalmological emergencies have to be reassessed regularly.

Zusammenfassung (English translation)

Background The artificial intelligence (AI)-based platform ChatGPT (Chat Generative Pre-Trained Transformer, OpenAI LP, San Francisco, CA, USA) has rapidly gained popularity in recent months. Previous studies show promising performance by ChatGPT in answering general medical emergency vignettes. The aim of this study was to assess ChatGPT's answers to ophthalmological case vignettes with regard to triage accuracy, appropriateness of recommended prehospital measures, and potential for harm.

Methods We created 10 short, fictional case vignettes covering acute ophthalmological symptoms. Each vignette was entered into ChatGPT 5 times following a standardized interaction pathway. The answers were evaluated using a structured evaluation manual.

Results We observed a triage accuracy of 93.6%. Most answers contained only appropriate recommendations regarding prehospital measures. Overall, however, 32% of the answers showed a potential for harm to the user/patient.

Conclusion ChatGPT should currently not be used as the sole source of information for assessing acute ophthalmological symptoms. New developments in the field of AI should be evaluated regularly with regard to their opportunities and risks in ophthalmological emergency care.

Conclusion Box

Already known:

  • ChatGPT has been reported to perform well on the Ophthalmic Knowledge Assessment Programme, to give useful information on several medical topics such as retinal diseases and cardiopulmonary resuscitation measures, and to perform well on triaging and diagnosing general medical emergencies.

  • However, it can also give wrong information or harmful advice in a very confident and authoritative tone.

Newly described:

  • Although ChatGPT performed remarkably well in triaging ophthalmological emergencies and recommending prehospital measures, its performance depended strongly on the individual case description it was given, and we identified 32% of its responses as potentially harmful.

  • As the popularity of ChatGPT and other AI-based language models grows, it is important to educate the public as well as the medical community on their current limitations – at the moment, they should not be used for ophthalmological emergencies.

  • However, as even the current versions of general-purpose language models already show an impressive performance in the medical domain, research should focus on developing more advanced language models specifically designed for medical purposes.

Publication History

Received: 31 July 2023

Accepted: 04 August 2023

Article published online: 27 October 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany