Large Language Modelle zur schnellen Vereinfachung der Eingabe von Qualitätssicherungsdaten: Performance-Test mit Echtdaten am Beispiel der Tumordokumentation in der Urologie

Johannes Frank; Axel S. Merseburger; Johannes Landmesser; Silvia Brozat-Essen; Peter Schramm; Laura Freimann; Alexander Kleehaus; Christian Elsner

doi:10.1055/a-2281-8015

Aktuelle Urol 2024; 55(05): 415-423

Original Article

Large Language Models for Rapid Simplification of Quality Assurance Data Input: Field Trial with Real Data in the Context of Tumour Documentation in Urology

Authors

Johannes Frank¹, Axel S. Merseburger¹, Johannes Landmesser¹, Silvia Brozat-Essen¹, Peter Schramm², Laura Freimann³, Alexander Kleehaus⁴, Christian Elsner⁵

¹ Urology Department, University of Luebeck, Luebeck, Germany (Ringgold ID: RIN9191)
² Department for Neuroradiology, University of Luebeck, Luebeck, Germany (Ringgold ID: RIN9191)
³ Data Protection, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany (Ringgold ID: RIN39068)
⁴ IT Consulting, PricewaterhouseCoopers Switzerland, Zuerich, Switzerland (Ringgold ID: RIN504202)
⁵ Center for Artificial Intelligence, University of Luebeck, Luebeck, Germany (Ringgold ID: RIN9191)

Summary

Introduction Large language models (LLMs) such as ChatGPT have brought artificial intelligence into widespread use within a very short time. Alongside many other use cases in text generation and processing, one application is extracting data from existing documents and conversations to fill in forms more simply and automatically.

Objective Quality assurance and documentation of tumour diseases currently demand a great deal of work: data must be transferred from various perspectives, in slightly varying formats, and with interpretation steps such as the TNM classification of tumours. To judge whether LLM-supported processes are applicable in this area, however, field trials with real data that allow efficiency and practicality to be assessed are lacking. This work sets out to carry out and evaluate such a performance test.

Methodology A performance test was carried out on N=153 physician letters from 25 patients, which had been cleared for this purpose under data protection rules and by an ethics committee. Using the publicly available version of ChatGPT 4.0 and an automated program script, the tasks of extracting a date of initial diagnosis and common tumour classifications were performed. Each result was then checked individually for correctness. On that basis, the benefit of a system for guided support in tumour documentation tasks was assessed indicatively. The approach was also assessed with regard to operating costs and potential hurdles on the way to practical use.

Results Overall, the work concludes that generative AI is promising in this field and is already usable as an aid in its untrained state. In a simplified calculation, costs of 35 cents stand against a value creation of 61.54 euros. It also becomes clear, however, that the AI can only act in a supporting role, and that correct embedding in the workflow, with ready-made, specific natural-language queries (prompts) and tools, is decisive for performance.

Conclusion Using generative AI for the search, transfer, and interpretation work involved in creating tumour documentation is a promising approach. In practical use, however, implementation must be closely accompanied, and the best interplay between human and machine must be evaluated further and supported with specific tools.

Abstract

Introduction Large Language Models (LLMs) such as ChatGPT have rapidly brought the application of artificial intelligence into widespread use. Among many different use cases for text generation and processing, one application is the extraction of data from existing documents and conversations for simplified and automated form-filling.

Objective Quality assurance and documentation of cancer currently involve a significant workload: data must be transferred from various perspectives into slightly varying formats, often with interpretation steps such as the TNM classification of tumours. However, trials with real data that would allow the efficiency and practicality of LLM-supported processes in this area to be evaluated are lacking. This study aims to implement and assess such a trial.

Methodology A trial was conducted with N=153 medical reports from 25 patients, cleared for this purpose with respect to data protection and by an ethics committee. An automated script used the publicly available version of ChatGPT 4.0 to extract the date of initial diagnosis and common tumour classifications. The results were then individually checked for accuracy. Based on this, the utility of a simple system for guided support in tasks related to tumour documentation was assessed. Additionally, the approach was evaluated in terms of the model's operational costs and its applicability.
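The extraction step described in the methodology can be pictured with a minimal sketch. It is not the study's script: the prompt wording, the JSON keys, the `extract_fields` helper, and the loose TNM pattern are all assumptions made for illustration, and the model call is passed in as a plain function so that no particular API is implied. A stub stands in for the model here.

```python
import json
import re
from datetime import date
from typing import Callable, Optional

# Hypothetical prompt wording; the abstract does not give the study's prompts.
PROMPT_TEMPLATE = (
    "Read the medical report below and answer with JSON only, using the keys "
    "'first_diagnosis_date' (YYYY-MM-DD or null) and 'tnm' (string or null).\n\n"
    "Report:\n{report}"
)

# Loose shape check for strings such as "pT2c pN0 cM0"; not a full TNM validator.
TNM_PATTERN = re.compile(
    r"^[a-z]{0,2}T(is|[0-4X])[a-d]?\s+[a-z]{0,2}N[0-3X][a-c]?\s+[a-z]{0,2}M[01X][a-c]?$"
)


def _valid_date(value: object) -> Optional[str]:
    """Return the value if it is an ISO date string, otherwise None."""
    if not isinstance(value, str):
        return None
    try:
        date.fromisoformat(value)
    except ValueError:
        return None
    return value


def _valid_tnm(value: object) -> Optional[str]:
    """Return the stripped value if it looks like a TNM string, otherwise None."""
    if isinstance(value, str) and TNM_PATTERN.match(value.strip()):
        return value.strip()
    return None


def extract_fields(report: str, ask_model: Callable[[str], str]) -> dict:
    """Send one report to the model and keep only fields that pass the checks."""
    raw = ask_model(PROMPT_TEMPLATE.format(report=report))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "first_diagnosis_date": _valid_date(data.get("first_diagnosis_date")),
        "tnm": _valid_tnm(data.get("tnm")),
    }


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; returns an invented answer."""
    return '{"first_diagnosis_date": "2021-03-15", "tnm": "pT2c pN0 cM0"}'


result = extract_fields("Invented example report text.", stub_model)
print(result)
```

Fields that fail the format checks come back as `None`, which leaves them for a human to fill in. This fits the individual checking of results that the trial describes.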

Results The study concludes that the use of generative AI in this field is promising and that the model is suitable as a tool even in an untrained state. In a simplified calculation, costs of 35 cents compare with a value creation of 61.54 euros. However, it also becomes clear that AI can only act in a supportive role, and that correct integration into the workflow, with pre-made specific prompts and tools, is crucial for relevant performance.
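To put the two figures from the simplified calculation side by side, the short sketch below computes their difference and their ratio. The variable names are illustrative. The abstract does not state the unit the figures refer to (for example per report or per patient), so none is assumed here.

```python
from decimal import Decimal

# Figures as stated in the abstract's simplified calculation.
cost_eur = Decimal("0.35")
value_eur = Decimal("61.54")

# Difference and ratio of value creation to cost.
net_value_eur = value_eur - cost_eur
value_per_cost = value_eur / cost_eur

print(f"net value: {net_value_eur} EUR")
print(f"value created per euro of cost: {value_per_cost.quantize(Decimal('0.1'))}")
```

`Decimal` is used instead of floating point so that the currency amounts are handled exactly.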

Conclusion The use of generative AI for search, transfer, and interpretation tasks in the creation of tumour documentation is a promising approach. However, its implementation in practice must be closely monitored; the optimal interaction between human and machine should continue to be evaluated and be supported by tools and task-specific prompts.

Keywords

ChatGPT - artificial intelligence - AI - economics - oncology documentation - tumour documentation

Publication History

Received: 24 January 2024

Accepted after revision: 29 February 2024

Article published online: 10 April 2024

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

