DOI: 10.1055/s-0044-1800752
Best Paper Selection

Appendix: Content Summaries of Selected Best Papers for the 2024 IMIA Yearbook Section Natural Language Processing
Tan RSYC, Lin Q, Low GH, Lin R, Goh TC, Chang CCE, Lee FF, Chan WY, Tan WC, Tey HJ, Leong FL, Tan HQ, Nei WL, Chay WY, Tai DWM, Lai GGY, Cheng LT, Wong FY, Chua MCH, Chua MLK, Tan DSW, Thng CH, Tan IBH, Ng HT.
Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting.
J Am Med Inform Assoc. 2023 Sep 25;30(10):1657-1664.
doi: 10.1093/jamia/ocad133
This paper focuses on classifying a dataset of 10,602 tomography reports from a single institution (the National Cancer Center Singapore) into four classes (no evidence of disease, partial response, stable disease, progressive disease) using the most up-to-date model architecture: large language models (LLMs). LLMs require large amounts of training data, yet access to clinical documents remains challenging. The authors therefore explore two techniques of interest for applying LLMs to a limited number of clinical texts. The first increases the amount of data through augmentation: focusing on the conclusion section of each report, the authors permuted sentences from existing reports to generate new synthetic radiology reports, attributing the original classification label to each. The second explores existing LLMs (10 models were used) through prompt-based fine-tuning for classifying medical documents. The prompt used in these experiments takes the form "[INPUT_TEXT] [SEP] In summary, this is a [MASK]", where the input text is the conclusion of the radiology report and the [MASK] token is filled with one of four class ids, from 0 (no evidence of disease) to 3 (progressive disease). The study confirms the impact of data size on LLM performance: the authors observed higher accuracy when fine-tuning the LLMs on corpora containing augmented data. Regarding prompt engineering, the authors highlight that prompting yields more stable results for a classification task when training data are scarce.
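The two techniques described above can be illustrated with a short sketch. The function names, the sentence-splitting heuristic, and the sample conclusion below are illustrative assumptions, not the authors' code; only the prompt template and the class-id convention come from the paper.

```python
import itertools
import random

def augment_by_permutation(conclusion, n_variants=3, seed=0):
    """Sketch of the augmentation strategy: generate synthetic report
    conclusions by reordering the sentences of an existing conclusion,
    keeping the original classification label for each variant."""
    sentences = [s.strip() for s in conclusion.split(".") if s.strip()]
    # All non-identity sentence orderings, capped at n_variants
    perms = [p for p in itertools.permutations(sentences) if p != tuple(sentences)]
    random.Random(seed).shuffle(perms)
    return [". ".join(p) + "." for p in perms[:n_variants]]

def build_prompt(conclusion):
    # Prompt template from the paper; the model fills [MASK] with a
    # class id from 0 (no evidence of disease) to 3 (progressive disease)
    return f"{conclusion} [SEP] In summary, this is a [MASK]"

report = "Stable liver lesions. No new metastases. Mild pleural effusion."
for variant in augment_by_permutation(report):
    print(build_prompt(variant))
```

Each synthetic variant inherits the label of the source report, so the augmented corpus grows without any additional annotation effort.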
Rohanian O, Nouriborji M, Kouchaki S, Clifton DA.
On the effectiveness of compact biomedical transformers.
Bioinformatics. 2023 Mar 1;39(3):btad103.
doi: 10.1093/bioinformatics/btad103.
While language models have proven effective on several NLP tasks, they require vast amounts of data to train and are, as a consequence, large and computationally expensive. In this paper, the authors explore how to build compact biomedical models using two techniques: (i) continual learning of three pre-trained compact models (DistilBERT, MobileBERT, and TinyBERT) on PubMed data under the masked language modeling framework, or (ii) knowledge distillation through three distillation procedures, producing four models: DistilBioBERT, TinyBioBERT, CompactBioBERT (which combines the distillation approaches of the two previous models), and BioMobileBERT. The usefulness of these lightweight models was evaluated on three NLP tasks: named entity recognition (NER; eight datasets derived from Medline or PubMed content, covering diseases, drugs/chemicals, genes/proteins, and species), question answering (QA; the BioASQ 7b dataset), and relation extraction (RE; two datasets). The BioBERT-v1.1 model served as the baseline and generally achieved better results than all distilled models on the three tasks. Nevertheless, the continually trained DistilBERT model outperformed the baseline on the NER and RE tasks, and the continually trained MobileBERT and BioMobileBERT models obtained the best results on the QA task. All models produced are available on HuggingFace and in the authors' GitHub repository.
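For readers unfamiliar with knowledge distillation, the core idea can be sketched as a loss that trains a small student model to match a large teacher. This is the generic Hinton-style formulation, shown here only as background; it is not the exact recipe used in the paper, and all names and hyperparameter values below are illustrative.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    """Generic distillation objective: a weighted sum of
    (i) KL(teacher || student) on temperature-softened distributions,
        scaled by T^2 so its gradient magnitude stays comparable, and
    (ii) cross-entropy of the student against the hard gold label."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = T * T * sum(pt * (math.log(pt) - math.log(ps))
                       for pt, ps in zip(p_t, p_s))
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard
```

When the student reproduces the teacher's distribution exactly, the soft term vanishes and only the hard-label cross-entropy remains; the mixing weight alpha controls how much the student relies on the teacher versus the gold labels.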
Publication History
Article published online:
08 April 2025
© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany