Open Access
CC BY-NC-ND 4.0 · Methods Inf Med 2024; 63(05/06): 145-163
DOI: 10.1055/a-2405-2489
Original Article

Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop

Shuntaro Yada¹, Yuta Nakamura², Shoko Wakamiya¹, Eiji Aramaki¹

¹ Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
² 22nd Century Medical and Research Center, The University of Tokyo Hospital, Tokyo, Japan

Funding This work was supported by JST AIP Trilateral AI Research Grant Number JPMJCR20G9 and MHLW Program Grant Number JPMH21AC500111 (formerly JST AIP-PRISM Grant Number JPMJCR18Y1), Japan.

Abstract

Background Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies must therefore perform well with limited data availability.

Objectives We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) Limited annotated documents: the training data comprise only a small set (∼100) of annotated case reports (CRs) and radiology reports (RRs). (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses both fundamental tasks, such as named entity recognition (NER), and applied practical tasks.

Methods We propose three tasks: NER using the ∼100 available annotated documents (Task 1), NER based only on the annotation guidelines written for human annotators (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs.

Results Nine teams participated in this study. The best systems achieved F1-scores of 0.65 for CRs and 0.89 for RRs in Task 1, whereas the top scores in Task 2 were 50 to 70% lower. In Task 3, ADE reports were detected with F1-scores of up to 0.64, and CI achieved a binary accuracy of up to 0.96.

Conclusion Most systems adopted medical-domain-specific pretrained language models combined with data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial-match scores reached F1-scores of ∼0.8 to 0.9. The Task 3 applications revealed that differences in the availability of external language resources affected performance in each language.
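The gap between exact-match and partial-match F1-scores comes down to how a predicted entity is paired with a gold entity. The sketch below illustrates that distinction under simplified, assumed definitions: entities are hypothetical (start, end, label) character spans with an exclusive end, an exact match requires identical offsets and label, and a partial match requires the same label with any character overlap. It is not the workshop's official scoring script; the function names and example labels are illustrative only.

```python
from typing import Callable, List, Tuple

# Hypothetical entity representation: (start offset, end offset exclusive, label).
Span = Tuple[int, int, str]
Matcher = Callable[[Span, Span], bool]


def exact_match(pred: Span, gold: Span) -> bool:
    """Same offsets and same label (strict scoring)."""
    return pred == gold


def partial_match(pred: Span, gold: Span) -> bool:
    """Same label and at least one overlapping character (lenient scoring)."""
    return pred[2] == gold[2] and pred[0] < gold[1] and gold[0] < pred[1]


def span_f1(preds: List[Span], golds: List[Span], match: Matcher) -> float:
    """Simplified F1: a prediction counts as correct if it matches any gold span,
    and a gold span counts as found if any prediction matches it."""
    if not preds or not golds:
        return 0.0
    precision = sum(any(match(p, g) for g in golds) for p in preds) / len(preds)
    recall = sum(any(match(p, g) for p in preds) for g in golds) / len(golds)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 9, "disease"), (15, 24, "drug")]
    # The second prediction starts two characters early, so it only overlaps.
    preds = [(0, 9, "disease"), (13, 24, "drug")]
    print(span_f1(preds, gold, exact_match))    # 0.5
    print(span_f1(preds, gold, partial_match))  # 1.0
```

A boundary error that leaves the entity type correct is penalized in both precision and recall under exact matching but ignored under partial matching, which is one reason partial-match scores can sit well above strict ones on small training sets.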

Ethical Approval Statement

This study did not require the participants to be involved in any physical or mental intervention. Furthermore, as it did not utilize personally identifiable information, the study was exempt from institutional review board approval in accordance with the Ethical Guidelines for Medical and Health Research Involving Human Subjects outlined by the Japanese national government.




Publication History

Received: 07 June 2023

Accepted: 23 August 2024

Accepted Manuscript online:
29 August 2024

Article published online:
29 October 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Stuttgart · New York