Z Gastroenterol 2024; 62(01): e5
DOI: 10.1055/s-0043-1777470
Abstracts | GASL
Lecture Session IV TUMORS 27/01/2024, 09.10am–09.55am, Lecture Hall

Machine learning for liver cancer risk stratification on population-level data

Jan Clusmann¹, Paul Koop¹, Yazhou Chen¹, Benjamin Laevens¹, Kai Markus Schneider¹, Christian Trautwein¹, Jakob Nikolas Kather², Carolin Victoria Schneider¹

¹ University Hospital Aachen
² Technical University of Dresden

Hepatocellular carcinoma (HCC) is a highly fatal malignancy whose incidence is rising due to the global obesity epidemic. Early diagnosis is crucial for curative therapy, but many patients are diagnosed at advanced stages with a poor prognosis. Improving risk stratification requires integrating the multidimensional, well-characterized risk constellations (chronic liver disease, serum markers, lifestyle, hereditary risk) in a standardized way and harnessing the potential of big data. Population-based databases such as the UK Biobank (UKB) are an invaluable tool for this task. The UKB is a population-wide database comprising electronic health records (EHR), death registries, lifestyle data, and physical and biological measures, as well as genomics (n = 500k each) and metabolomics data (n = 250k). Here, we train a random forest machine learning (ML) classifier on multimodal data from all included participants to predict HCC occurrence (n = 470 cases).
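
Combining the modalities listed above into one training table means handling unequal coverage, since metabolomics is available for only about half of the cohort. The following is a minimal sketch, assuming each modality is a dictionary keyed by participant ID. The modality and field names are hypothetical placeholders, not UK Biobank field IDs, and marking a missing modality with NaN values plus an indicator column is one common choice, not necessarily the approach used in this study.

```python
import math

# Hypothetical modality -> field mapping; names are illustrative placeholders,
# not actual UK Biobank field IDs.
MODALITIES = {
    "blood": ["alt", "ast", "platelets"],
    "metabolomics": ["marker_a", "marker_b"],
    "lifestyle": ["alcohol_units", "smoking_status"],
    "ehr": ["cirrhosis_dx", "diabetes_dx"],
    "genetics": ["risk_score"],
}


def build_feature_matrix(participant_ids, tables):
    """Concatenate per-modality features into one row per participant.

    tables maps modality -> {participant_id: {field: value}}. Participants
    without a record in a modality (e.g. no metabolomics measurement) get
    NaN for its fields plus a 1.0 in the '<modality>_missing' column.
    """
    columns = []
    for modality, fields in MODALITIES.items():
        columns.extend(f"{modality}:{f}" for f in fields)
        columns.append(f"{modality}_missing")

    rows = []
    for pid in participant_ids:
        row = []
        for modality, fields in MODALITIES.items():
            record = tables.get(modality, {}).get(pid)
            if record is None:
                # Whole modality absent for this participant.
                row.extend([math.nan] * len(fields))
                row.append(1.0)
            else:
                # Individual fields may still be missing within a record.
                row.extend(float(record.get(f, math.nan)) for f in fields)
                row.append(0.0)
        rows.append(row)
    return columns, rows
```

For example, `build_feature_matrix([1, 2], {"blood": {1: {"alt": 30, "ast": 25, "platelets": 210}}})` yields a 15-column table in which participant 2 has every modality flagged as missing.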

A random forest model trained and tested with five-fold cross-validation on data from 18 UKB centers in England shows high discriminative performance, with a mean AUROC of 0.87. Preliminary feature-relevance analysis reveals a striking pattern: blood and metabolomic parameters are by far the most relevant, followed by lifestyle and EHR parameters, whereas genetic information only mildly improves the predictions.
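
The evaluation described above (five-fold cross-validation, mean AUROC, feature relevance compared across data modalities) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not specify how participants were assigned to folds or how feature relevance was computed. Here folds are a stratified random split by participant, AUROC is computed as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control (Mann-Whitney U divided by n_pos × n_neg), and the `fit_predict` callback and `modality:field` column naming are hypothetical.

```python
import random
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Assign each participant to one of k folds, spreading cases evenly.

    With a rare outcome (a few hundred cases among ~500k participants),
    stratifying by label keeps roughly equal case counts in every fold.
    """
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U / (n_pos * n_neg)); ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def cross_validated_auroc(labels, folds, fit_predict, k=5):
    """Mean AUROC over k folds.

    fit_predict(train_idx, test_idx) is a hypothetical caller-supplied
    function that fits a model (e.g. a random forest) on train_idx and
    returns risk scores for test_idx. Each fold needs >= 1 case and control.
    """
    per_fold = []
    for f in range(k):
        train_idx = [i for i, g in enumerate(folds) if g != f]
        test_idx = [i for i, g in enumerate(folds) if g == f]
        scores = fit_predict(train_idx, test_idx)
        per_fold.append(auroc([labels[i] for i in test_idx], scores))
    return mean(per_fold), per_fold


def importance_by_modality(importances, columns):
    """Sum per-feature importances into one score per modality.

    Assumes hypothetical column names of the form 'modality:field'.
    Returns modalities sorted from most to least important.
    """
    totals = {}
    for name, value in zip(columns, importances):
        modality = name.split(":", 1)[0]
        totals[modality] = totals.get(modality, 0.0) + value
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

With scikit-learn, `fit_predict` could wrap `RandomForestClassifier(...).fit(X_train, y_train).predict_proba(X_test)[:, 1]`, and the fitted model's `feature_importances_` could be passed to `importance_by_modality`. Impurity-based importances are only one option; permutation importance is a common alternative that is less biased toward high-cardinality features.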

In conclusion, a random forest ML classifier trained on comprehensive UK Biobank data underscores the importance of blood and metabolomic parameters for HCC risk prediction, while lifestyle, EHR, and genetic factors make smaller, incremental contributions to predictive accuracy.



Publication History

Article published online:
23 January 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany