Machine learning for liver cancer risk stratification on population-level data

Jan Clusmann; Paul Koop; Yazhou Chen; Benjamin Laevens; Kai Markus Schneider; Christian Trautwein; Jakob Nikolas Kather; Carolin Victoria Schneider

doi:10.1055/s-0043-1777470

Z Gastroenterol 2024; 62(01): e5
DOI: 10.1055/s-0043-1777470

Abstracts | GASL

Lecture Session IV TUMORS 27/01/2024, 09.10am–09.55am, Lecture Hall

Jan Clusmann¹, Paul Koop¹, Yazhou Chen¹, Benjamin Laevens¹, Kai Markus Schneider¹, Christian Trautwein¹, Jakob Nikolas Kather², Carolin Victoria Schneider¹

¹University Hospital Aachen
²Technical University of Dresden

Hepatocellular carcinoma (HCC) is a highly fatal malignancy whose incidence is increasing due to the global obesity epidemic. Early diagnosis is crucial for curative therapy, but many patients are diagnosed at advanced stages with a poor prognosis. To integrate the multidimensional, well-characterized risk constellations (chronic liver disease, serum indicators, lifestyle, hereditary risk), it is essential to standardize risk stratification and harness the potential of big data. Population-based databases such as the UK Biobank (UKB) are an invaluable tool for this task. The UKB is a population-wide database with electronic health records (EHR), death registers, lifestyle, physical and biological measures, as well as genomics (n=500k each) and metabolomics data (n=250k). Here, we train a random forest machine learning (ML) classifier on multimodal data of all included patients to predict HCC occurrence (n=470).

A random forest model trained and tested with five-fold cross-validation on data from 18 UKB centers in England achieves high discriminative performance, with a mean AUROC of 0.87. Preliminary results show a striking distribution of feature relevance: blood and metabolomic parameters are by far the most relevant, followed by lifestyle and EHR parameters, while genetic information improves the predictions only mildly.
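The evaluation described above can be illustrated with a minimal sketch using scikit-learn. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the data are synthetic, the feature-group layout in `FEATURE_GROUPS` is hypothetical, folds are stratified by outcome (the abstract does not state how folds were formed), classes are reweighted to handle the rare outcome, and feature relevance is measured with the forest's impurity-based importances summed per group.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical mapping of feature groups to column indices (illustration only).
FEATURE_GROUPS = {
    "blood_metabolomics": range(0, 4),
    "lifestyle": range(4, 7),
    "ehr": range(7, 10),
    "genetics": range(10, 12),
}


def cross_validated_forest(X, y, n_splits=5, seed=0):
    """Return per-fold AUROC and fold-averaged importance summed per group."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aurocs, importances = [], []
    for train_idx, test_idx in cv.split(X, y):
        # class_weight="balanced" is an assumed way to handle the rare outcome.
        model = RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed
        )
        model.fit(X[train_idx], y[train_idx])
        # AUROC is computed on predicted probabilities of the positive class.
        scores = model.predict_proba(X[test_idx])[:, 1]
        aurocs.append(roc_auc_score(y[test_idx], scores))
        importances.append(model.feature_importances_)
    mean_importance = np.mean(importances, axis=0)
    group_importance = {
        name: float(mean_importance[list(cols)].sum())
        for name, cols in FEATURE_GROUPS.items()
    }
    return np.array(aurocs), group_importance


# Synthetic, imbalanced stand-in data; this is NOT UK Biobank data.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.95, 0.05], random_state=0,
)
aurocs, group_importance = cross_validated_forest(X, y)
print(f"mean AUROC over {len(aurocs)} folds: {aurocs.mean():.2f}")
for name, value in sorted(group_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Summing importances per group gives a coarse view of how much each data modality contributes. Impurity-based importances can favor high-cardinality features, so permutation importance on held-out folds would be a reasonable alternative under the same setup.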

In conclusion, a random forest ML classifier trained on the comprehensive data of the UK Biobank underscores the importance of blood and metabolomic parameters in HCC risk prediction, while lifestyle, EHR, and genetic factors make smaller, graded contributions to predictive accuracy.

Publication History

Article published online:
23 January 2024

Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany