DOI: 10.1055/s-0043-1777470
Machine learning for liver cancer risk stratification on population-level data
Hepatocellular carcinoma (HCC) is a highly fatal malignancy whose incidence is increasing due to the global obesity epidemic. Early diagnosis is crucial for curative therapy, but many patients are diagnosed at advanced stages with poor prognosis. Improving risk stratification requires integrating the multidimensional, well-characterized risk constellations (chronic liver disease, serum indicators, lifestyle, hereditary risk), standardizing the approach, and harnessing the potential of big data. Population-based databases, such as the UK Biobank (UKB), are an invaluable tool for this task. The UKB is a population-wide database that combines electronic health records (EHR), death registers, lifestyle data, physical and biological measures, and genomics (n=500k each) with metabolomics data (n=250k). Here, we train a random forest machine learning (ML) classifier on multimodal data of all included patients to predict HCC occurrence (n=470).
A random forest model trained and tested with five-fold cross-validation on data from 18 UKB centers in England achieved high discriminative performance, with a mean AUROC of 0.87. Preliminary results show a striking distribution of feature relevance: blood and metabolomic parameters are the most relevant, followed by lifestyle and EHR parameters, while genetic information only mildly improves the predictions.
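As an illustration of how a mean AUROC over five folds is obtained, the following is a minimal sketch in plain Python, not the pipeline used in this study. The data are synthetic, the helper names `auroc` and `stratified_folds` are hypothetical, and the random scores stand in for the out-of-fold predictions that a classifier trained on the other four folds would produce. Stratified folds are assumed here because HCC cases are rare relative to non-cases.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen case
    receives a higher score than a randomly chosen non-case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample to one of k folds so that each fold keeps
    roughly the same fraction of cases (important for rare outcomes)."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[i] = position % k
    return folds


# Synthetic data for illustration only: 50 cases, 950 non-cases.
# In a real analysis the scores would be out-of-fold model predictions.
rng = random.Random(42)
labels = [1] * 50 + [0] * 950
scores = [rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

folds = stratified_folds(labels, k=5)
fold_aurocs = []
for f in range(5):
    test_idx = [i for i, g in enumerate(folds) if g == f]
    fold_aurocs.append(
        auroc([labels[i] for i in test_idx], [scores[i] for i in test_idx])
    )

mean_auroc = mean(fold_aurocs)
print("AUROC per fold:", [round(a, 3) for a in fold_aurocs])
print("Mean AUROC:", round(mean_auroc, 3))
```

The per-fold values show how much the estimate varies between held-out subsets; the mean is the summary figure of the kind reported above.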
In conclusion, a random forest ML classifier trained on the comprehensive data of the UK Biobank underscores the importance of blood and metabolomic parameters in HCC risk prediction, with smaller contributions from lifestyle and EHR parameters and only a mild gain from genetic information.
Publication History
Article published online:
23 January 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany