DOI: 10.1055/s-0043-1777562
Characterising MASLD using Bayesian Networks in the UK Biobank
Background Quick and accessible methods to characterise and predict metabolic dysfunction-associated steatotic liver disease (MASLD) are needed given its increasing prevalence. One continuing area of focus is the use of blood biomarkers to estimate liver steatosis. We present a Bayesian Network approach that describes the structure of MASLD, using data from 35,000 UK Biobank participants.
Methods Using a Random Forest Classifier (RFC), feature importance is computed for 30 serum parameters with respect to a binary target variable representing either steatotic liver (>5% proton density fat fraction) or healthy liver (<5%). After hyperparameter tuning, the 6 most relevant features for MASLD are used as input for a Bayesian Network. Using 5 different structure learning criteria, the 5 networks that best describe the relationships between the serum parameters and MASLD are selected.
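As an illustration of how a structure learning criterion ranks candidate networks, the sketch below scores two candidate structures with the Bayesian Information Criterion (BIC). The abstract does not name the five criteria used, so the choice of BIC is an assumption, and the function name, the binary variables `high_tg` and `masld`, and the toy counts are all hypothetical and not taken from the UK Biobank data.

```python
import math
from collections import Counter


def bic_score(rows, structure):
    """BIC of a discrete Bayesian network (higher is better).

    rows:      list of dicts mapping variable name -> discrete value
    structure: dict mapping each variable -> tuple of its parents
               (every parent must also be a key of `structure`)
    """
    n = len(rows)
    states = {v: {r[v] for r in rows} for v in structure}
    log_lik = 0.0
    n_params = 0
    for child, parents in structure.items():
        # Counts of (parent configuration, child value) and of parent configuration.
        joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
        parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
        # Maximum-likelihood log-likelihood of the child given its parents.
        for (pa, _), count in joint.items():
            log_lik += count * math.log(count / parent_counts[pa])
        # Free parameters: (child states - 1) per parent configuration.
        n_parent_configs = 1
        for p in parents:
            n_parent_configs *= len(states[p])
        n_params += (len(states[child]) - 1) * n_parent_configs
    # Penalise model complexity.
    return log_lik - 0.5 * math.log(n) * n_params


# Hypothetical toy data: binarised triglycerides and MASLD status.
rows = (
    [{"high_tg": 1, "masld": 1}] * 30
    + [{"high_tg": 1, "masld": 0}] * 10
    + [{"high_tg": 0, "masld": 1}] * 10
    + [{"high_tg": 0, "masld": 0}] * 50
)

score_indep = bic_score(rows, {"high_tg": (), "masld": ()})
score_edge = bic_score(rows, {"high_tg": (), "masld": ("high_tg",)})
score_reversed = bic_score(rows, {"masld": (), "high_tg": ("masld",)})

print(score_indep, score_edge, score_reversed)
```

On this toy data the structure with an edge between the two variables scores higher than the structure with none. The reversed edge receives the same score, which is one reason edge directions from a score-based search alone are usually interpreted with care.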
Results The best RFC (ROC AUC of 81% in the training set) identifies SHBG, ALT, triglycerides, C-reactive protein, HDL cholesterol and GGT as having the highest impact on MASLD. The five best networks show between 7 and 10 (causal) dependencies among the 7 variables (the six biomarkers and MASLD), with 2 of the 5 models showing triglycerides, SHBG and HDL cholesterol leading directly to MASLD. The other biomarkers are either connected to MASLD or a result of MASLD.
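For readers unfamiliar with the ROC AUC figure quoted above, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counting one half. The function name, labels and scores are hypothetical illustrations, not the study's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = steatotic liver, 0 = healthy liver.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of samples, so it is suited to illustration rather than to a cohort of tens of thousands of participants.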
Conclusion Bayesian Networks are explainable methods that describe relationships between variables. The learned structures can be compared with expert knowledge to provide a more complete picture of disease progression. Ultimately, this network will be used to identify individuals with MASLD in the general population.
Publication History
Article published online:
23 January 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany