TY - JOUR
T1 - Machine learning integration of multimodal data identifies key features of blood pressure regulation
AU - Louca, Panayiotis
AU - Tran, Tran
AU - du Toit, Clea
AU - Christofidou, Paraskevi
AU - Spector, Tim
AU - Mangino, Massimo
AU - Suhre, Karsten
AU - Padmanabhan, Sandosh
AU - Menni, Cristina
N1 - Funding Information:
The Department of Twin Research receives support from grants from the Wellcome Trust (212904/Z/18/Z) and the Medical Research Council (MRC)/British Heart Foundation (BHF) Ancestry and Biological Informative Markers for Stratification of Hypertension (AIM-HY; MR/M016560/1), European Union, Chronic Disease Research Foundation (CDRF), Zoe Global Ltd., the NIHR Clinical Research Facility and Biomedical Research Centre (based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London). Qatar Biobank is supported by Qatar Foundation. C.M. is funded by the Chronic Disease Research Foundation and by the MRC AIM-HY project grant. P.L. is funded by the Chronic Disease Research Foundation; M.M. is funded by the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. P.C. is funded by the European Union (H2020 contract #733100). S.P. is funded by the Medical Research Council (MR/M016560/1), the British Heart Foundation (PG/12/85/29925, CS/16/1/31878, and RE/18/6/34217) and Chief Scientist Office, Scotland. S.P. and C.d.T. acknowledge funding from Health Data Research UK (HDR-5012). K.S. is supported by the Biomedical Research Program at Weill Cornell Medicine in Qatar, a program funded by the Qatar Foundation, and by Qatar National Research Fund (QNRF) grant NPRP11C-0115-180010. The funding source had no input on the writing of the manuscript or the decision to submit for publication. We thank all the participants of TwinsUK and QBB for contributing their time and effort and supporting our research.
Publisher Copyright:
© 2022 The Authors
PY - 2022/10
Y1 - 2022/10
N2 - Background: Association studies have identified several biomarkers for blood pressure and hypertension, but a thorough understanding of their mutual dependencies is lacking. By integrating two high-throughput datasets (metabolomics and genomics) with biochemical and dietary data, we aim to understand the multifactorial contributors to blood pressure (BP). Methods: We included 4,863 participants from TwinsUK with concurrent BP, metabolomics, genomics, biochemical measures, and dietary data. We used 5-fold cross-validation with the machine learning XGBoost algorithm to identify important features in the context of one another in TwinsUK (80% training, 20% test). The features tested in TwinsUK were then probed using the same algorithm in an independent dataset of 2,807 individuals from the Qatari Biobank (QBB). Findings: Our model explained 39·2% [4·5%, MAE: 11·32 mmHg (95% CI, ± 0·65)] of the variance in systolic BP (SBP) in TwinsUK. Of the top 50 features, the most influential non-demographic variables were dihomo-linolenate, cis-4-decenoyl carnitine, lactate, chloride, urate, and creatinine, along with dietary intakes of total, trans and saturated fat. We also highlight the incremental value of each included dimension. Furthermore, we replicated our model in the QBB cohort [SBP variance explained = 45·2% (13·39%)], and 30 of the top 50 features overlapped between cohorts. Interpretation: We show that an integrated analysis of omics, biochemical and dietary data improves our understanding of their interrelationships and expands the range of potential biomarkers for blood pressure. Our results point to potentially key biological pathways to be prioritised for mechanistic studies. Funding: Chronic Disease Research Foundation, Medical Research Council, Wellcome Trust, Qatar Foundation.
UR - http://www.scopus.com/inward/record.url?scp=85137300426&partnerID=8YFLogxK
U2 - https://doi.org/10.1016/j.ebiom.2022.104243
DO - https://doi.org/10.1016/j.ebiom.2022.104243
M3 - Article
SN - 2352-3964
VL - 84
SP - 104243
JO - EBioMedicine
JF - EBioMedicine
M1 - 104243
ER -