Student thesis: Doctoral Thesis, Doctor of Philosophy


Robust evidence for the polygenicity and genetic correlations of complex traits across the phenome suggests both the necessity of polygenic instruments and the value of multi-trait prediction models. This thesis used multi-variable approaches in four papers and along two main threads:
Multi-variable approaches to trait prediction
A primary goal of polygenic scores, which aggregate effects of trait-associated variants discovered in genome-wide association studies (GWAS), is to estimate individual-specific genetic propensities to predict trait variation. This is typically achieved using one polygenic score predicting one outcome. Extending this to a multi-variable approach, a ‘phenome-wide analysis of genome-wide polygenic scores’ mapped associations between 13 polygenic scores created from GWAS for psychiatric disorders and cognitive traits and 50 behavioural traits.
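As a rough illustration of how a polygenic score aggregates variant effects, the sketch below sums GWAS effect sizes weighted by an individual's allele counts. The variant identifiers, effect sizes, and allele counts are invented for illustration and are not taken from the thesis; the function name `polygenic_score` is likewise hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele counts over variants present in both inputs.

    dosages: dict mapping variant ID -> allele count (0, 1 or 2) for one person
    weights: dict mapping variant ID -> effect size from a discovery GWAS
    """
    # Only variants genotyped in the individual AND present in the GWAS contribute.
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical toy inputs (assumed values, for illustration only).
example_weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
example_dosages = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}  # rs4 has no GWAS weight

score = polygenic_score(example_dosages, example_weights)
print(round(score, 4))
```

In practice, scores of this kind are usually computed with dedicated software and standardised across the sample before analysis; the sketch only shows the underlying arithmetic.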
Extending the multi-variable approach further, a multi-polygenic score approach was employed to increase prediction by exploiting the joint power of multiple discovery GWAS in the same model. A regularised regression model combining summary statistics of 81 trait GWAS improved out-of-sample prediction of three child outcomes over the best single-predictor model.
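The idea of combining several polygenic scores in one regularised model can be sketched as follows. The abstract does not state which penalty was used, so this example assumes a ridge (L2) penalty fitted by plain gradient descent; the function `ridge_fit`, the toy data, and the tuning values are all assumptions made for illustration.

```python
def ridge_fit(X, y, lam, lr=0.1, steps=2000):
    """Fit y ~ X w by minimising mean squared error + lam * sum(w_j^2).

    X: list of rows, each row a list of (standardised) polygenic scores
    y: list of outcome values, one per row
    lam: ridge penalty strength (0 gives ordinary least squares)
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        # Residuals of the current fit.
        resid = [sum(w[j] * X[i][j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the penalised loss with respect to each weight.
        grad = [
            (2.0 / n) * sum(resid[i] * X[i][j] for i in range(n)) + 2.0 * lam * w[j]
            for j in range(p)
        ]
        w = [w[j] - lr * grad[j] for j in range(p)]
    return w


# Hypothetical toy data: two polygenic scores per person, outcome = 1.0*s1 + 0.5*s2.
toy_scores = [[1, 0], [0, 1], [1, 1], [-1, -1], [-1, 0], [0, -1]]
toy_outcome = [1.0 * s1 + 0.5 * s2 for s1, s2 in toy_scores]

w_unpenalised = ridge_fit(toy_scores, toy_outcome, lam=0.0)
w_penalised = ridge_fit(toy_scores, toy_outcome, lam=1.0)
print(w_unpenalised, w_penalised)
```

The penalty shrinks the weights toward zero, which is what helps when many correlated scores enter the same model; in a real analysis the penalty strength would be chosen by cross-validation and performance judged on held-out data.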
Multi-variable approaches to gene-environment correlation
Although gene-environment correlation is widely investigated by family studies and recently by SNP-heritability studies, the possibility that genetic effects on traits capture environmental risk factors or protective factors has been neglected by polygenic prediction models. First, a study using genome-wide SNP-heritability estimation and polygenic score analysis provided the first molecular evidence for substantial genetic influence on differences in children’s educational achievement and its association with family socio-economic status.
Second, covariation between offspring trait-associated polygenic variation and a wide range of parent-mediated environmental exposures was estimated. For this, a mixed linear model estimated the effects of multiple polygenic scores on each environmental exposure while controlling for overall relatedness by fitting the effects of all SNPs as random effects. Findings illustrate the relevance of gene-environment correlation for polygenic prediction models.
Taken together, the analyses illustrate the value of multi-variable approaches to complex trait prediction, as well as their current limitations and future potential.
Date of Award: 2017
Original language: English
Awarding Institution: King's College London
Supervisors: Robert Plomin, Thalia Eley & Paul O'Reilly
