Inferring ancestral and biogeographic origin using genome-wide SNP data

Student thesis: Doctoral Thesis, Doctor of Philosophy


Statistically predicting the origin of a human DNA sample has proven effective in forensic casework and is of broad population-genetic interest, but many facets of the problem remain unexplored. This thesis consolidates several pieces of work on the genetic prediction of origin, furthering its theory and practice.
I begin by examining the performance of straightforward predictive methods when the objective is to correctly determine an individual's origin from one of several closely related genetic groups, e.g. countries within Europe. Of particular interest are the volume of data required to make useful predictions and the negative impact of combining data from independently collected convenience samples. I show that, depending on these factors, it is possible to predict origin from either Great Britain or Ireland with good accuracy. The same approach was applied to a unique dataset, provided by Dr. Jim Wilson and colleagues, in which individuals were ascertained based on their self-reported village of origin from sets of neighbouring villages. Highly accurate predictions for village of origin were attained, demonstrating the detailed geographic resolution at which this branch of statistical methodology may succeed.
Origin has innumerable aspects, many of which may be tractable to genetic prediction. Two in particular are considered throughout the remainder of the thesis. First, I develop models for predicting separate ancestral components in each parent of a genotyped 'target' individual (Crouch and Weale, 2012), providing a more detailed profile than models of lone personal ancestry. Accuracy is high when genome-wide data are available and the modelled populations are relatively genetically dissimilar, e.g. West Africa, Europe and East Asia. Second, I develop a method for predicting geographic coordinates for target individuals, constituting an estimate of their biogeographic origin in continuous space, and compare its performance within Europe against existing approaches. While one alternative (Hoggart et al., 2012) displayed greater accuracy, the merits of each method are discussed in full.
Date of Award: 2013
Original language: English
Awarding Institution
  • King's College London
Supervisors: Mike Weale & Cathryn Lewis
