Exploiting the variability of DNA methylation patterns over time for the estimation of the chronological age from biological traces in forensic casework

Student thesis: Doctoral Thesis, Doctor of Philosophy


A key aspect of forensic science research is the inference of information regarding a person’s visible appearance, geographical origin and age from biological stains recovered at crime scenes. This information, commonly referred to as ‘DNA intelligence’, can provide law enforcement organisations with investigative leads, taking on the role of a ‘biological witness’. Following the successful implementation of DNA-based methods for the inference of ancestry and phenotype (e.g. eye, hair and skin colour) in forensic investigations, the focus of DNA intelligence research has recently shifted towards the accurate prediction of chronological age. Whilst multiple biomarkers, including protein and nucleic acid-based candidates, have been trialled for age estimation, recent studies have focused on the correlation between chronological age and methylation status at certain cytosine residues in the human genome. In concert with efforts across the world to develop DNA methylation-based age estimation models for forensic applications, this work addressed previously unanswered questions regarding both statistical modelling and DNA input amounts, along with the applicability of massively parallel sequencing (MPS). Utilising 12 of the ‘epigenetic clock’ CpGs, an MPS-based DNA methylation quantification assay was developed and used to analyse 110 whole blood samples from individuals aged 11–93 years. These data were subsequently used for training and testing 17 statistical modelling algorithms, of which the best-performing model, a support vector machine with a polynomial kernel (SVMp), estimated donor age with a mean absolute error (MAE) of 4.1 years and a root mean square error (RMSE) of 4.9 years. The method retained its accuracy down to 10 ng of initial DNA input (~2 ng PCR input), establishing a new lowest input limit for this type of analysis.
Following minor adjustments, the developed assay was then successfully transferred to 13 international forensic laboratories using various MPS technologies. Despite being designed and tested on blood, the model was also able to predict the age of donors for 34 saliva samples with an average error of ±7.3 years following basic normalisation. Additionally, an SVMp model based on the same markers, trained on a set of 132 DNA extracts from buccal swabs (donor age 11–29 years), was able to identify donors aged 18 or over with a likelihood ratio of 11. Assuming equal prior probabilities (50:50) for minor and adult, this corresponds to a posterior probability of approximately 92%, suggesting high potential for accurate discrimination between buccal swabs originating from minors and adults. However, high variability in prediction accuracy was observed in sperm (RMSE = 19.6 years), with only one of the markers showing a strong correlation with age in this tissue. New marker panels were therefore selected and validated for saliva and sperm using MPS-based assays, highlighting 6 markers in saliva and 4 in sperm with high potential for age estimation in these tissues; the saliva panel also showed high correlation with age in buccal data. Following the optimisation of the methodological factors, a comprehensive search was carried out to identify the best age-associated CpG markers, with the aim of further increasing sensitivity and accuracy. Statistical evaluation of markers from 51 studies using microarray data from over 4,000 individuals, followed by validation using in-house generated MPS data, revealed a final set of 11 markers with the highest potential. The new model surpassed the previous 12-marker model in terms of both accuracy and sensitivity, reducing the MAE to 3.3 years and the DNA starting input limit to 5 ng (~1 ng PCR input).
The accuracy of the model was retained (MAE = 3.8 years) in a separate set of 88 samples of Spanish origin, while predictions for donors under the age of 54 years displayed even higher accuracy (MAE = 2.6 years). In addition, comparison of the methylation values obtained for 84 of these 88 samples revealed high overlap between the results from the MiSeq, EpiTYPER and pyrosequencing platforms. Lastly, potential variation related to sex, as well as to certain diseases and conditions, was investigated for the proposed marker sets. No significant differences in methylation values were observed between the sexes for any of the markers, and no sex-related bias was seen in the age estimates obtained using the 11-marker blood-based model. Additionally, no variation was observed between control and disease-associated populations for schizophrenia, rheumatoid arthritis, frontotemporal dementia and progressive supranuclear palsy in microarray data relating to the 11 blood markers, but potential associations with metabolic and cardiovascular conditions, as well as obesity and smoking, were identified for the relevant genes. Similar analysis revealed potential associations with Alzheimer’s disease and Huntington’s disease for the saliva and sperm age marker sets. Overall, this thesis describes the development of an accurate and sensitive method for determining the chronological age of a donor from various tissues, while the associated validation studies showcase its high potential for application in forensics.
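The roughly 92% figure quoted in the abstract for the buccal swab model can be reproduced with Bayes' rule in odds form: posterior odds equal the likelihood ratio multiplied by the prior odds. The sketch below is only an illustration of that arithmetic, not code from the thesis; the function name `posterior_probability` and its parameters are hypothetical, and the 50:50 prior is the assumption stated in the abstract.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float = 0.5) -> float:
    """Convert a likelihood ratio into a posterior probability (Bayes' rule, odds form).

    Hypothetical helper for illustration; the default prior of 0.5 mirrors the
    50:50 uncertainty assumed in the abstract.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")

    # Prior odds in favour of the proposition (here: donor is aged 18 or over).
    prior_odds = prior_probability / (1.0 - prior_probability)
    # Posterior odds = likelihood ratio x prior odds.
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Likelihood ratio of 11 with equal priors, as reported for the buccal swab model.
    adult_probability = posterior_probability(11.0)
    print(f"Posterior probability of 'adult': {adult_probability:.1%}")
```

With a likelihood ratio of 11 and equal priors, the posterior odds are 11:1, i.e. a probability of 11/12, which rounds to 92%. A different prior (for example, one reflecting the age distribution of a relevant population) would change the result, which is why the prior is left as an explicit parameter.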
Date of Award: 1 Mar 2021
Original language: English
Awarding Institution
  • King's College London
Supervisors: Denise Syndercombe-Court & David Ballard
