The genomic revolution has brought about large advances in the identification of disease-associated variants. However, despite the recent explosion of genetic data, the problem of missing heritability persists. Variants with low penetrance remain difficult to identify, as do variants which are rare or unique to a single individual. To fully understand disease mechanisms and design targeted therapies, the molecular mechanisms underlying the pathogenic effects of such variants must be clarified.
A prime example of missense variants which are difficult to classify is provided by those which localise to the titin gene, a number of which are associated with titinopathies. Due to titin’s large size, the majority of healthy individuals possess one or more rare titin missense variants. This results in the paradox that rare titin variants are commonly found; therefore, pathogenicity cannot be inferred from frequency alone. To address this issue we have created a web application, TITINdb (http://fraternalilab.kcl.ac.uk/TITINdb/), which integrates structural, variant, sequence and isoform information along with precomputed in-silico analyses, in order to facilitate the prioritisation of variants for further wet-lab investigation.
Recently available databases allow access to missense variant data on an unprecedented scale. We sought to harness this information to better understand the characteristics of variants associated with health and disease, through a large-scale analysis of population variants from the gnomAD database, as well as disease-associated variants (ClinVar) and somatic cancer-associated variants (COSMIC). We established that variants from each data set target distinct functional pathways and proteomics features. To accomplish this analysis, we created a database, web interface and REST API, ZoomVar (http://fraternalilab.kcl.ac.uk/ZoomVar/), which allows variants to be mapped to a 3D integrated protein-protein interaction network and the regional enrichment of missense variants to be calculated.
Despite the multitude of features which are able to segregate deleterious from neutral missense variants, a number of problem cases remain. This motivated us to investigate whether features extracted from molecular dynamics simulations could improve predictions of variant deleteriousness. To this end we constructed a dataset of rare population and deleterious titin variants, and created machine-learning (random forest) models of variant impact. We show that dynamics-based features are able to segregate the majority of disease-associated titin variants from population variants. Ultimately, we believe a collaborative framework for the sharing of mutant and wild-type trajectories must be set up, both to enable investigation into the possible benefits of using dynamics-based features and to harness their power.
Anatomy of missense variants in health and disease: towards better impact prediction with a focus on titinopathies
Laddach, A. C. (Author). 1 May 2019
Student thesis: Doctoral Thesis › Doctor of Philosophy