Novel computational methods using coded patient phenotypes to enhance disease gene identification

Student thesis: Doctoral Thesis, Doctor of Philosophy


With the sequencing of the genomes of individuals with rare Mendelian disease becoming routine, there is an emerging challenge in identifying and quantifying similarity between individuals’ phenotypes, to help identify commonalities in the genetic variation contributing to disease. Whilst it is relatively easy to assess genetic similarity between individuals, assessing phenotypic similarity is harder because of the complexity of phenotypic information. One route to estimating phenotypic similarity systematically is to apply computational approaches to standardised, machine-readable phenotypic descriptors, such as terms from the Human Phenotype Ontology (HPO) or responses to structured patient questionnaires. This thesis describes advances in the representation of clinical phenotypes in machine-readable controlled vocabularies within the context of two kinds of genetic study: the diagnosis of patients with monogenic disease, and common variant association analysis of severe acne subtypes.
When genome sequencing is used for the genetic diagnosis of individuals with rare Mendelian diseases, a virtual gene panel approach is often taken, in which only a curated list of genes suspected to cause the patient's phenotype is considered. With the number of known monogenic disease-gene pairs exceeding 5,000, manually curating personalised gene panels that span the entire human phenotypic spectrum is challenging. Methods have previously been developed that formalise this approach by using the patient's phenotype to generate candidate genes; these require both patients and known disorders to be described in standardised, machine-readable terms. Work in this project has investigated how established phenotypic descriptions (OMIM free text) can be further leveraged, through simple quantification of disease terms, to obtain a more nuanced HPO-term description of known phenotypes, and how this helps to generate candidate gene panels more efficiently in real patient datasets.
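The abstract does not specify which similarity measure or ranking procedure the thesis uses. As an illustration only, the sketch below ranks genes for a patient by comparing the patient's ontology terms with weighted disease annotations, using Resnik similarity (the information content of the most informative common ancestor) combined through a best-match average. The toy ontology, the disease and gene names, and the use of per-term weights as a stand-in for "simple quantification of disease terms" are all hypothetical assumptions, not the author's method.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (term -> parents); these are NOT real HPO IDs.
ONTOLOGY: Dict[str, List[str]] = {
    "ROOT": [],
    "Abnormal_eye": ["ROOT"],
    "Cataract": ["Abnormal_eye"],
    "Glaucoma": ["Abnormal_eye"],
    "Abnormal_skeleton": ["ROOT"],
    "Scoliosis": ["Abnormal_skeleton"],
    "Short_stature": ["Abnormal_skeleton"],
}

# Hypothetical disease annotations: disease -> (gene, {term: weight}).
# Weights stand in for how often a term is mentioned in a free-text
# disease description (an assumption made for illustration).
DISEASES: Dict[str, Tuple[str, Dict[str, int]]] = {
    "DiseaseA": ("GENE1", {"Cataract": 3, "Short_stature": 1}),
    "DiseaseB": ("GENE2", {"Glaucoma": 2, "Scoliosis": 2}),
    "DiseaseC": ("GENE3", {"Scoliosis": 1, "Short_stature": 4}),
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(ONTOLOGY[t])
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = -log(fraction of diseases annotated with t or a descendant).

    Terms that annotate no disease get IC 0 in this sketch.
    """
    n = len(DISEASES)
    counts = {t: 0 for t in ONTOLOGY}
    for _, terms in DISEASES.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    return {t: -math.log(c / n) if c else 0.0 for t, c in counts.items()}


def resnik(t1: str, t2: str, ic: Dict[str, float]) -> float:
    """IC of the most informative common ancestor (ROOT is always shared)."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def disease_score(patient: Set[str], weighted: Dict[str, int],
                  ic: Dict[str, float]) -> float:
    """Symmetric best-match average; the disease-side average is weighted."""
    p_to_d = sum(max(resnik(p, d, ic) for d in weighted)
                 for p in patient) / len(patient)
    d_to_p = sum(w * max(resnik(d, p, ic) for p in patient)
                 for d, w in weighted.items()) / sum(weighted.values())
    return 0.5 * (p_to_d + d_to_p)


def rank_genes(patient: Set[str]) -> List[Tuple[str, float]]:
    """Score each gene by its best-matching disease, highest first."""
    ic = information_content()
    best: Dict[str, float] = {}
    for gene, terms in DISEASES.values():
        score = disease_score(patient, terms, ic)
        best[gene] = max(score, best.get(gene, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for gene, score in rank_genes({"Cataract", "Short_stature"}):
        print(f"{gene}\t{score:.3f}")
```

A real pipeline would load the ontology and its disease annotations from files rather than hard-coding them. The weighting in `disease_score` is just one possible way to let frequently mentioned disease terms count for more; other weighting or aggregation schemes are equally plausible.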
This project also examines the utility of extensive questionnaire records from patients with severe acne, which enable the identification of acne subtypes stratified by questionnaire response for use in downstream investigations seeking new genetic determinants of the disease.
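The abstract does not say how the questionnaire-based subtypes were derived. Purely as an illustration of one way to stratify patients by their responses, the sketch below applies average-linkage agglomerative clustering with Jaccard distance to sets of "yes" answers. The patient IDs and question items are hypothetical placeholders, and this clustering technique is a swapped-in example, not necessarily the one used in the thesis.

```python
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical records: patient -> set of questionnaire items answered "yes".
RESPONSES: Dict[str, Set[str]] = {
    "P1": {"q1", "q2", "q3"},
    "P2": {"q1", "q2"},
    "P3": {"q4", "q5", "q6"},
    "P4": {"q4", "q5"},
    "P5": {"q2", "q3"},
}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty answer sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1: List[str], c2: List[str]) -> float:
    """Mean pairwise Jaccard distance between members of two clusters."""
    pairs = [(p, q) for p in c1 for q in c2]
    return sum(jaccard_distance(RESPONSES[p], RESPONSES[q])
               for p, q in pairs) / len(pairs)


def cluster(k: int) -> List[List[str]]:
    """Merge the closest pair of clusters until k clusters remain."""
    clusters = [[p] for p in sorted(RESPONSES)]
    while len(clusters) > k:
        # Ties are broken by pair order, so results are deterministic.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_linkage(clusters[ij[0]],
                                                  clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    for group in cluster(2):
        print(group)
```

The number of clusters `k` is a free parameter here; in practice it would need to be chosen and validated before subtypes are carried forward into genetic association analysis.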
Date of Award: 1 Jul 2019
Original language: English
Awarding Institution: King's College London
Supervisors: Michael Simpson & Christopher Mathew