Identification of cancer driver genes using supervised machine learning and systems biology

Student thesis: Doctoral Thesis, Doctor of Philosophy


During the past few years, many tumour sequencing projects have focused on characterising genes that harbour somatic alterations with a cancer-promoting role, known as cancer driver genes. Because these genes have been shown to be under positive selection during cancer progression, it has been assumed that they are mutated more frequently than expected by chance. However, fully characterising cancer drivers is particularly challenging in cancer types such as oesophageal adenocarcinoma (OAC), in which the genomic landscape is highly variable and recurrent events are infrequent. 
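The frequency argument behind recurrence-based driver detection can be illustrated with a one-sided binomial test. This is a simplified sketch, not a method from the thesis. It assumes a single, uniform per-tumour background mutation probability for the gene (2% here, a hypothetical value), whereas dedicated recurrence-based tools model background rates in far more detail. The function name `binomial_upper_tail` is illustrative.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    mutated tumours out of n if the gene only mutates at background rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Cohort size as in the OCCAMS analysis; the background rate is hypothetical.
n_tumours = 261
background_rate = 0.02  # assumed per-tumour mutation probability for this gene

for mutated in (1, 5, 15):
    p_value = binomial_upper_tail(mutated, n_tumours, background_rate)
    print(f"{mutated} mutated tumours: p = {p_value:.3g}")
```

Under this assumed background rate, a gene mutated in 15 tumours stands out, whereas one mutated in 5 is indistinguishable from background. A gene altered in just one or two tumours cannot reach significance at all, which is exactly the limitation for rare and patient-specific drivers.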
To identify rare or even patient-specific cancer driver genes, a novel algorithm, sysSVM, was developed. The sysSVM algorithm is based on support vector machines, a supervised machine-learning framework, and uses systems-level properties of human genes together with sequencing data from individual tumours to predict genes that promote cancer development. Unlike other state-of-the-art algorithms for driver gene prediction, sysSVM considers all types of damaging alterations simultaneously (mutations, copy number alterations and structural rearrangements). After development, sysSVM was applied to 261 OACs from the Oesophageal Cancer Clinical and Molecular Stratification (OCCAMS) consortium, and a large number of novel cancer driver genes were discovered that, together with well-known drivers, help promote OAC. Validation of sysSVM on 107 additional OACs confirmed the robustness of the approach. Moreover, the large majority of the newly discovered cancer genes were rare or patient-specific. Despite this, they were shown to converge on perturbing the same cancer-related processes, including intracellular signalling and cell cycle regulation. Recurrence of process perturbation, rather than of mutations in individual genes, divided OACs into six clusters that differ in their molecular and clinical features, suggesting possible patient stratifications for personalised treatment. Experiments performed in collaboration with bench researchers, which mimicked or reverted alterations of the predicted cancer driver genes, validated their contribution to cancer progression in OAC. 
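To make the idea of scoring genes with a support vector machine more concrete, here is a minimal sketch of a generic linear soft-margin SVM trained by subgradient descent on the hinge loss. It is not sysSVM. The feature names, toy training data, linear kernel, hyperparameters and gene names `GENE_A`/`GENE_B` are all hypothetical. The sketch only shows how alteration features of several types and systems-level gene properties can sit side by side in one feature vector, and how a trained SVM turns that vector into a score for ranking a tumour's altered genes.

```python
# Minimal sketch: linear soft-margin SVM (full-batch subgradient descent on
# the hinge loss) used to score genes. All features and data are hypothetical.

FEATURES = [
    "truncating_mutation",  # 1 if a truncating mutation is present
    "damaging_missense",    # 1 if a predicted-damaging missense mutation is present
    "amplification",        # 1 if the gene is amplified
    "homozygous_deletion",  # 1 if both copies of the gene are lost
    "structural_variant",   # 1 if a rearrangement breakpoint falls in the gene
    "network_degree",       # scaled number of protein-interaction partners
    "duplicability",        # scaled number of paralogues
]


def decision(w, b, x):
    """Signed SVM score w.x + b; larger means more driver-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0                     # the bias is not regularised
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge loss is active
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical training set: +1 for known drivers, -1 for non-drivers.
# Positives carry different alteration types, so all types share one model.
X_train = [
    [1, 0, 0, 0, 0, 0.9, 0.2],
    [0, 1, 0, 0, 0, 0.8, 0.1],
    [0, 0, 1, 0, 0, 0.7, 0.3],
    [0, 0, 0, 1, 0, 0.9, 0.1],
    [0, 0, 0, 0, 1, 0.8, 0.2],
    [0, 0, 0, 0, 0, 0.1, 0.9],
    [0, 1, 0, 0, 0, 0.2, 0.8],
    [0, 0, 0, 0, 1, 0.1, 0.7],
    [0, 0, 0, 0, 0, 0.3, 0.6],
]
y_train = [1, 1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)

# Score the altered genes of one hypothetical tumour and rank them.
candidates = {
    "GENE_A": [1, 0, 0, 0, 0, 0.8, 0.2],  # truncated, well connected, few paralogues
    "GENE_B": [0, 0, 0, 0, 1, 0.2, 0.9],  # rearranged, peripheral, many paralogues
}
ranking = sorted(candidates, key=lambda g: decision(w, b, candidates[g]), reverse=True)
print(ranking)
```

A linear kernel is used here only so the learned weights stay readable. Swapping in a nonlinear kernel would require a dual (kernelised) solver rather than this primal update.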
The findings of this thesis accomplish three things. First, they describe the first attempt to develop an algorithm that extends the discovery of somatically acquired perturbations contributing to cancer beyond recurrent driver genes. Second, they reveal widespread somatic perturbation of biological processes in OAC, demonstrate dependencies acquired by OACs and highlight potential therapeutic targets. Third, they provide insights into how the newly predicted cancer driver genes could be used to stratify OACs and inform clinical practice.
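As a sketch of how predicted drivers might support stratification by perturbed processes rather than by individual genes, the following code maps each patient's driver genes to biological processes, builds a process profile per patient and groups patients with average-linkage hierarchical clustering on Jaccard distances. The abstract does not specify the clustering method used in the thesis. The distance, linkage, gene names, process annotations and patients here are all hypothetical assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical gene-to-process annotation (a real analysis would draw on a
# curated pathway resource; these assignments are illustrative only).
GENE_TO_PROCESSES = {
    "GENE1": {"intracellular_signalling"},
    "GENE2": {"intracellular_signalling"},
    "GENE3": {"cell_cycle"},
    "GENE4": {"cell_cycle", "dna_repair"},
    "GENE5": {"dna_repair"},
    "GENE6": {"chromatin_organisation"},
}

# Hypothetical per-patient drivers: patients share no genes, yet some
# converge on the same processes.
PATIENT_DRIVERS = {
    "P1": {"GENE1"},
    "P2": {"GENE2"},
    "P3": {"GENE3", "GENE5"},
    "P4": {"GENE4"},
    "P5": {"GENE6"},
}


def perturbed_processes(genes):
    """Union of the processes touched by a patient's driver genes."""
    return set().union(*(GENE_TO_PROCESSES.get(g, set()) for g in genes))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def average_linkage(profiles, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


profiles = {p: perturbed_processes(g) for p, g in PATIENT_DRIVERS.items()}
print(average_linkage(profiles, k=3))
```

In this toy example, P1 and P2 share no driver gene, and neither do P3 and P4, yet each pair is grouped together because their drivers perturb the same processes.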
Date of Award: 1 Oct 2019
Original language: English
Awarding Institution: King's College London
Supervisors: Francesca Ciccarelli & Franca Fraternali
