Analysing RNA-seq datasets to determine how pre-mRNA splicing is regulated by RNA binding proteins and cis-acting elements

Student thesis: Doctoral Thesis (Doctor of Philosophy)


Most human genes undergo alternative pre-mRNA splicing to produce multiple transcript isoforms, which often encode functionally distinct and tissue-specific products. Splicing factors interact with pre-mRNA and the core splicing machinery to control alternative splicing and regulate the generation of tissue-specific transcriptomes. For example, alternative splicing contributes to the function and homeostasis of the adaptive immune system. CD4+ T cells are an integral component of the adaptive immune system and regulate the effector functions of immune cells towards diverse pathogens. Several splicing factors that contribute to CD4+ T cell function have been identified. However, the full splicing regulatory programmes characterising these and other immune cell types remain to be elucidated. Further, CD4+ T cells are the primary host target cell of HIV-1 infection. The HIV-1 lifecycle is regulated in large part through the host gene expression pathway. For instance, the HIV-1 RNA undergoes extensive alternative splicing mediated by the host splicing machinery. The study of processes such as these would benefit from the development of improved methods for the inference of alternative splicing networks.

In this thesis, I have analysed RNA-seq datasets to understand how alternative splicing is regulated through the actions of RNA-binding proteins and cis-acting RNA elements. Motif Activity Response Analysis (MARA) is an approach developed for the inference of tissue-specific regulatory transcription factors. I propose that MARA may also be effectively employed for the inference of regulatory splicing factors. To this end, I applied MARA to the novel use case of analysing splicing factors. I compared this Splicing-MARA (S-MARA) to a commonly used motif enrichment approach for predicting which splicing factors regulate a given splicing programme. For this purpose, I used a large-scale splicing factor knockdown data resource produced through the ENCODE project, in addition to a published CD4+ T cell activation timecourse. Despite its previous use, splicing factor motif enrichment analysis has not undergone a formal assessment. We found that this method has utility in identifying regulatory splicing factors, providing proof-of-concept for the use of motif-based methods in the prediction of regulatory splicing factors. Counter to expectations, S-MARA performed worse than the motif enrichment method in identifying regulatory splicing factors. As such, potential improvements to S-MARA are considered as future avenues for investigation.
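As an illustration only, the general idea behind a motif enrichment test can be sketched as a one-sided hypergeometric test: given a set of regulated exons and a background set, it asks whether a splicing factor's motif occurs near regulated exons more often than expected by chance. The abstract does not state which statistical test the thesis used, so the choice of test, the function name `motif_enrichment_p`, and all counts below are assumptions made for this sketch.

```python
from math import comb


def motif_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    Hypothetical parameter meanings (assumptions for this sketch):
      N: total number of exons considered (regulated + background)
      K: number of those exons that contain the motif
      n: number of regulated exons
      k: number of regulated exons that contain the motif
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)  # ways to choose the regulated set
    # Sum the probability of seeing k or more motif-containing exons
    # among the n regulated exons; comb() returns 0 for impossible draws.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / total


if __name__ == "__main__":
    # Made-up counts: 40 of 200 regulated exons carry the motif,
    # against 300 motif-containing exons out of 3000 in total.
    print(motif_enrichment_p(k=40, n=200, K=300, N=3000))
```

A small p-value here would suggest the motif is over-represented among regulated exons; in practice such tests are run across many motifs and corrected for multiple testing.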

Further, the RNA-binding protein Sam68 was investigated using a knockdown approach to infer its genome-wide splicing targets during the CD4+ T cell activation process. This revealed a widespread role for Sam68 in regulating mRNA abundance, whilst only a limited number of genes showed Sam68-dependent alternative splicing. Finally, the regulation of the HIV-1 lifecycle by host RNA-binding proteins was investigated. We showed that suppression of CpG dinucleotides in the HIV-1 genome appears to maintain correct splicing of viral transcripts, whilst introduction of CpGs promotes use of a cryptic splice site that disrupts splicing, potentially mediated through the actions of host splicing factors.
Date of Award: 1 Jan 2021
Original language: English
Awarding Institution: King's College London
Supervisors: Reiner Schulz & Chad Swanson
