Abstract
DNA sequencing is the translation of molecular structure into a human- and machine-readable format: a sequence, or string, of letters. The exponential growth of data (from DNA, RNA, and proteins) produced by biotechnology has resulted in two major scientific questions. First, what conclusions can we draw from all of the data that we have? Second, how can we do this in an efficient manner? The answers to these questions are where computer science and biological science meet: in the research field of bioinformatics. The obvious beauty of the aforementioned fields of study is their resemblance to stringology, the analysis of strings.
The research presented in this thesis lies within the intersection of computational molecular biology and stringology. Specifically, the aim was to design string-processing algorithms to analyse molecular sequences, in order to aid and enhance biological research. This thesis is an exploration of three important concepts in molecular biology: circular molecules, sequence motifs, and pan-genomes. In Chapter 2, we study the problem of accuracy when aligning two linear sequences obtained from circular molecular structures. Chapter 3 focuses on common, and thus biologically important, patterns found in molecular sequences. Lastly, in Chapter 4, we consider the complexities of handling pan-genomic data.
Date of Award: 1 Dec 2019
Supervisors: Costas Iliopoulos & Solon Pissis