Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations required to transform one sequence into the other.
The motivation herein lies specifically within the development of algorithms and their implementation for sequence comparison, avoiding computing alignments under the edit distance measure, where possible. One of the benefits of this work is that it is not necessarily limited to computational biology, but also has applications in image recognition.
The following algorithms were designed to solve and optimise previously presented work found within the literature. This thesis provides an in depth analysis of three algorithms that have been designed, implemented and tested.
The first, hCED is a heuristic solution for computing the cyclic edit distance between two strings using a new distance measure, namely the β-blockwise q-gram distance. The second algorithm, MARS builds on this work, by computing accurate rotations for a given set of circular sequences and outputs the rotated sequences, which can then later be used to compute a multiple sequence alignment. The nal algorithm, CNEFinder looks into the analysis of conserved non-coding elements, regions of a genetic sequence found to be evolutionarily conserved across multiple organisms, without needing to compute whole genome alignments.
Original language | English |
---|
Awarding Institution | |
---|
Supervisors/Advisors | |
---|
Award date | 1 Jun 2019 |
---|