Improved algorithmic efficiency within pattern analysis and text mining of DNA sequences

Student thesis: Doctoral Thesis (Doctor of Philosophy)

Abstract

The study of data strings and its application to the processing, analysis and indexing of DNA data is an increasingly valuable area of research. This research examines several structures within data strings and, for each, explores a related problem with the aim of reducing the time and space complexity required to obtain a solution. These problems often take the form of searching for a specific string structure, indexing a list of its occurrences or identifying the maximal cases.

Each chapter may be read as dealing with an individual algorithmic problem, though some overlap does occur between the presented problems. The focus of this research is on the algorithmic solution, and for each presented problem this is often complemented with an implementation that may be tested on real data, with the results judged by their performance.

The specific string structures that are topics of research within this thesis include: circular strings, abelian palindromes, maximal palindromes, inverted repeats, closed strings and previous factors.
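
As a small illustration of two of the structures named above, the following Python sketch finds maximal palindromes by naive centre expansion and tests whether a string is closed, using the standard definition (length at most one, or a border that occurs only as a prefix and a suffix). The function names are hypothetical and the methods are deliberately simple quadratic ones; they are not the algorithms developed in the thesis.

```python
def maximal_palindromes(s):
    """Return the maximal palindrome at every centre of s as (start, end)
    index pairs, end exclusive. Naive O(n^2) centre expansion."""
    n = len(s)
    result = []
    # 2n - 1 centres: each character and each gap between two characters.
    for centre in range(2 * n - 1):
        left = centre // 2
        right = left + centre % 2
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        # The loop overshoots by one position on each side.
        result.append((left + 1, right))
    return result


def longest_palindrome(s):
    """Return one longest palindromic factor of s (empty for empty input)."""
    spans = maximal_palindromes(s)
    if not spans:
        return ""
    start, end = max(spans, key=lambda span: span[1] - span[0])
    return s[start:end]


def is_closed(s):
    """Return True if s is closed: it has length at most 1, or its longest
    border has no occurrence other than as a prefix and a suffix."""
    n = len(s)
    if n <= 1:
        return True
    for k in range(n - 1, 0, -1):
        if s[:k] == s[n - k:]:
            # First occurrence after position 0 must be the suffix itself.
            return s.find(s[:k], 1) == n - k
    return False  # no non-empty border at all
```

For example, `longest_palindrome("GATTACA")` picks out the factor `ATTA`, and `is_closed("ACCA")` holds because the border `A` appears only at the two ends.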

Though the presented work is applicable to data strings in general, it is often the case that DNA processing provides the most direct application for such algorithmic solutions, owing to the simplicity of the alphabet and the huge scale of DNA data presently available, which lends itself well to more efficient processing methods.
Date of Award: 1 Oct 2020
Original language: English
Awarding Institution
  • King's College London
Supervisors: Costas Iliopoulos & Solon Pissis
