Generalised Implementation for Fixed-Length Approximate String Matching under Hamming Distance and Applications

Research output: Chapter in Book/Report/Conference proceedingConference paperpeer-review

4 Citations (Scopus)

Abstract

Approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from a pattern x of length m ≤ n. Fixed-length approximate string matching is the problem of finding all factors of a text t of length n with a distance at most k from any factor of length h of a pattern x of length m ≤ n, where h ≤ m. It is thus a generalisation of approximate string matching and has numerous direct applications in molecular biology-motif extraction and circular sequence alignment, to name a few. MaxShiftM is a bit-parallel algorithm for fixed-length approximate string matching under the Hamming distance model with time complexity O(m⌈h/w⌉n), where w is the size of the computer word (Crochemore et al., 2010). An implementation of this algorithm is straightforward as long as the maximal length of alignments is less than or equal to w. In this article, our contribution is twofold: first, we propose a generalised implementation of MaxShiftM, that is, for any given h ≤ m, under the Hamming distance model; and second, we show how our implementation can be used to improve the accuracy and efficiency of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies.
Original languageUndefined/Unknown
Title of host publication2015 IEEE International Parallel and Distributed Processing Symposium Workshop, IPDPS 2015, Hyderabad, India, May 25-29, 2015
PublisherIEEE
Pages367-374
Number of pages8
DOIs
Publication statusPublished - 2015

Cite this