TY - CHAP

T1 - A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches

AU - Crochemore, Maxime

AU - Iliopoulos, Costas S.

AU - Pissis, Solon

PY - 2010

Y1 - 2010

N2 - This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in computational biology. We consider a generalisation of this problem, the fixed-length approximate string-matching with k-mismatches problem: given a text t, a pattern x and an integer E, search for all the occurrences in t of all factors of x of length l with k or fewer mismatches with a factor of t. We present a practical parallel algorithm of comparable simplicity that requires only O(nm inverted right perpendicularl/winverted left perpendicular/p)) time, where w is the word size of the machine (e.g. 32 or 64 in practice) and p the number of processors. Thus the algorithm's performance is independent of k and the alphabet size vertical bar Sigma vertical bar. The proposed parallel algorithm makes use of message-passing parallelism model, and word-level parallelism for efficient approximate string-matching.

AB - This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in computational biology. We consider a generalisation of this problem, the fixed-length approximate string-matching with k-mismatches problem: given a text t, a pattern x and an integer E, search for all the occurrences in t of all factors of x of length l with k or fewer mismatches with a factor of t. We present a practical parallel algorithm of comparable simplicity that requires only O(nm inverted right perpendicularl/winverted left perpendicular/p)) time, where w is the word size of the machine (e.g. 32 or 64 in practice) and p the number of processors. Thus the algorithm's performance is independent of k and the alphabet size vertical bar Sigma vertical bar. The proposed parallel algorithm makes use of message-passing parallelism model, and word-level parallelism for efficient approximate string-matching.

UR - http://www.scopus.com/inward/record.url?scp=78650438144&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12476-1_6

DO - 10.1007/978-3-642-12476-1_6

M3 - Conference paper

SN - 9783642124754

VL - N/A

T3 - Lecture Notes in Computer Science

SP - 92

EP - 101

BT - Algorithms and Applications

A2 - Elomaa, Tapio

A2 - Mannila, Heikki

A2 - Orponen, Pekka

PB - Springer

CY - Berlin

T2 - Workshop on Algorithms and Applications Dedicated to Esko Ukkonen for his 60th Birthday

Y2 - 28 May 2010

ER -