TY - JOUR

T1 - Extending alignments with k-mismatches and ℓ-gaps

AU - Barton, Carl

AU - Iliopoulos, Costas

AU - Lee, Inbok

AU - Mouchard, Laurent

AU - Park, Kunsoo

AU - Pissis, Solon

PY - 2014/3/13

Y1 - 2014/3/13

N2 - Recently, the problem of extending an alignment with k-mismatches and a single gap for pairwise sequence alignment was introduced (Flouri et al., 2011). The authors considered the problem of extending an alignment under the Hamming distance model by also allowing the insertion of a single gap; and presented a Θ(mβ)-time algorithm to solve it, where m is the length of the shortest sequence to be extended, and β is the maximum allowed length of the single gap. Very recently, it was shown (Flouri et al., 2012) that this problem is strongly and directly motivated by the next-generation re-sequencing application: aligning tens of millions of short DNA sequences against a reference genome. In this article, we consider an extension of this problem: extending an alignment with k-mismatches and two gaps ; and present a Θ(mβ)-time algorithm to solve it. This extension is proved to be fundamental in the next-generation re-sequencing application (Alachiotis et al., 2012). In addition, we present a generalisation of our solution to solve the problem of extending an alignment with k-mismatches and ℓ-gaps in time Θ(mβℓ). The presented solutions work provided that all gaps in the alignment must occur in one of the two sequences.

AB - Recently, the problem of extending an alignment with k-mismatches and a single gap for pairwise sequence alignment was introduced (Flouri et al., 2011). The authors considered the problem of extending an alignment under the Hamming distance model by also allowing the insertion of a single gap; and presented a Θ(mβ)-time algorithm to solve it, where m is the length of the shortest sequence to be extended, and β is the maximum allowed length of the single gap. Very recently, it was shown (Flouri et al., 2012) that this problem is strongly and directly motivated by the next-generation re-sequencing application: aligning tens of millions of short DNA sequences against a reference genome. In this article, we consider an extension of this problem: extending an alignment with k-mismatches and two gaps ; and present a Θ(mβ)-time algorithm to solve it. This extension is proved to be fundamental in the next-generation re-sequencing application (Alachiotis et al., 2012). In addition, we present a generalisation of our solution to solve the problem of extending an alignment with k-mismatches and ℓ-gaps in time Θ(mβℓ). The presented solutions work provided that all gaps in the alignment must occur in one of the two sequences.

U2 - 10.1016/j.tcs.2013.06.012

DO - 10.1016/j.tcs.2013.06.012

M3 - Article

VL - 525

JO - Journal of Theoretical Computer Science

JF - Journal of Theoretical Computer Science

ER -