Longest common prefixes with k-errors and applications

Lorraine A.K. Ayad, Carl Barton, Panagiotis Charalampopoulos*, Costas S. Iliopoulos, Solon P. Pissis

*Corresponding author for this work

Research output: Contribution to journal › Conference paper › peer-review

11 Citations (Scopus)

Abstract

Although real-world text datasets, such as DNA sequences, are far from uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing, for each suffix of a given string of length n, the longest prefix that occurs elsewhere in the string with k-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant k, using only linear space, under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in (Formula presented) time on average, where c>1 is a constant, using O(n) space. Finally, we show that our technique is applicable to several algorithmic problems found in computational biology and elsewhere. The importance of our technique lies in the fact that it is the first one achieving this bound for non-constant k and using O(n) space.
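
To make the problem statement concrete, the following is a minimal brute-force sketch for the Hamming distance (k-mismatches) case only. It is not the algorithm of the paper: it compares every pair of suffixes directly, so it takes quadratic time or worse, and it does not cover the edit distance model. The function name `lcp_k_mismatches` and the convention that "elsewhere" means any other starting position are assumptions made for illustration.

```python
def lcp_k_mismatches(s: str, k: int) -> list[int]:
    """Naive baseline (assumed name, not from the paper).

    For each position i, return the length of the longest prefix of s[i:]
    that matches a prefix of some other suffix s[j:], j != i, with at most
    k mismatches (Hamming distance).
    """
    n = len(s)
    result = [0] * n
    for i in range(n):
        best = 0
        for j in range(n):
            if j == i:
                continue  # "elsewhere": skip the suffix itself
            mismatches = 0
            length = 0
            # Extend the alignment until a suffix ends or the
            # mismatch budget k would be exceeded.
            while i + length < n and j + length < n:
                if s[i + length] != s[j + length]:
                    if mismatches == k:
                        break
                    mismatches += 1
                length += 1
            best = max(best, length)
        result[i] = best
    return result


if __name__ == "__main__":
    print(lcp_k_mismatches("abcabd", 0))
    print(lcp_k_mismatches("abcabd", 1))
```

With k = 0 this reduces to the ordinary longest repeated prefix of each suffix. Allowing one mismatch lets, for example, the suffix starting at position 0 of "abcabd" align with the suffix "abd" over three characters instead of two.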

Keywords

  • k-errors
  • k-mismatches
  • Longest common factor
  • Longest common prefix
  • Longest common substring
