Abstract
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing, for each suffix of a given string of length n, the longest prefix of that suffix that occurs elsewhere in the string with at most k errors. This problem has already been studied under the Hamming distance model. Our first result improves upon the state-of-the-art average-case time complexity for non-constant k, using only linear space, under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in (Formula presented) time on average, where c>1 is a constant, using O(n) space. The importance of our technique lies in the fact that it is the first to achieve this bound for non-constant k while using O(n) space. Finally, we show that our technique is applicable to several algorithmic problems found in computational biology and elsewhere.
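To make the problem statement concrete, the following is a minimal brute-force sketch for the Hamming distance variant only. It is not the algorithm of the paper: it simply compares every pair of starting positions and therefore takes cubic time in the worst case. The function name and the reading of "elsewhere" as "starting at a different position" are assumptions made for illustration.

```python
def longest_prefix_with_k_mismatches(s: str, k: int) -> list[int]:
    """Naive baseline (illustrative, not the paper's method).

    For each position i, return the length of the longest prefix of s[i:]
    that also occurs starting at some other position j != i with at most
    k mismatches (Hamming distance).
    """
    n = len(s)
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # "elsewhere" is taken to mean a different start position
            mismatches = 0
            length = 0
            # Extend the alignment until either string ends
            # or the mismatch budget k is exceeded.
            while i + length < n and j + length < n:
                if s[i + length] != s[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            result[i] = max(result[i], length)
    return result


if __name__ == "__main__":
    print(longest_prefix_with_k_mismatches("abcabd", 0))  # [2, 1, 0, 2, 1, 0]
    print(longest_prefix_with_k_mismatches("abcabd", 1))  # [3, 2, 1, 3, 2, 1]
```

With k = 0 this reduces to the longest repeated prefix of each suffix; allowing one mismatch lets "abc" at position 0 align with "abd" at position 3.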
Original language | English |
---|---|
Pages (from-to) | 27-41 |
Number of pages | 15 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publication status | Published - 1 Jan 2018 |
Event | 25th International Symposium on String Processing and Information Retrieval, SPIRE 2018, Lima, Peru. Duration: 9 Oct 2018 → 11 Oct 2018 |
Keywords
- k-errors
- k-mismatches
- Longest common factor
- Longest common prefix
- Longest common substring