Approximate Suffix-Prefix Dictionary Queries

Wiktor Zuba*, Grigorios Loukides*, Solon P. Pissis*, Sharma V. Thankachan*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference paperpeer-review

Abstract

In the all-pairs suffix-prefix (APSP) problem [Gusfield et al., Inf. Process. Lett. 1992], we are given a dictionary R of r strings, S1, . . ., Sr, of total length n, and we are asked to find the length SPLi,j of the longest string that is both a suffix of Si and a prefix of Sj, for all i, j ∈ [1 . . r]. APSP is a classic problem in string algorithms with applications in bioinformatics, especially in sequence assembly. Since r = |R| is typically very large in real-world applications, considering all r2 pairs of strings explicitly is prohibitive. This is when the data structure variant of APSP makes sense; in the same spirit as distance oracles computing shortest paths between any two vertices given online. We show how to quickly locate k-approximate matches (under the Hamming or the edit distance) in R using a version of the k-errata tree [Cole et al., STOC 2004] that we introduce. Let SPLki,j be the length of the longest suffix of Si that is at distance at most k from a prefix of Sj. In particular, for any k = O(1), we show an O(nlogk n)-sized data structure to support the following queries: One-to-Onek(i, j): output SPLki,j in O(logk nlog log n) time. Reportk(i, d): output all j ∈ [1 . . r], such that SPLki,j ≥ d, in O(logk n(log n/log log n + output)) time, where output denotes the size of the output. In fact, our algorithms work for any value of k not just for k = O(1), but the formulas bounding the complexities get much more complicated for larger values of k.

Original languageEnglish
Title of host publication49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024
EditorsRastislav Kralovic, Antonin Kucera
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959773355
DOIs
Publication statusE-pub ahead of print - 23 Aug 2024
Event49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024 - Bratislava, Slovakia
Duration: 26 Aug 202430 Aug 2024

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume306
ISSN (Print)1868-8969

Conference

Conference49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024
Country/TerritorySlovakia
CityBratislava
Period26/08/202430/08/2024

Keywords

  • all-pairs suffix-prefix
  • k-errata tree
  • suffix tree
  • suffix-prefix queries

Fingerprint

Dive into the research topics of 'Approximate Suffix-Prefix Dictionary Queries'. Together they form a unique fingerprint.

Cite this