All-pairs suffix/prefix in optimal time using Aho-Corasick space

Grigorios Loukidis, Solon P. Pissis

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)
43 Downloads (Pure)

Abstract

The all-pairs suffix/prefix (APSP) problem is a classic problem in computer science with many applications in bioinformatics. Given a set $\{S_1,\ldots,S_k\}$ of $k$ strings of total length $n$, we are asked to find, for each string $S_i$, $i\in[1,k]$, its longest suffix that is a prefix of string $S_j$, for all $j\neq i$, $j\in[1,k]$. Several algorithms running in the optimal $\cO(n+k^2)$ time for solving APSP are known. All of these algorithms are based on suffix sorting and thus require space $\Omega(n)$ in any case. We consider the parameterized version of the APSP problem, denoted by $\ell$-APSP, in which we are asked to output only the pairs whose suffix/prefix overlap is of length at least $\ell$. We give an algorithm for solving $\ell$-APSP that runs in the optimal $\cO(n+|\LOUTPUT|)$ time using $\cO(n)$ space, where $\LOUTPUT$ is the set of output pairs. Our algorithm is thus optimal for the APSP problem as well by setting $\ell=0$. Notably, our algorithm is fundamentally different from all optimal algorithms solving the APSP problem: it does not rely on sorting the suffixes of all input strings but on a novel traversal of the Aho-Corasick machine, and it thus requires space linear in the size of the machine.
Original languageEnglish
Article number106275
JournalINFORMATION PROCESSING LETTERS
Volume178
Early online date28 Apr 2022
DOIs
Publication statusPublished - Nov 2022

Fingerprint

Dive into the research topics of 'All-pairs suffix/prefix in optimal time using Aho-Corasick space'. Together they form a unique fingerprint.

Cite this