King's College London

Research portal

Property Suffix Array with Applications in Indexing Weighted Sequences

Research output: Contribution to journalArticlepeer-review

Original languageEnglish
Article number3385898
JournalACM Journal of Experimental Algorithmics
Volume25
DOIs
PublishedNov 2020

King's Authors

Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.

View graph of relations

© 2020 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454