TY - JOUR
T1 - Off-line and on-line algorithms for closed string factorization
AU - Alzamel, Mai
AU - Iliopoulos, Costas S.
AU - Smyth, W.F.
AU - Sung, Wing-Kin
PY - 2018/10/30
Y1 - 2018/10/30
N2 - A string X=X[1..n], n>1, is said to be closed if it has a nonempty proper prefix that is also a suffix, but that otherwise occurs nowhere else in X; for n=1, every X is closed. Closed strings were introduced by Fici in [1] as objects of combinatorial interest. Recently Badkobeh et al.[2] described a variety of algorithms to factor a given string into closed factors. In particular, they studied the Longest Closed Factorization (LCF) problem, which greedily computes the decomposition X=X1X2⋯Xk, where X1 is the longest closed prefix of X, X2 the longest closed prefix of X with prefix X1 removed, and so on. In this paper we present an O(logn) amortized per character algorithm to compute LCF on-line, where n is the length of the string. We also introduce the Minimum Closed Factorization (MCF) problem, which identifies the minimum number of closed factors that cover X. We first describe an off-line O(nlog2n)-time algorithm to compute MCF(X), then we present an on-line algorithm for the same problem. In fact, we show that MCF(X) can be computed in O(Llogn) time from MCF(X′), where X′=X[1..n−1], and L is the largest integer such that the suffix X[n−L+1..n] is a substring of X′.
AB - A string X=X[1..n], n>1, is said to be closed if it has a nonempty proper prefix that is also a suffix, but that otherwise occurs nowhere else in X; for n=1, every X is closed. Closed strings were introduced by Fici in [1] as objects of combinatorial interest. Recently Badkobeh et al.[2] described a variety of algorithms to factor a given string into closed factors. In particular, they studied the Longest Closed Factorization (LCF) problem, which greedily computes the decomposition X=X1X2⋯Xk, where X1 is the longest closed prefix of X, X2 the longest closed prefix of X with prefix X1 removed, and so on. In this paper we present an O(logn) amortized per character algorithm to compute LCF on-line, where n is the length of the string. We also introduce the Minimum Closed Factorization (MCF) problem, which identifies the minimum number of closed factors that cover X. We first describe an off-line O(nlog2n)-time algorithm to compute MCF(X), then we present an on-line algorithm for the same problem. In fact, we show that MCF(X) can be computed in O(Llogn) time from MCF(X′), where X′=X[1..n−1], and L is the largest integer such that the suffix X[n−L+1..n] is a substring of X′.
KW - Closed string
KW - On-line
KW - Minimum factorization
U2 - 10.1016/j.tcs.2018.10.033
DO - 10.1016/j.tcs.2018.10.033
M3 - Article
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
ER -
TY - CHAP
T1 - How Much Different Are Two Words with Different Shortest Periods
AU - Alzamel, Mai
AU - Crochemore, Maxime
AU - Iliopoulos, Costas S.
AU - Kociumaka, Tomasz
AU - Kundu, Ritu
AU - Radoszewski, Jakub
AU - Rytter, Wojciech
AU - Wale, Tomasz
PY - 2018
Y1 - 2018
N2 - Sometimes the difference between two distinct words of the same length cannot be smaller than a certain minimal amount. In particular if two distinct words of the same length are both periodic or quasiperiodic, then their Hamming distance is at least 2. We study here how the minimum Hamming distance dist (x,y)dist(x,y)between two words x, y of the same length n depends on their periods. Similar problems were considered in [1] in the context of quasiperiodicities. We say that a period p of a word x is primitive if x does not have any smaller period p'ptextasciiacutexwhich divides p. For integers p, n (pbackslashle np≤n) we define backslashmathcal Pp(n)Pp(n)as the set of words of length n with primitive period p. We show several results related to the following functions introduced in this paper for pbackslashne qp≠qand n backslashge backslashmax (p,q)n≥max(p,q). backslashbeginaligned backslashmathcal Dp,q(n) = backslashmin backslash,backslashbackslash, dist (x,y)backslash,:backslash; xbackslashin backslashmathcal Pp(n), backslash,ybackslashin backslashmathcal Pq(n)backslash,backslash, backslashbackslash Np,q(h) = backslashmax backslash,backslashbackslash, n backslash,:backslash; backslashmathcal Dp,q(n)backslashle hbackslash,backslash. backslashqquad backslashqquad backslashendalignedDp,q(n)=mindist(x,y):x∈Pp(n),y∈Pq(n),Np,q(h)=maxn:Dp,q(n)≤h.
AB - Sometimes the difference between two distinct words of the same length cannot be smaller than a certain minimal amount. In particular if two distinct words of the same length are both periodic or quasiperiodic, then their Hamming distance is at least 2. We study here how the minimum Hamming distance dist (x,y)dist(x,y)between two words x, y of the same length n depends on their periods. Similar problems were considered in [1] in the context of quasiperiodicities. We say that a period p of a word x is primitive if x does not have any smaller period p'ptextasciiacutexwhich divides p. For integers p, n (pbackslashle np≤n) we define backslashmathcal Pp(n)Pp(n)as the set of words of length n with primitive period p. We show several results related to the following functions introduced in this paper for pbackslashne qp≠qand n backslashge backslashmax (p,q)n≥max(p,q). backslashbeginaligned backslashmathcal Dp,q(n) = backslashmin backslash,backslashbackslash, dist (x,y)backslash,:backslash; xbackslashin backslashmathcal Pp(n), backslash,ybackslashin backslashmathcal Pq(n)backslash,backslash, backslashbackslash Np,q(h) = backslashmax backslash,backslashbackslash, n backslash,:backslash; backslashmathcal Dp,q(n)backslashle hbackslash,backslash. backslashqquad backslashqquad backslashendalignedDp,q(n)=mindist(x,y):x∈Pp(n),y∈Pq(n),Np,q(h)=maxn:Dp,q(n)≤h.
U2 - 10.1007/978-3-319-92016-0_16
DO - 10.1007/978-3-319-92016-0_16
M3 - Chapter
SN - 9783319920160
SP - 168
EP - 178
BT - Artificial Intelligence Applications and Innovations
A2 - Iliadis, Lazaros
A2 - Maglogiannis, Ilias
A2 - Plagianakos, Vassilis
PB - Springer International Publishing
CY - Cham
Y2 - 25 May 2018 through 27 May 2018
ER -