TY - JOUR
T1 - Faster algorithms for 1-mappability of a sequence
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
AU - Radoszewski, Jakub
AU - Sung, Wing-Kin
PY - 2019/5/23
Y1 - 2019/5/23
N2 - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k=1. There exists an algorithm to solve this problem for k=1 requiring time O(mnlogn/loglogn) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(nlognloglogn), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m=Ω(log
σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m=Ω(klog
σn), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.
AB - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k=1. There exists an algorithm to solve this problem for k=1 requiring time O(mnlogn/loglogn) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(nlognloglogn), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m=Ω(log
σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m=Ω(klog
σn), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.
KW - Algorithms on strings
KW - Hamming distance
KW - Sequence mappability
UR - http://www.scopus.com/inward/record.url?scp=85067179715&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2019.04.026
DO - 10.1016/j.tcs.2019.04.026
M3 - Article
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
ER -
TY - CHAP
T1 - Degenerate String Comparison and Applications
AU - Alzamel, Mai
AU - Ayad, Lorraine A. K.
AU - Bernardini, Giulia
AU - Grossi, Roberto
AU - Iliopoulos, Costas S.
AU - Pisanti, Nadia
AU - Pissis, Solon P.
AU - Rosone, Giovanna
PY - 2018
Y1 - 2018
U2 - 10.4230/LIPIcs.WABI.2018.21
DO - 10.4230/LIPIcs.WABI.2018.21
M3 - Conference paper
SN - 9783959770828
VL - 113
T3 - Leibniz International Proceedings in Informatics (LIPIcs)
SP - 21:1-21:14
BT - 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
A2 - Parida, Laxmi
A2 - Ukkonen, Esko
PB - Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik
CY - Dagstuhl, Germany
ER -
TY - CHAP
T1 - How to Answer a Small Batch of RMQs or LCA Queries in Practice
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
PY - 2018
Y1 - 2018
U2 - 10.1007/978-3-319-78825-8_28
DO - 10.1007/978-3-319-78825-8_28
M3 - Conference paper
VL - 10765
T3 - Lecture Notes in Computer Science
SP - 343
EP - 355
BT - Combinatorial Algorithms
A2 - Brankovic, Ljiljana
A2 - Ryan, Joe
A2 - Smyth, William F.
PB - Springer International Publishing
CY - Cham
ER -
TY - CHAP
T1 - Faster Algorithms for 1-Mappability of a Sequence
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas
AU - Pissis, Solon
AU - Radoszewski, Jakub
AU - Sung, Wing Kin
PY - 2017/11/16
Y1 - 2017/11/16
N2 - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k= 1. The fastest known algorithm for k= 1 requires time O(mnlog n/ log log n) and space O(n). We present two new algorithms that require worst-case time O(mn) and O(nlog nlog log n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k logσn).
AB - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k= 1. The fastest known algorithm for k= 1 requires time O(mnlog n/ log log n) and space O(n). We present two new algorithms that require worst-case time O(mn) and O(nlog nlog log n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k logσn).
UR - http://www.scopus.com/inward/record.url?scp=85038216754&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-71147-8_8
DO - 10.1007/978-3-319-71147-8_8
M3 - Other chapter contribution
AN - SCOPUS:85038216754
SN - 9783319711461
VL - 10628 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 109
EP - 121
BT - Combinatorial Optimization and Applications - 11th International Conference, COCOA 2017, Proceedings
PB - Springer Verlag
ER -
TY - CHAP
T1 - Efficient Computation of Palindromes in Sequences with Uncertainties
AU - Alzamel, Mai
AU - Gao, Jia
AU - Iliopoulos, Costas S.
AU - Liu, Chang
AU - Pissis, Solon P.
PY - 2017/8/2
Y1 - 2017/8/2
N2 - In this work, we consider a special type of uncertain sequence called weighted string. In a weighted string every position contains a subset of the alphabet and every letter of the alphabet is associated with a probability of occurrence such that the sum of probabilities at each position equals 1. Usually a cumulative weight thresholdOpen image in new window is specified, and one considers only strings that match the weighted string with probability at least Open image in new window. We provide an O(nz)O(nz) -time and O(nz)O(nz) -space off-line algorithm, where n is the length of the weighted string and Open image in new window is the given threshold, to compute a smallest maximal palindromic factorization of a weighted string. This factorization has applications in hairpin structure prediction in a set of closely-related DNA or RNA sequences. Along the way, we provide an O(nz)O(nz) -time and O(nz)O(nz) -space off-line algorithm to compute maximal palindromes in weighted strings.
AB - In this work, we consider a special type of uncertain sequence called weighted string. In a weighted string every position contains a subset of the alphabet and every letter of the alphabet is associated with a probability of occurrence such that the sum of probabilities at each position equals 1. Usually a cumulative weight thresholdOpen image in new window is specified, and one considers only strings that match the weighted string with probability at least Open image in new window. We provide an O(nz)O(nz) -time and O(nz)O(nz) -space off-line algorithm, where n is the length of the weighted string and Open image in new window is the given threshold, to compute a smallest maximal palindromic factorization of a weighted string. This factorization has applications in hairpin structure prediction in a set of closely-related DNA or RNA sequences. Along the way, we provide an O(nz)O(nz) -time and O(nz)O(nz) -space off-line algorithm to compute maximal palindromes in weighted strings.
U2 - 10.1007/978-3-319-65172-9_52
DO - 10.1007/978-3-319-65172-9_52
M3 - Other chapter contribution
SN - 978-3-319-65171-2
VL - 744
SP - 620
EP - 629
BT - Engineering Applications of Neural Networks: 18th International Conference, EANN 2017, Athens, Greece, August 25--27, 2017, Proceedings
A2 - Boracchi, Giacomo
A2 - Iliadis, Lazaros
A2 - Jayne, Chrisina
A2 - Likas, Aristidis
PB - Springer International Publishing Switzerland
CY - Cham
ER -
TY - CHAP
T1 - Efficient Identification of k-Closed Strings
AU - Alamro, Hayam
AU - Alzamel, Mai
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
AU - Watts, Steven
AU - Sung, Wing-Kin
PY - 2017/8/2
Y1 - 2017/8/2
N2 - A closed string contains a proper factor occurring as both a prefix and a suffix but not elsewhere in the string. Closed strings were introduced by Fici (WORDS 2011) as objects of combinatorial interest. In this paper, we extend this definition to k-closed strings, for which a level of approximation is permitted up to a number of Hamming distance errors, set by the parameter k. We then address the problem of identifying whether or not a given string of length n over an integer alphabet is k-closed and additionally specifying the border resulting in the string being k-closed. Specifically, we present an O(kn)O(kn) -time and O(n)O(n) -space algorithm to achieve this along with the pseudocode of an implementation.
AB - A closed string contains a proper factor occurring as both a prefix and a suffix but not elsewhere in the string. Closed strings were introduced by Fici (WORDS 2011) as objects of combinatorial interest. In this paper, we extend this definition to k-closed strings, for which a level of approximation is permitted up to a number of Hamming distance errors, set by the parameter k. We then address the problem of identifying whether or not a given string of length n over an integer alphabet is k-closed and additionally specifying the border resulting in the string being k-closed. Specifically, we present an O(kn)O(kn) -time and O(n)O(n) -space algorithm to achieve this along with the pseudocode of an implementation.
U2 - 10.1007/978-3-319-65172-9_49
DO - 10.1007/978-3-319-65172-9_49
M3 - Other chapter contribution
SN - 978-3-319-65171-2
VL - 744
SP - 583
EP - 595
BT - Engineering Applications of Neural Networks: 18th International Conference, EANN 2017, Athens, Greece, August 25--27, 2017, Proceedings
A2 - Boracchi, Giacomo
A2 - Iliadis, Lazaros
A2 - Jayne, Chrisina
A2 - Likas, Aristidis
PB - Springer International Publishing Switzerland
CY - Cham
ER -