TY - JOUR
T1 - Faster algorithms for 1-mappability of a sequence
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
AU - Radoszewski, Jakub
AU - Sung, Wing-Kin
PY - 2019/5/23
Y1 - 2019/5/23
N2 - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k=1. There exists an algorithm to solve this problem for k=1 requiring time O(mnlogn/loglogn) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(nlognloglogn), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m=Ω(log
σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m=Ω(klog
σn), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.
AB - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k=1. There exists an algorithm to solve this problem for k=1 requiring time O(mnlogn/loglogn) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(nlognloglogn), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m=Ω(log
σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m=Ω(klog
σn), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.
KW - Algorithms on strings
KW - Hamming distance
KW - Sequence mappability
UR - http://www.scopus.com/inward/record.url?scp=85067179715&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2019.04.026
DO - 10.1016/j.tcs.2019.04.026
M3 - Article
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
ER -
TY - CONF
T1 - Efficient Computation of Sequence Mappability
AU - Alzamel, Mai Abdulaziz M
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas
AU - Kociumaka, Tomasz
AU - Pissis, Solon
AU - Radoszewski, Jakub
AU - Straszynski, Juliusz
PY - 2018
Y1 - 2018
U2 - 10.1007%2F978-3-030-00479-8_2
DO - 10.1007%2F978-3-030-00479-8_2
M3 - Paper
SP - 12
EP - 26
ER -
TY - CHAP
T1 - How to Answer a Small Batch of RMQs or LCA Queries in Practice
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas S.
AU - Pissis, Solon P.
PY - 2018
Y1 - 2018
U2 - 10.1007/978-3-319-78825-8_28
DO - 10.1007/978-3-319-78825-8_28
M3 - Conference paper
VL - 10765
T3 - Lecture Notes in Computer Science
SP - 343
EP - 355
BT - Combinatorial Algorithms
A2 - Brankovic, Ljiljana
A2 - Ryan, Joe
A2 - Smyth, William F.
PB - Springer International Publishing
CY - Cham
ER -
TY - CHAP
T1 - Faster Algorithms for 1-Mappability of a Sequence
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas
AU - Pissis, Solon
AU - Radoszewski, Jakub
AU - Sung, Wing Kin
PY - 2017/11/16
Y1 - 2017/11/16
N2 - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k= 1. The fastest known algorithm for k= 1 requires time O(mnlog n/ log log n) and space O(n). We present two new algorithms that require worst-case time O(mn) and O(nlog nlog log n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k logσn).
AB - In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k= 1. The fastest known algorithm for k= 1 requires time O(mnlog n/ log log n) and space O(n). We present two new algorithms that require worst-case time O(mn) and O(nlog nlog log n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = Ω(log σn). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = Ω(k logσn).
UR - http://www.scopus.com/inward/record.url?scp=85038216754&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-71147-8_8
DO - 10.1007/978-3-319-71147-8_8
M3 - Other chapter contribution
AN - SCOPUS:85038216754
SN - 9783319711461
VL - 10628 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 109
EP - 121
BT - Combinatorial Optimization and Applications - 11th International Conference, COCOA 2017, Proceedings
PB - Springer Verlag
ER -
TY - CHAP
T1 - Palindromic Decompositions with Gaps and Errors
AU - Adamczyk, Michal
AU - Alzamel, Mai
AU - Charalampopoulos, Panagiotis
AU - Iliopoulos, Costas
AU - Radoszewski, Jakub
PY - 2017
Y1 - 2017
N2 - Identifying palindromes in sequences has been an interest-ing line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Effcient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decomposi-tions and errors in palindromes, and also imposing a lower bound to the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decompo-sition of a string of length n with the minimal total gap length in time O(n log n · g) and space O(n · g), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string in maximal δ-palindromes (i.e. palindromes with δ errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time O(n · (g + δ)) and space O(n · g).
AB - Identifying palindromes in sequences has been an interest-ing line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Effcient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decomposi-tions and errors in palindromes, and also imposing a lower bound to the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decompo-sition of a string of length n with the minimal total gap length in time O(n log n · g) and space O(n · g), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string in maximal δ-palindromes (i.e. palindromes with δ errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time O(n · (g + δ)) and space O(n · g).
UR - http://www.scopus.com/inward/record.url?scp=85019261702&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-58747-9_7
DO - 10.1007/978-3-319-58747-9_7
M3 - Other chapter contribution
AN - SCOPUS:85019261702
SN - 9783319587462
VL - 10304 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 48
EP - 61
BT - Computer Science - Theory and Applications - 12th International Computer Science Symposium in Russia, CSR 2017, Proceedings
PB - Springer Verlag
ER -