TY - JOUR
T1 - String Sanitization Under Edit Distance
AU - Bernardini, Giulia
AU - Chen, Huiping
AU - Loukidis, Grigorios
AU - Pisanti, Nadia
AU - Pissis, Solon
AU - Stougie, Leen
AU - Sweering, Michelle
PY - 2020/6/9
Y1 - 2020/6/9
N2 - Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.
AB - Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.
KW - Conditional lower bound
KW - Data sanitization
KW - Dynamic programming
KW - Edit distance
KW - String algorithms
UR - http://www.scopus.com/inward/record.url?scp=85088401873&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CPM.2020.7
DO - 10.4230/LIPIcs.CPM.2020.7
M3 - Conference paper
SN - 1868-8969
JO - Leibniz International Proceedings in Informatics, LIPIcs
JF - Leibniz International Proceedings in Informatics, LIPIcs
ER -