King's College London

Research portal

String Sanitization Under Edit Distance

Research output: Contribution to journalConference paper

Standard

String Sanitization Under Edit Distance. / Bernardini, Giulia; Chen, Huiping; Loukidis, Grigorios; Pisanti, Nadia; Pissis, Solon; Stougie, Leen; Sweering, Michelle.

In: Leibniz International Proceedings in Informatics, LIPIcs, 09.06.2020.

Research output: Contribution to journalConference paper

Harvard

Bernardini, G, Chen, H, Loukidis, G, Pisanti, N, Pissis, S, Stougie, L & Sweering, M 2020, 'String Sanitization Under Edit Distance', Leibniz International Proceedings in Informatics, LIPIcs. https://doi.org/10.4230/LIPIcs.CPM.2020.7

APA

Bernardini, G., Chen, H., Loukidis, G., Pisanti, N., Pissis, S., Stougie, L., & Sweering, M. (2020). String Sanitization Under Edit Distance. Leibniz International Proceedings in Informatics, LIPIcs. https://doi.org/10.4230/LIPIcs.CPM.2020.7

Vancouver

Bernardini G, Chen H, Loukidis G, Pisanti N, Pissis S, Stougie L et al. String Sanitization Under Edit Distance. Leibniz International Proceedings in Informatics, LIPIcs. 2020 Jun 9. https://doi.org/10.4230/LIPIcs.CPM.2020.7

Author

Bernardini, Giulia ; Chen, Huiping ; Loukidis, Grigorios ; Pisanti, Nadia ; Pissis, Solon ; Stougie, Leen ; Sweering, Michelle. / String Sanitization Under Edit Distance. In: Leibniz International Proceedings in Informatics, LIPIcs. 2020.

Bibtex Download

@article{838c86d09db146aabe382bc245c8930d,
title = "String Sanitization Under Edit Distance",
abstract = "Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and K{\"u}nnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.",
keywords = "Conditional lower bound, Data sanitization, Dynamic programming, Edit distance, String algorithms",
author = "Giulia Bernardini and Huiping Chen and Grigorios Loukidis and Nadia Pisanti and Solon Pissis and Leen Stougie and Michelle Sweering",
year = "2020",
month = jun,
day = "9",
doi = "10.4230/LIPIcs.CPM.2020.7",
language = "English",
journal = "Leibniz International Proceedings in Informatics, LIPIcs",
issn = "1868-8969",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",

}

RIS (suitable for import to EndNote) Download

TY - JOUR

T1 - String Sanitization Under Edit Distance

AU - Bernardini, Giulia

AU - Chen, Huiping

AU - Loukidis, Grigorios

AU - Pisanti, Nadia

AU - Pissis, Solon

AU - Stougie, Leen

AU - Sweering, Michelle

PY - 2020/6/9

Y1 - 2020/6/9

N2 - Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

AB - Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

KW - Conditional lower bound

KW - Data sanitization

KW - Dynamic programming

KW - Edit distance

KW - String algorithms

UR - http://www.scopus.com/inward/record.url?scp=85088401873&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.CPM.2020.7

DO - 10.4230/LIPIcs.CPM.2020.7

M3 - Conference paper

JO - Leibniz International Proceedings in Informatics, LIPIcs

JF - Leibniz International Proceedings in Informatics, LIPIcs

SN - 1868-8969

ER -

View graph of relations

© 2020 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454