TY - JOUR
T1 - Reverse-Safe Text Indexing
AU - Bernardini, Giulia
AU - Chen, Huiping
AU - Fici, Gabriele
AU - Loukidis, Grigorios
AU - Pissis, Solon
N1 - Funding Information:
H. Chen was supported by a CSC Scholarship. G. Fici was supported in part by MIUR-PRIN 2017 project 2017K7XPAN “Algorithms, Data Structures and Combinatorics for Machine Learning.” G. Loukides was supported in part by the Leverhulme Trust project RPG-2019-399. Authors’ addresses: G. Bernardini, Università di Milano–Bicocca, Viale Sarca 336, Milano, 20100, Italy; email: [email protected]; H. Chen and G. Loukides, Department of Informatics, King’s College London, Bush House, 30 Aldwych, London, WC2B 4BG, UK; emails: {huiping.chen, grigorios.loukides}@kcl.ac.uk; G. Fici, Università degli Studi di Palermo, Via Archirafi 34, Palermo, 90123, Italy; email: [email protected]; S. P. Pissis, CWI, Science Park 123, Amsterdam, 1098 XG, Netherlands, and Vrije Universiteit, Amsterdam, Netherlands; email: [email protected].
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/7/9
Y1 - 2021/7/9
N2 - We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(nI• log d) time, where I• is the matrix multiplication exponent. We show that, despite the nI• factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.
AB - We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(nI• log d) time, where I• is the matrix multiplication exponent. We show that, despite the nI• factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.
UR - http://www.scopus.com/inward/record.url?scp=85119171740&partnerID=8YFLogxK
U2 - 10.1145/3461698
DO - 10.1145/3461698
M3 - Article
SN - 1084-6654
VL - 26
JO - ACM Journal of Experimental Algorithmics
JF - ACM Journal of Experimental Algorithmics
IS - 1
M1 - 3461698
ER -