King's College London

Research portal

String Sanitization: A combinatorial approach

Research output: Chapter in Book/Report/Conference proceeding > Conference paper

Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon Pissis, Giovanna Rosone

Original language: English
Title of host publication: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) 2019
Accepted/In press: 8 Jun 2019

Publication series

Name: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) 2019
Publisher: Springer

Abstract

String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this paper, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility. First, we propose a time-optimal algorithm, TFS-ALGO, to construct the shortest string preserving the order of appearance and the frequency of all non-sensitive patterns. Such a string allows tasks that depend on the sequential nature and pattern frequencies of the string to be performed accurately. Second, we propose a time-optimal algorithm, PFS-ALGO, which preserves only a partial order of appearance of non-sensitive patterns but produces a much shorter string that can be analyzed more efficiently. The strings produced by either of these algorithms may reveal the location of sensitive patterns. In response, we propose a heuristic, MCSR-ALGO, which replaces letters in these strings with carefully selected letters, so that sensitive patterns are not reinstated and occurrences of spurious patterns are prevented. We implemented our sanitization approach, which applies TFS-ALGO, PFS-ALGO and then MCSR-ALGO, and show experimentally that it is effective and efficient.
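To make the problem statement concrete, the following Python sketch illustrates the goal of concealing sensitive patterns while keeping the order and frequency of non-sensitive ones. It is not TFS-ALGO, PFS-ALGO or MCSR-ALGO, and it makes several assumptions that the abstract does not state: patterns are taken to be substrings of a fixed length `k`, the separator letter `#` is assumed not to occur in the input, and the function names are hypothetical. The naive construction below makes no attempt to produce the shortest string.

```python
def kgrams(s, k):
    """Return all length-k substrings of s, in order of appearance."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def naive_sanitize(w, k, sensitive, sep="#"):
    """Naive baseline (an assumption, not the paper's algorithm).

    Splits w at every sensitive length-k pattern and joins the maximal
    runs of consecutive non-sensitive k-grams with the separator sep.
    """
    pieces, current = [], ""
    for g in kgrams(w, k):
        if g in sensitive:
            # Close the current run so the sensitive k-gram is never emitted.
            if current:
                pieces.append(current)
            current = ""
        elif current:
            # Consecutive non-sensitive k-grams overlap by k-1 letters.
            current += g[-1]
        else:
            current = g
    if current:
        pieces.append(current)
    return sep.join(pieces)


def nonsensitive_sequence(s, k, sensitive, sep="#"):
    """Non-sensitive k-grams of s in order, ignoring any that contain sep."""
    return [g for g in kgrams(s, k) if sep not in g and g not in sensitive]


# Toy example: conceal the pattern "cab" in "abcabcd" with k = 3.
w, k, sensitive = "abcabcd", 3, {"cab"}
x = naive_sanitize(w, k, sensitive)
print(x)  # abca#abcd
```

In this toy case the output contains no occurrence of `cab`, and the sequence of non-sensitive 3-grams (`abc`, `bca`, `abc`, `bcd`) is the same in `w` and `x`, so both their order and their frequencies are preserved.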

