Algorithms in information security: spambots detection, string sanitization, and computing antiperiods

Student thesis: Doctoral ThesisDoctor of Philosophy


The rapid increase in the amount of published information and the explosive growth in the volume of data coincide with the digital transformation of orga-nizations in all sectors, in light of the development of information technology and global networks. Making data available online has created challenges for many organizations. These challenges include the securing of data, and the detecting confidential patterns efficiently. The need to protect data from intrusion and the speed in finding interesting patterns within the data have become a competitive research problem, that works to find effective algorithmic solutions in terms of space and time. In computer science, a pattern matching algorithm is an essential solution on many applications, such as biomedical informatics, network analysis, marketing, intrusion detection systems, and data compression. It acts by checking several patterns within a larger string or text.
This thesis presents a variety of advanced and efficient algorithms – based on advanced stringology data structures and smart techniques in pattern matching - to solve pattern matching problems in detecting creative Web spambots actions, sensitive patterns, and repeated patterns in linear time and space. The thesis is organized mainly in four parts/ divisions. The first part deals with “Pattern Matching Problems in Detecting Web Spambots”, related to the digital creative deception. This part focuses on the innovative creative techniques that the spammer might resort to defeat and deceive current detection tools. These techniques include per-forming a series of malicious actions with variable time delays, repeating the same series of malicious actions multiple times, and interleaving legitimate actions with malicious and unnecessary actions. Our algorithms to tackle the aforementioned techniques consider the temporal information as they consider time annotated sequences and require a match to occur within a time window. The problems of creative web spambots detection are detailed in a separate chapter which includes: Deception with Errors, Deception with Disguised Actions and Errors, and De-ception with Don’t Cares Actions. The problem of ”Deception with Don’t Cares Actions” is generalized to ”Detecting Patterns Efficiently with Don’t Care” in a separate chapter as our proposed algorithm can be applied for any pattern matching with don’t care problem in any field of computer science applications. The third part is called “Towards String Sanitization”, which studies the general problem of sanitizing a string. The problem of protecting a string is formalized by sanitizing (i.e., concealing) a given set of sensitive patterns that represent confidential information through injecting uncertainty in the string while preserving the utility of the string as much as possible. The last part is called “Computing the Antiperiod(s) of a String”. This part describes algorithms for computing the smallest antiperiods and all antiperiods of a given string in a linear time and space. Then we show how to obtain a faster algorithm for the smallest antiperiod problem via exploiting the properties of antiperiods. These algorithms are essential and central to several fields of computer science including computational biology, patterns matching and data compression.
Date of Award1 Oct 2021
Original languageEnglish
Awarding Institution
  • King's College London
SupervisorCostas Iliopoulos (Supervisor) & Grigorios Loukidis (Supervisor)

Cite this