Clustering sequence graphs

Haodi Zhong, Grigorios Loukides, Solon P Pissis*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

In application domains ranging from social networks to e-commerce, it is important to cluster users with respect to both their relationships (e.g., friendship or trust) and their actions (e.g., visited locations or rated products). Motivated by these applications, we introduce the task of clustering the nodes of a sequence graph, i.e., a graph whose nodes are labeled with strings (e.g., sequences of users' visited locations or rated products). Neither string clustering algorithms nor graph clustering algorithms are appropriate for this task, as they do not consider the structure of the strings and of the graph simultaneously. Moreover, attributed graph clustering algorithms generally produce poor solutions, because they must represent each string as a vector of attributes, which inevitably loses information and may harm clustering quality.
We thus introduce the problem of clustering a sequence graph. We first propose two pairwise distance measures for sequence graphs: one based on edit distance and shortest-path distance, and another based on SimRank. We then formalize the problem under each measure and show that it is NP-hard. We also design a polynomial-time 2-approximation algorithm and a heuristic for the problem. Experiments using real datasets and a case study demonstrate the effectiveness and efficiency of our methods.
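The abstract does not give the exact form of the edit-distance/shortest-path measure or the clustering objective, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes an undirected, connected sequence graph and combines the two distances as a weighted sum with a hypothetical weight `alpha`; a positive combination of two metrics is again a metric. It then assumes a k-center-style objective and uses Gonzalez's farthest-first traversal, a classic 2-approximation for k-center in any metric space, as a stand-in for the paper's 2-approximation algorithm. The names `make_distance`, `farthest_first` and `alpha` are hypothetical, and the SimRank-based measure is not sketched.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert/delete/substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # delete ca
                          curr[j - 1] + 1,           # insert cb
                          prev[j - 1] + (ca != cb))  # substitute (or match)
        prev = curr
    return prev[-1]


def hop_distances(adj: Dict[int, List[int]], src: int) -> Dict[int, int]:
    """Shortest-path lengths (number of edges) from src, computed by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def make_distance(labels: Dict[int, str], adj: Dict[int, List[int]],
                  alpha: float = 0.5) -> Callable[[int, int], float]:
    """Hypothetical combined measure: alpha * edit + (1 - alpha) * hops.
    Assumes a connected, undirected graph and 0 <= alpha < 1."""
    assert 0 <= alpha < 1
    hops = {u: hop_distances(adj, u) for u in labels}  # all-pairs BFS

    def dist(u: int, v: int) -> float:
        return (alpha * edit_distance(labels[u], labels[v])
                + (1 - alpha) * hops[u][v])
    return dist


def farthest_first(nodes: List[int], dist: Callable[[int, int], float],
                   k: int) -> Tuple[List[int], Dict[int, int]]:
    """Gonzalez's farthest-first traversal: a 2-approximation for k-center
    whenever dist is a metric. Returns centers and node -> center map."""
    centers = [nodes[0]]
    nearest = {u: dist(u, nodes[0]) for u in nodes}  # distance to closest center
    while len(centers) < min(k, len(nodes)):
        # Next center: the non-center node farthest from all current centers.
        far = max((u for u in nodes if u not in centers),
                  key=lambda u: nearest[u])
        centers.append(far)
        for u in nodes:
            nearest[u] = min(nearest[u], dist(u, far))
    assign = {u: min(centers, key=lambda c: dist(u, c)) for u in nodes}
    return centers, assign


if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4-5 whose nodes carry short strings.
    labels = {0: "abc", 1: "abd", 2: "abd", 3: "xyz", 4: "xyw", 5: "xyz"}
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    dist = make_distance(labels, adj, alpha=0.5)
    centers, assign = farthest_first(sorted(labels), dist, k=2)
    print(centers)  # [0, 5]
    print(assign)   # {0: 0, 1: 0, 2: 0, 3: 5, 4: 5, 5: 5}
```

In the toy run, nodes that are close both in the graph and in their string labels end up in the same cluster. Setting `alpha` near 1 makes the string labels dominate, and setting it near 0 reduces the measure to plain graph distance, which is one way to see why neither string-only nor graph-only clustering captures both signals.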
Original language: English
Article number: 101981
Journal: DATA AND KNOWLEDGE ENGINEERING
Volume: 138
Publication status: Published (Mar 2022)
