Connecting de Bruijn Graphs

Giulia Bernardini*, Huiping Chen*, Inge Li Gørtz*, Christoffer Krogh*, Grigorios Loukides*, Solon P. Pissis*, Leen Stougie*, Michelle Sweering*

*Corresponding author for this work

Research output: Contribution to journalConference paperpeer-review

Abstract

We study the problem of making a de Bruijn graph (dBG), constructed from a collection of strings, weakly connected while minimizing the total cost of edge additions. The input graph is a dBG that can be made weakly connected by adding edges (along with extra nodes if needed) from the underlying complete dBG. The problem arises from genome reconstruction, where the dBG is constructed from a set of sequences generated from a genome sample by a sequencing experiment. Due to sequencing errors, the dBG is never Eulerian in practice and is often not even weakly connected. We show the following results for a dBG G(V, E) of order k consisting of d weakly connected components: 1. Making G weakly connected by adding a set of edges of minimal total cost is NP-hard. 2. No PTAS exists for making G weakly connected by adding a set of edges of minimal total cost (unless the unique games conjecture fails). We complement this result by showing that there does exist a polynomial-time (2 − 2/d)-approximation algorithm for the problem. 3. We consider a restricted version of the above problem, where we are asked to make G weakly connected by only adding directed paths between pairs of components. We show that making G weakly connected by adding d− 1 such paths of minimal total cost can be done in O(k|V |α(|V |) + |E|) time, where α(·) is the inverse Ackermann function. This improves on the O(k|V | log(|V |) + |E|)-time algorithm proposed by Bernardini et al. [CPM 2022] for the same restricted problem. 4. An ILP formulation of polynomial size for making G Eulerian with minimal total cost.

Original languageEnglish
JournalLeibniz International Proceedings in Informatics, LIPIcs
DOIs
Publication statusAccepted/In press - 12 Apr 2024
Event35th Annual Symposium on Combinatorial Pattern Matching, CPM 2024 - Fukuoka, Japan
Duration: 25 Jun 202427 Jun 2024

Keywords

  • de Bruijn graph
  • Eulerian graph
  • graph algorithm
  • string algorithm

Fingerprint

Dive into the research topics of 'Connecting de Bruijn Graphs'. Together they form a unique fingerprint.

Cite this