King's College London

Research portal

Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

Research output: Contribution to journalArticle

Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

Original languageEnglish
JournalTHEORY OF COMPUTING SYSTEMS
Early online date10 Aug 2018
DOIs
StateE-pub ahead of print - 10 Aug 2018

Links

King's Authors

Abstract

We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterised by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem. Therefore, we make an effort to keep the lower order terms of the complexities of our algorithms as small as possible.

View graph of relations

© 2018 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454