Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Carl Barton, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

Original language | English |
---|---|

Title of host publication | 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016) |

Editors | Roberto Grossi, Moshe Lewenstein |

Place of Publication | Dagstuhl, Germany |

Publisher | Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik |

Pages | 4:1-4:13 |

Number of pages | 13 |

Volume | 54 |

ISBN (Print) | 978-3-95977-012-5 |

DOIs | |

Published | 2016 |

Name | Leibniz International Proceedings in Informatics (LIPIcs) |
---|---|

Publisher | Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik |

**Efficient Index for Weighted Sequences_BARTON_Published 2016_GOLD VoR**Barton_et_al_2016.pdf, 515 KB, application/pdf

Uploaded date:01 Jul 2016

Version:Final published version

Licence:CC BY

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of probabilities of the letters of P at positions i,...,i+|P|-1 in the text is at least 1/z. In this article, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.

No data available

King's College London - Homepage

© 2018 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454