Abstract
Novel high-throughput sequencing technologies have redefined the way genome sequencing is performed. They can produce tens of millions of short sequences (reads) in a single experiment, at a much lower cost than previous sequencing methods. Because of the massive amount of data these systems generate, efficient algorithms for mapping short reads to a reference genome are in great demand. In this paper, we present a practical algorithm for efficiently mapping uniquely occurring short reads to a reference genome. This requires classifying the short reads into unique and duplicate matches. In particular, we define and solve the Massive Exact Unique Pattern Matching problem in genomes.
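To make the problem statement concrete, here is a minimal Python sketch that labels each read as unique, duplicate, or absent by exact matching against a reference. It is a simple hash-based baseline under stated assumptions, not the algorithm presented in the paper. The function name `classify_reads` is hypothetical, only the forward strand is searched (reverse complements are ignored), overlapping occurrences are counted, and the genome is scanned once per distinct read length.

```python
from collections import Counter, defaultdict


def classify_reads(genome: str, reads: list[str]) -> dict[str, str]:
    """Hypothetical baseline: label reads by how often they occur exactly in genome.

    Assumptions: forward strand only, overlapping occurrences counted,
    one pass over the genome per distinct read length.
    """
    # Group the distinct reads by length so each length needs one scan.
    by_length = defaultdict(set)
    for read in reads:
        by_length[len(read)].add(read)

    # Slide a window of each read length over the genome and count
    # windows that coincide with some read.
    counts = Counter()
    for k, patterns in by_length.items():
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            if window in patterns:
                counts[window] += 1

    # Label reads in input order: 0 -> absent, 1 -> unique, >1 -> duplicate.
    labels = {}
    for read in reads:
        c = counts[read]
        labels[read] = "absent" if c == 0 else ("unique" if c == 1 else "duplicate")
    return labels


labels = classify_reads("ACGTACGTTGCA", ["ACGT", "GTTG", "CCCC"])
print(labels)  # {'ACGT': 'duplicate', 'GTTG': 'unique', 'CCCC': 'absent'}
```

In this example `ACGT` occurs at positions 0 and 4, `GTTG` occurs once at position 6, and `CCCC` does not occur. Only the reads labeled unique would be kept for unique mapping. A baseline like this shows what the classification must compute, but its per-length scans and in-memory read sets do not address the scale of tens of millions of reads that motivates the paper.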
Field | Value |
---|---|
Original language | English |
Title of host publication | ITAB 2009 |
Subtitle of host publication | 9th International Conference on Information Technology and Applications in Biomedicine, 2009 |
Place of Publication | Piscataway, N.J. |
Publisher | IEEE |
Pages | N/A |
Number of pages | 4 |
Volume | N/A |
Edition | N/A |
ISBN (Print) | 9781424453795 |
Publication status | Published - 2009 |