MoTeX: A word-based HPC tool for MoTif eXtraction

Solon P. Pissis, Alexandros Stamatakis, Pavlos Pavlidis

Research output: Chapter in Book/Report/Conference proceeding > Conference paper

10 Citations (Scopus)

Abstract

Motivation: Identifying repeated factors that occur in a string of letters or common factors that occur in a set of strings represents an important task in computer science and biology. Such patterns are called motifs, and the process of identifying them is called motif extraction. In biology, motifs may correspond to functional elements in DNA, RNA, or protein molecules. Motifs may also correspond to whole loci whose sequences are highly similar because of recent duplication (e.g., transposable elements or recently duplicated genes). A DNA motif is a nucleic acid sequence that has a specific biological function, for instance encoding the DNA binding sites for a regulatory protein (transcription factor).

Results: In this article, we introduce MoTeX, the first high-performance computing (HPC) tool for MoTif eXtraction from large-scale datasets. It uses state-of-the-art algorithms for solving the fixed-length approximate string matching problem. MoTeX comes in three flavors: a standard CPU version; an OpenMP-based version; and an MPI-based version. We show that MoTeX produces similar and partially identical results to current state-of-the-art tools with respect to accuracy as quantified by statistical significance measures. Moreover, we show that it matches or outperforms competing tools in terms of runtime efficiency. The MPI-based version of MoTeX requires only one hour to process all human genes on 1056 processors, while current sequential programmes require more than two months for this task.
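
For readers unfamiliar with the fixed-length approximate string matching problem named above, the following is a minimal sketch of one common formulation: given a text, a word of length l, and an error threshold, report every length-l factor of the text that differs from the word in at most that many positions. It assumes Hamming distance and uses a naive quadratic scan; the function names `hamming` and `fixed_length_matches` are hypothetical, and this is not the algorithm MoTeX itself implements.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def fixed_length_matches(text: str, word: str, max_errors: int) -> list[int]:
    """Return the start positions of all factors of `text` that have the
    same length as `word` and at most `max_errors` mismatches with it.

    Illustrative assumption: Hamming distance, naive O(n * l) scan.
    """
    ell = len(word)
    return [
        i
        for i in range(len(text) - ell + 1)
        if hamming(text[i:i + ell], word) <= max_errors
    ]


if __name__ == "__main__":
    # The factor at position 4 ("ACGA") differs from "ACGT" in one position.
    print(fixed_length_matches("ACGTACGAACGT", "ACGT", 1))  # [0, 4, 8]
    print(fixed_length_matches("ACGTACGAACGT", "ACGT", 0))  # [0, 8]
```

A motif extractor built on this primitive would run such a search for many candidate words across many sequences, which is why the work lends itself to the shared-memory and distributed-memory parallelism described in the abstract.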

Original language: English
Title of host publication: BCB '13
Subtitle of host publication: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Editors: Cathy H Wu, Sridhar Hannenhalli
Place of Publication: New York
Publisher: ACM
Pages: 13-22
Number of pages: 10
ISBN (Print): 9781450324342
Publication status: Published - 2013
