Accelerating String Matching on MIC Architecture for Motif Extraction

Pissis, Solon P.; Goll, Christian; Pavlidis, Pavlos; Stamatakis, Alexandros

doi:10.1007/978-3-642-55195-6_24

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8385)

Abstract

Identifying repeated factors that occur in a string of letters, or common factors that occur in a set of strings, is an important task in computer science and biology. Such patterns are called motifs, and the process of identifying them is called motif extraction. In biology, motifs may correspond to functional elements in DNA, RNA, or protein molecules. In this article, we orchestrate MoTeX, a high-performance computing tool for MoTif eXtraction from large-scale datasets, on the Many Integrated Core (MIC) architecture. MoTeX uses state-of-the-art algorithms for solving the fixed-length approximate string-matching problem. It comes in three flavors: a standard CPU version, an OpenMP version, and an MPI version. We compare the performance of our MIC implementation to the corresponding CPU version of MoTeX: the MIC implementation accelerates the computations by a factor of ten. We also compare it to the corresponding OpenMP version of MoTeX running on modern multicore architectures: the MIC implementation accelerates the computations by a factor of two.
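
To make the problem statement concrete, the following is a minimal, naive sketch of motif extraction by fixed-length approximate string matching under Hamming distance (at most k mismatches). It is not the algorithm used by MoTeX and makes no attempt at its performance. The function names, the `quorum` parameter, and the choice to draw candidate motifs only from factors that occur in the input are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(factor, text, k):
    """True if some window of text of length len(factor) is within k mismatches."""
    m = len(factor)
    return any(hamming(factor, text[i:i + m]) <= k
               for i in range(len(text) - m + 1))


def extract_motifs(seqs, length, k, quorum):
    """Map each candidate factor to the number of sequences it occurs in.

    Only candidates occurring, with at most k mismatches, in at least
    `quorum` sequences are kept. `quorum` is a hypothetical parameter
    of this sketch, not a documented MoTeX option.
    """
    # Assumption: candidates are the fixed-length factors present in the input.
    candidates = {s[i:i + length]
                  for s in seqs
                  for i in range(len(s) - length + 1)}
    motifs = {}
    for cand in sorted(candidates):
        support = sum(occurs_within(cand, s, k) for s in seqs)
        if support >= quorum:
            motifs[cand] = support
    return motifs


if __name__ == "__main__":
    print(extract_motifs(["AAAA", "AATA", "CCCC"], length=3, k=1, quorum=2))
```

Each candidate is checked independently of the others, so the outer loop is the kind of work that could be split across threads or cores. How MoTeX actually distributes its computation is not described on this page.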

Author information

Authors and Affiliations

Florida Museum of Natural History, University of Florida, Gainesville, USA
Solon P. Pissis
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Solon P. Pissis, Christian Goll & Alexandros Stamatakis
Foundation for Research and Technology – Hellas, Iraklio, Greece
Pavlos Pavlidis

Corresponding author

Correspondence to Solon P. Pissis.

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Department of Computer Science, Knoxville, Tennessee, USA
Jack Dongarra
Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski
Technical University of Denmark Informatics and Mathematical Modelling, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Pissis, S.P., Goll, C., Pavlidis, P., Stamatakis, A. (2014). Accelerating String Matching on MIC Architecture for Motif Extraction. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55195-6_24

DOI: https://doi.org/10.1007/978-3-642-55195-6_24
Published: 08 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55194-9
Online ISBN: 978-3-642-55195-6
eBook Packages: Computer Science, Computer Science (R0)