Skip to main content
Log in

Parallelizing exact motif finding algorithms on multi-core

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

The motif finding problem is one of the important and challenging problems in bioinformatics. A variety of sequential algorithms have been proposed to find exact motifs, but the running time is still not suitable due to high computational complexity of finding motifs. In this paper we parallelize three efficient sequential algorithms which are HEPPMSprune, PMS5 and PMS6. We implement the algorithms on a Dual Quad-Core machine using openMP to measure the performance of each algorithm. Our experiment on simulated data show that: (1) the parallel PMS6 is faster than the other algorithms in case of challenging instances, while the parallel HEPPMSprune is faster than the other algorithms in most of solvable instances; (2) the scalability of parallel HEPPMSprune is linear for all instances, while the scalability of parallel PMS5 and PMS6 is linear in case of challenging instances only; (3) the memory used by HEPPMSprune is less than that of the other algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 269–278

  2. Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9(2):225–242

    Article  Google Scholar 

  3. Leung H, Chin F (2005) Generalized planted \((l, d)\)-motif problem with negative set. In: Proceedings of workshop on algorithms in bioinformatics (LNCS), vol 3692, pp 264–275

  4. Lawrence C, Reilly A (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct Funct Genet 7(1):41–51

    Article  Google Scholar 

  5. Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214

    Article  Google Scholar 

  6. Bailey T, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach Learn 21:51–80

    Google Scholar 

  7. Fraenkel Y, Mandel Y, Friedberg D, Margalit H (1995) Identification of common motifs in unaligned DNA sequences: application to \(Escherichia\ coli\) Lrpregulon. Bioinformatics 11:379–387

  8. Rigoutsos I, Floratos A (1998) Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14:55–67

    Article  Google Scholar 

  9. Hertz G, Stormo G (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15:563–577

    Article  Google Scholar 

  10. Gelfand M, Koonin E, Mironov A (2000) Prediction of transcription regulatory sites in archaea by a comparative genomic approach. Nucl Acid Res 28:695–705

    Article  Google Scholar 

  11. Price A, Ramabhadran S, Pevzner P (2003) Finding subtle motifs by branching from sample strings. Bioinformatics 19(2):ii149–ii155

    Google Scholar 

  12. Huang C, Lee W, Hsieh S (2011) An improved heuristic algorithm for finding motif signals in DNA sequences. IEEE/ACM Trans Comput Biol Bioinf 8(4):959–975

    Article  Google Scholar 

  13. Galas D, Eggert M, Waterman M (1985) Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from \(Escherichia coli\). J Mol Biol 186(1):117–128

    Article  Google Scholar 

  14. Staden R (1989) Methods for discovering novel motifs in nucleic acid sequences. Comput Appl Biosci 5(4):293–298

    Google Scholar 

  15. Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res 15:1202–1215

    Google Scholar 

  16. Sagot M (1998) Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi CL, Moura AV (eds) Latin’98: theoretical informatics, LNCS, vol 1380, pp 111–127

  17. Van-Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827–842

    Article  Google Scholar 

  18. Tompa M (1999) An exact method for finding short motifs in sequences with application to the ribosome binding site problem. In: Proceedings of seventh international conference on intelligent systems for molecular biology, pp 262–271

  19. Marsan L, Sagot M (2000) Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J Comput Biol 7(3–4):345–362

    Article  Google Scholar 

  20. Sinha S, Tompa M (2000) A statistical method for finding transcription factor binding sites. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 344–354

  21. Blanchette M, Schwikowski B, Tompa M (2002) Algorithms for phylogenetic footprinting. J Comput Biol 9(2):211–223

    Article  Google Scholar 

  22. Eskin E, Pevzner P (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(1):354–363

    Article  Google Scholar 

  23. Evans P, Smith A (2003) Toward optimal motif enumeration. In: Proceedings of eighth international workshop algorithms and data structures ( WADS03), pp 47–58

  24. Carvalho A, Freitas A, Oliveira A, Sagot M (2005) A highly scalable algorithm for the extraction of CIS-Regulatory regions. In: Proceedings of third Asia Pacific bioinformatics conference, pp 273–282

  25. Chin F, Leung H (2005) Voting algorithms for discovering long motifs. In: Proceedings of third Asia Pacific bioinformatics conference, pp 261–271

  26. Rajasekaran S, Balla S, Huang C (2005) Exact algorithms for planted motif problems. J Comput Biol 12(8):1117–1128

    Article  Google Scholar 

  27. Davila J, Balla S, Rajasekaran S (2006) Space and time efficient algorithms for planted motif search. In: Proceedings of second international workshop on bioinformatics research and applications ( LNCS 3992), pp 822–829

  28. Pisanti N, Carvalho A, Marsan L, Sagot M (2006) RISOTTO: fast extraction of motifs with mismatches. In: Proceedings of seventh Latin American theoretical informatics symposium, pp 757–768

  29. Davila J, Balla S, Rajasekaran S (2007) Fast and practical algorithms for planted \((l, d)\) motif search. IEEE/ACM Trans Comput Biol Bioinf 4(4):544–552

    Article  Google Scholar 

  30. Dinh H, Rajasekaran S, Kundeti V (2011) PMS5: an efficient exact algorithm for the \((l, d)\)-motif finding problem. BMC Bioinf 12:410–420

    Article  Google Scholar 

  31. Abbas M, Abouelhoda M, Bahig H (2012) A hybrid method for the exact planted (l, d) motif finding problem and its parallelization. BMC Bioinformatics, vol 13, supplement 17, Article S10

  32. Bandyopadhyay S, Sahni S, Rajasekaran S (2012) PMS6: a faster algorithm for motif discovery. In: Proceedings of the second IEEE international conference on computational advances in bio and medical sciences (ICCABS 2012), pp 1–6

  33. Grundy W, Bailey T, Elkan C (1996) ParaMEME: a parallel implementation and a web interface for a DNA and protein motif discovery tool. Comput Appl Biosci 12(4):303–310

    Google Scholar 

  34. Carvalho A, Freitas A, Oliveira A, Sagot M (2004) A parallel algorithm for the extraction of structured motifs. In: Proceedings of the 19th ACM symposium on applied computing (SAC’04), pp 147–153

  35. Hamdani H, Rashid N, Abdulrazzaq A, Ghadban R, Wajidi M (2009) Fast phylocon algorithm using OpenMP. In: Proceedings of the IEEE international conference on computer technology and development, pp 550–553

  36. Yu L, Xu Y (2009) A parallel Gibbs sampling algorithm for motif finding on GPU. In: Proceedings of the IEEE international symposium on parallel and distributed processing with applications, pp 555–558

  37. Faheem H (2010) Accelerating motif finding problem using grid computing with enhanced brute force. In: Proceedings of the 12th international conference on advanced communication technology (ICACT), pp 197–202

  38. Dasari N, Desh R, Zubair M (2010a) An efficient multicore implementation of planted motif problem. In: Proceedings of the international conference on high performance computing and simulation, pp 9–15

  39. Dasari N, Desh R, Zubair M (2010b) Solving planted motif problem on GPU. In: International workshop on GPUs and scientific applications

  40. Dasari N, Desh R, Zubair M (2011) High performance implementation of planted motif problem using suffix trees. In: Proceedings of the international conference on high performance computing and simulation, pp 200–206

  41. Sahoo B, Sourav R, Ranjan R, Padhy S (2011) Parallel implementation of exact algorithm for planted motif search problem using SMP cluster. Eur J Sci Res 64(4):484–496

    Google Scholar 

  42. Liu Y, Schmidt B, Maskell D (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: Proceedings of the IEEE international parallel and distributed processing symposium, pp 423–429

Download references

Acknowledgments

Part of this publication was made possible by NPRP Grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. The authors are grateful to Dr. S. Bandyopadhyay of the University of Florida for providing us the PMS5 and PMS6 codes and for assisting us with several aspects of these codes. The authors also thank Prof. S. Rajasekaran of the University of Connecticut for providing us with the source code of PMSprune. Finally, the authors would like to thank the anonymous referees for their helpful comments. The first submission of this publication was done before the first author attended KINDI Lab.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hazem M. Bahig.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Abbas, M.M., Bahig, H.M., Abouelhoda, M. et al. Parallelizing exact motif finding algorithms on multi-core. J Supercomput 69, 814–826 (2014). https://doi.org/10.1007/s11227-014-1180-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-014-1180-3

Keywords

Navigation