Abstract
The motif finding problem is one of the important and challenging problems in bioinformatics. A variety of sequential algorithms have been proposed to find exact motifs, but their running times remain impractical because of the high computational complexity of the problem. In this paper we parallelize three efficient sequential algorithms: HEPPMSprune, PMS5 and PMS6. We implement the algorithms on a dual quad-core machine using OpenMP and measure the performance of each. Our experiments on simulated data show that: (1) parallel PMS6 is faster than the other algorithms on challenging instances, while parallel HEPPMSprune is faster than the other algorithms on most solvable instances; (2) parallel HEPPMSprune scales linearly on all instances, while parallel PMS5 and PMS6 scale linearly on challenging instances only; (3) HEPPMSprune uses less memory than the other algorithms.
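To make the problem statement concrete, the following is a minimal sketch of exact (l, d) motif finding by brute force, with the candidate space split into independent chunks that workers can process separately. It is not HEPPMSprune, PMS5 or PMS6, and it is not the OpenMP implementation evaluated in the paper. It assumes the usual formulation: a string of length l is a motif if every input sequence contains a substring of length l within Hamming distance d of it. The function names, the split by first letter, and the use of Python threads are illustrative assumptions only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA sequences


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some l-mer of `sequence` is within distance d of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def motifs_with_prefix(prefix, sequences, l, d):
    """Check every candidate of length l that starts with `prefix`."""
    found = []
    for tail in product(ALPHABET, repeat=l - len(prefix)):
        candidate = prefix + "".join(tail)
        if all(occurs_within(candidate, s, d) for s in sequences):
            found.append(candidate)
    return found


def find_motifs(sequences, l, d, workers=4):
    """Split the 4^l candidates by first letter and check each chunk independently."""
    prefixes = list(ALPHABET)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda p: motifs_with_prefix(p, sequences, l, d), prefixes
        )
        return sorted(m for part in parts for m in part)


if __name__ == "__main__":
    seqs = ["TTACGTTT", "GGACGAGG", "CCACCTCC"]
    print(find_motifs(seqs, l=4, d=1))
```

The chunks share no state, which is the property that lets this kind of search be distributed across cores. In CPython, threads do not speed up this CPU-bound loop; they are used here only to show the partitioning, and a real speedup would need processes or a compiled implementation with a threading model such as OpenMP.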







Acknowledgments
Part of this publication was made possible by NPRP Grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. The authors are grateful to Dr. S. Bandyopadhyay of the University of Florida for providing us with the PMS5 and PMS6 codes and for assisting us with several aspects of these codes. The authors also thank Prof. S. Rajasekaran of the University of Connecticut for providing us with the source code of PMSprune. Finally, the authors would like to thank the anonymous referees for their helpful comments. The first submission of this publication was made before the first author attended KINDI Lab.
Cite this article
Abbas, M.M., Bahig, H.M., Abouelhoda, M. et al. Parallelizing exact motif finding algorithms on multi-core. J Supercomput 69, 814–826 (2014). https://doi.org/10.1007/s11227-014-1180-3
DOI: https://doi.org/10.1007/s11227-014-1180-3