Parallelizing and optimizing a hybrid differential evolution with Pareto tournaments for discovering motifs in DNA sequences

The Journal of Supercomputing

Abstract

Transcriptional regulation is the main mechanism controlling gene expression, the process by which all prokaryotic organisms and eukaryotic cells transform the information encoded in nucleic acids (DNA) into the proteins required for their operation and development. A crucial component of genetic regulation is the binding of transcription factors to the DNA sequences that regulate gene expression. These binding sites are short and share a common sequence of nucleotides. Discovering these short DNA strings, also known as motifs, is labor-intensive, so high-performance computing is a natural way to address the problem. In this work, we present a parallel multiobjective evolutionary algorithm, a novel hybrid technique based on differential evolution with Pareto tournaments (H-DEPT). To study whether this algorithm is suitable for parallelization, H-DEPT has been used to solve instances of different sizes on several multicore systems (2, 4, 8, 16, and 32 cores). The results show that H-DEPT achieves good speedups and efficiencies. We also compare the predictions made by H-DEPT with those of other biological tools, demonstrating that it is also capable of producing quality predictions.
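
The abstract reports speedups and efficiencies without defining them on this page. As a reading aid, the sketch below uses the standard textbook definitions, speedup S_p = T_1 / T_p and efficiency E_p = S_p / p, where T_1 is the single-core run time and T_p the run time on p cores. The function names and all timing values are hypothetical placeholders chosen for illustration; they are not measurements from the article, and the article may define its metrics differently.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S_p = T_1 / T_p (standard definition, assumed here)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Efficiency E_p = S_p / p, i.e. the fraction of ideal linear speedup."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_serial, t_parallel) / cores


# Placeholder run times in seconds, keyed by core count.
# These are made-up numbers for illustration, NOT results from the article.
placeholder_times = {1: 640.0, 2: 330.0, 4: 170.0, 8: 90.0, 16: 50.0, 32: 32.0}

t1 = placeholder_times[1]
for cores in (2, 4, 8, 16, 32):  # the core counts listed in the abstract
    tp = placeholder_times[cores]
    print(f"{cores:2d} cores: speedup = {speedup(t1, tp):.2f}, "
          f"efficiency = {efficiency(t1, tp, cores):.2%}")
```

An efficiency close to 1 (100%) means the run scales almost linearly with the number of cores; values typically drop as the core count grows because of synchronization and any remaining serial work.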

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund) under contract TIN2012-30685 (BIO project). David L. González-Álvarez and Álvaro Rubio-Largo are supported by the postdoctoral research grants ACCION-III-15 and ACCION-III-13 from the University of Extremadura.

Author information

Corresponding author

Correspondence to David L. González-Álvarez.

About this article

Cite this article

González-Álvarez, D.L., Vega-Rodríguez, M.A. & Rubio-Largo, Á. Parallelizing and optimizing a hybrid differential evolution with Pareto tournaments for discovering motifs in DNA sequences. J Supercomput 70, 880–905 (2014). https://doi.org/10.1007/s11227-014-1266-y

  • DOI: https://doi.org/10.1007/s11227-014-1266-y
