Abstract
Transcriptional regulation is the main form of regulation of gene expression, the process by which all prokaryotic organisms and eukaryotic cells convert the information encoded in nucleic acids (DNA) into the proteins they need to function and develop. A crucial component of genetic regulation is the binding of transcription factors to specific DNA sequences that regulate the expression of genes. These binding sites are short and share a common nucleotide sequence. Discovering these short DNA strings, also known as motifs, is labor-intensive, so high-performance computing is a good way to address the problem. In this work, we present a parallel multiobjective evolutionary algorithm: a novel hybrid technique based on differential evolution with Pareto tournaments (H-DEPT). To study whether this algorithm is suitable for parallelization, we used H-DEPT to solve instances of different sizes on several multicore systems (2, 4, 8, 16, and 32 cores). The results show that H-DEPT achieves good speedups and efficiencies. We also compare the predictions made by H-DEPT with those of other biological tools, demonstrating that it can also produce high-quality predictions.
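The abstract names three technical ingredients: differential evolution, Pareto tournaments for multiobjective selection, and parallel performance measured as speedup and efficiency. The Python sketch below illustrates how these pieces fit together. It is not H-DEPT itself. It assumes a classic DE/rand/1/bin scheme on real-valued vectors and a simple Pareto-dominance tournament between each target vector and its trial vector. The tie rule used when neither vector dominates is our own assumption. Two toy objectives stand in for real motif-quality measures. All function names (`dominates`, `pareto_tournament`, `de_generation`, `speedup`, `efficiency`, `toy_objectives`) are hypothetical. The last two helpers use the standard definitions S_p = T_1 / T_p and E_p = S_p / p.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector `a` Pareto-dominates `b`.

    All objectives are assumed to be maximized: `a` must be at least as good
    as `b` in every objective and strictly better in at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_tournament(target: Vector, trial: Vector,
                      evaluate: Callable[[Vector], Objectives]) -> Vector:
    """Choose the survivor of a target/trial pair by Pareto dominance.

    Hypothetical tie rule: if neither vector dominates the other, the trial
    vector survives, which keeps the search moving.
    """
    return target if dominates(evaluate(target), evaluate(trial)) else trial


def de_generation(pop: List[Vector], evaluate: Callable[[Vector], Objectives],
                  rng: random.Random, f: float = 0.5, cr: float = 0.9) -> List[Vector]:
    """Run one generation of DE/rand/1/bin with Pareto-tournament replacement."""
    n, dim = len(pop), len(pop[0])
    next_pop: List[Vector] = []
    for i, target in enumerate(pop):
        # Pick three distinct donors, all different from the target (needs n >= 4).
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
        # Binomial crossover between the mutant vector and the target.
        trial = [
            pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
            if (rng.random() < cr or d == j_rand) else target[d]
            for d in range(dim)
        ]
        # Each iteration reads only the previous generation, so the work for
        # different individuals is independent and could run on separate cores.
        next_pop.append(pareto_tournament(target, trial, evaluate))
    return next_pop


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S_p = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Efficiency E_p = S_p / p."""
    return speedup(t_serial, t_parallel) / cores


def toy_objectives(v: Vector) -> Objectives:
    """Return two conflicting toy objectives (both maximized), NOT motif scores."""
    return (-sum(x * x for x in v), -sum((x - 2.0) ** 2 for x in v))
```

To evolve a population, seed a `random.Random`, build a list of equal-length vectors (at least four), and call `de_generation(pop, toy_objectives, rng)` repeatedly. Because of the replacement rule, no survivor is ever dominated by the vector it replaced. A real motif-discovery version would also need a motif encoding and biologically meaningful objectives, neither of which is specified here.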
Acknowledgments
This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund) under contract TIN2012-30685 (BIO project). David L. González-Álvarez and Álvaro Rubio-Largo are supported by the postdoctoral research grants ACCION-III-15 and ACCION-III-13 from the University of Extremadura.
Cite this article
González-Álvarez, D.L., Vega-Rodríguez, M.A. & Rubio-Largo, Á. Parallelizing and optimizing a hybrid differential evolution with Pareto tournaments for discovering motifs in DNA sequences. J Supercomput 70, 880–905 (2014). https://doi.org/10.1007/s11227-014-1266-y