A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences

González-Álvarez, David L.; Vega-Rodríguez, Miguel A.; Rubio-Largo, Álvaro

doi:10.1007/s11227-016-1916-3

A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences

Published: 17 November 2016

Volume 73, pages 2285–2312, (2017)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

David L. González-Álvarez¹,
Miguel A. Vega-Rodríguez¹ &
Álvaro Rubio-Largo²

411 Accesses
5 Citations
Explore all metrics

Abstract

Since the late 1970s, when the first DNA-based genome was sequenced, the field of biology is experiencing a significant growth in the amount of data that needs to be processed. Long ago it became impractical to analyze all this information manually, resulting in a great need for new techniques, algorithms and strategies to facilitate this work. Within the vast world of bioinformatics, we will focus on proteomics, more specifically, on the discovery of small repeated common patterns on sets of protein sequences that may represent some biological functionality. When we analyze a large number of sequences, the problem shows non-deterministic polynomial times, it implies that we could benefit from the combination of high-performance computing and computational intelligence techniques. In this paper, we address the discovery of repeated common patterns as a multiobjective optimization problem by means of a hybrid MPI/OpenMP approach which parallelizes a well-known multiobjective metaheuristic, the fast non-dominated sorting genetic algorithm (NSGA-II). Our main objective is to combine the benefits of shared-memory and distributed-memory programming paradigms to discover patterns in an accurate and efficient manner. Experiments conducted on six different datasets, comparisons with other well-known biological tools, and the obtained speed-ups and efficiencies show that our approach is able to achieve a significant performance in terms of parallel and biological results.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

Article 06 January 2020

Parallelization of Protein Clustering Algorithm Using OpenMP

Using mixed mode programming to parallelize an indicator-based evolutionary algorithm for inferring multiobjective phylogenetic histories

Article 14 June 2016

References

Adhianto L, Chapman B (2007) Performance modeling of communication and computation in hybrid MPI and OpenMP applications. Simul Model Pract Theory 15(4):481–491
Article Google Scholar
Anderson NL, Anderson NG (1998) Proteome and proteomics: new technologies, new concepts, and new words. Electrophoresis 19(11):1853–1861
Article Google Scholar
Bailey TL, Bodn M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucl Acids Res 37(2):W202–W208
Article Google Scholar
Bork P, Koonin EV (1996) Protein sequence motifs. Curr Opin Struct Biol 6(3):366–376
Article Google Scholar
Chan TK, Leung KS, Lee KH (2008) TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24(3):341–349
Article Google Scholar
Chan TK, Li G, Leung KS, Lee KH (2009) Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinform 10:321
Chapman B, Jost G, van der Pas R (2007) Using OpenMP: portable shared memory parallel programming. The MIT Press, Cambridge ISBN: 978-0262533027
Google Scholar
Che D, Song Y, Rashedd K (2005) MDGA: Motif discovery using a genetic algorithm. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO’05), pp 447–452
Chen C, Schmidt B, Weiguo L, Mller-Wittig W (2008) GPU-MEME: using graphics hardware to accelerate motif finding in DNA sequences. Pattern Recognit Bioinform LNCS 5265:448–459
Article Google Scholar
Coello Coello, CA, Lamont GB, Veldhuizen DA (2007) Evolutionary algorithms for solving multi-objective problems., 2nd edn. Springer-Verlag, New York ISBN: 978-0-387-33254-3
MATH Google Scholar
Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator. Genome Res 14:1188–1190
Article Google Scholar
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Article Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from incomplete data via the EM algorithm (with Discussion). J R Stat Soc Ser B 39(1):1–38
MATH Google Scholar
Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1):S354–S363
Article Google Scholar
Favorov AV, Gelfand MS, Gerasimova AV, Ravcheev DA, Mironov AA, Makeev VJ (2005) A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics 21(10):2240–2245
Article Google Scholar
Fogel GB, Porto VW, Varga G, Dow ER, Crave AM, Powers DM, Harlow HB, Su EW, Onyia JE, Su C (2008) Evolutionary computation for discovery of composite transcription factor binding sites. Nucl Acids Res 36(21):e142, 1–14
Fogel GB, Weekes DG, Varga G, Dow ER, Harlow HB, Onyia JE, Su C (2004) Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Nucl Acids Res 32(13):3826–3835
Article Google Scholar
Frith MC, Hansen U, Spouge JL, Weng Z (2004) Finding functional sequence elements by multiple local alignment. Nucl Acids Res 32(1):189–200
Article Google Scholar
Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson Education Limited, Edinburgh
MATH Google Scholar
Gropp W, Lusk W, Skjellum A (1999) Using MPI: portable parallel programming with the message passing interface, 2nd edn. The MIT Press, Cambridge ISBN: 0-262-57132-3
MATH Google Scholar
Grundy WN, Bailey TL, Elkan CP (1996) ParaMEME: a parallel implementation and a web interface for a dna and protein motif discovery tool. Comput Appl Biosci 12(4):303–310
Google Scholar
Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577
Article Google Scholar
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Google Scholar
Hughes JD, Estep PW, Tavazoie S, Church GM (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296(5):1205–1214
Article Google Scholar
James P (1997) Protein identification in the post-genome era: the rapid rise of proteomics. Q Rev Biophys 30(4):279–331
Article Google Scholar
Li M, Ma B, Wang L (2002) Finding similar regions in many sequences. J Comput Syst Sci 65(1):73–96
Article MathSciNet MATH Google Scholar
Liu FFM, Tsai JJP, Chen RM, Chen SN, Shih SH (2004) FMGA: finding motifs by genetic algorithm. In: Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04), pp 459–466
Liu Y, Schmidt B, Liu W, Maskell DL (2010) CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognit Lett 31(14):2170–2177
Article Google Scholar
Liu Y, Schmidt B, Maskell DL (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp 428–434
Lones MA, Tyrrell AM (2007) Regulatory motif discovery using a population clustering evolutionary algorithm. IEEE/ACM Trans Comput Biol Bioinform 4(3):403–414
Article Google Scholar
Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24(8):1515–1524
Article Google Scholar
Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G, Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucl Acids Res 34:W566–W570
Article Google Scholar
Qin J, Pinkenburg S, Rosenstiel W (2005) Parallel motif search using ParSEQ. In: IASTED International Conference on Parallel and Distributed Computing and Networks, pp 601–607
Regnier M, Denise A (2004) Rare events and conditional events on random strings. Discret Math Theoret Comput Sci 6(2):191–214
MathSciNet MATH Google Scholar
Sandve GK, Nedland M, Syrstad OB, Eidsheim LA, Abul O, Drablos F (2006) Accelerating motif discovery: Motif matching on parallel hardware. Algorithms Bioinform LNCS 4175:197–206
Article Google Scholar
Schröder J, Wienbrandt L, Pfeiffer G, Schimmler M (2008) Massively parallelized DNA motif search on the reconfigurable hardware platform COPACOBANA. In: Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics, pp 436–447
Shao L, Chen Y (2009) Bacterial foraging optimization algorithm integrating tabu search for motif discovery. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM’09), pp 415–418
Shao L, Chen Y, Abraham A (2009) Motif discovery using evolutionary algorithms. In: International Conference of Soft Computing and Pattern Recognition (SOCPAR’09), pp 420–425
Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L., Xenarios I (2012) New and continuing developments at PROSITE. Nucl Acids Res 41(Database issue): D344–D347
Sinha S, Tompa M (2003) YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucl Acids Res 31(13):3586–3588
Article Google Scholar
Stine M, Dasgupta D, Mukatira S (2003) Motif discovery in upstream sequences of coordinately expressed genes. 2003 Congress Evol Comput (CEC’03) 3:1596–1603
Sutou T, Tamura K, Mori Y, Kitakami H (2003) Design and implementation of parallel modified prefixspan method. Int Sympos High Perform Comput 2858:412–422
Article Google Scholar
Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouzé P, Moreau Y (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12):1113–1122
Article Google Scholar
Thompson WA, Newberg LA, Conlan S, McCue LA, Lawrence CE (2007) The gibbs centroid sampler. Nucl Acids Res 35(Web Server issue):W232–W237
van Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mole Biol 281(5):827–842
Article Google Scholar
Wei Z, Jensen S (2006) GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22(13):1577–1584
Article Google Scholar
Workman CT, Stormo GD (2000) ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. In: Pacifc symposium on biocomputing, pp 467–478
Yang JY, Yang MQ, Zhu M, Arabnia HR, Deng Y (2008) Promoting synergistic research and education in genomics and bioinformatics. BMC Genom 9(Suppl 1):I1
Article Google Scholar
Yang JY, Yang MQ, Arabnia HR, Deng Y (2008) Review: genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research. BMC Genom 9(Suppl 2):I1
Article Google Scholar
Yang MQ, Athey BD, Arabnia HR, Sung AH, Liu Q, Yang JY, Mao J, Deng Y (2009) High-throughput next-generation sequencing technologies foster new cutting-edge computing techniques in bioinformatics. BMC Genom 10(1)
Yu L, Xu Y (2009) A parallel gibbs sampling algorithm for motif finding on gpu. In: IEEE International Symposium on Parallel and Distributed Processing with Applications, pp 555–558
Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans Evol Comput 3(4):257–271
Article Google Scholar

Download references

Acknowledgements

This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund), under the contract TIN2016-76259-P (PROTEIN project). Álvaro Rubio-Largo is also supported by the post-doctoral fellowship SFRH/BPD/100872/2014 granted by Fundação para a Ciência e a Tecnologia (FCT), Portugal.

Author information

Authors and Affiliations

Department of Technologies of Computers and Communications, University of Extremadura, Cáceres, Spain
David L. González-Álvarez & Miguel A. Vega-Rodríguez
NOVA Information Management School, University Nova of Lisbon, Lisbon, Portugal
Álvaro Rubio-Largo

Authors

David L. González-Álvarez
View author publications
You can also search for this author in PubMed Google Scholar
Miguel A. Vega-Rodríguez
View author publications
You can also search for this author in PubMed Google Scholar
Álvaro Rubio-Largo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to David L. González-Álvarez.

Rights and permissions

Reprints and permissions

About this article

Cite this article

González-Álvarez, D.L., Vega-Rodríguez, M.A. & Rubio-Largo, Á. A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences. J Supercomput 73, 2285–2312 (2017). https://doi.org/10.1007/s11227-016-1916-3

Download citation

Published: 17 November 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11227-016-1916-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences

Abstract

Access this article

Similar content being viewed by others

A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

Parallelization of Protein Clustering Algorithm Using OpenMP

Using mixed mode programming to parallelize an indicator-based evolutionary algorithm for inferring multiobjective phylogenetic histories

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences

Abstract

Access this article

Similar content being viewed by others

A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

Parallelization of Protein Clustering Algorithm Using OpenMP

Using mixed mode programming to parallelize an indicator-based evolutionary algorithm for inferring multiobjective phylogenetic histories

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation