Multiobjective optimization algorithms for motif discovery in DNA sequences

Genetic Programming and Evolvable Machines

Abstract

Optimization techniques have become powerful tools for tackling many NP-hard optimization problems. For problems of this kind, obtaining optimal solutions is practically impossible, so approximation strategies such as metaheuristics must be applied. In this paper, we use seven metaheuristics to address an important biological problem known as the motif discovery problem. Because it is defined as a multiobjective optimization problem, we have adapted the proposed algorithms to this optimization context. We evaluate the proposed metaheuristics on 54 sequence datasets that belong to four organisms and differ in the number of sequences and in size. We analyse the results to determine which algorithm performs best in each case. The algorithms implemented and the results achieved can assist biological researchers in the complicated task of finding biologically relevant DNA patterns.
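
To illustrate what treating a problem as multiobjective implies in general, the following minimal Python sketch filters a set of candidate solutions down to its non-dominated (Pareto) set. It is not taken from the paper: the objective tuples, the assumption that every objective is maximized, and the names `dominates` and `pareto_front` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical objective vector for one candidate solution.
# Assumption: every objective is to be maximized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores, purely for illustration.
candidates = [(10.0, 0.90), (12.0, 0.75), (9.0, 0.70), (12.0, 0.80)]
front = pareto_front(candidates)
print(front)
```

In this made-up example two candidates survive because neither is better than the other in both objectives at once. This is why a multiobjective algorithm typically returns a set of trade-off solutions rather than a single optimum.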

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund), under the contract TIN2012-30685 (BIO project). David L. González-Álvarez is supported by the postdoc research grant ACCION-III-15 from the University of Extremadura.

Author information

Corresponding author

Correspondence to David L. González-Álvarez.

Additional information

Area Editor for Life Sciences: James Foster.

About this article

Cite this article

González-Álvarez, D.L., Vega-Rodríguez, M.A. & Rubio-Largo, Á. Multiobjective optimization algorithms for motif discovery in DNA sequences. Genet Program Evolvable Mach 16, 167–209 (2015). https://doi.org/10.1007/s10710-014-9232-2

  • DOI: https://doi.org/10.1007/s10710-014-9232-2
