Towards a Better Understanding of Heuristic Approaches Applied to the Biological Motif Discovery

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

The detection of transcription factor binding sites (TFBS) plays an important role in bioinformatics. Their correct identification in the promoter regions of co-expressed genes is a crucial step for understanding gene expression mechanisms and creating new drugs and vaccines. The problem of finding motifs consists of looking for conserved patterns in datasets of biological sequences through the use of unsupervised learning algorithms. It is considered one of the classic problems of computational biology and, in its simplest formulation, has been proven to be NP-hard. Heuristic and meta-heuristic algorithms have been shown to be very promising in solving combinatorial problems with very large search spaces. In this work, we evaluate the performance of different heuristic and meta-heuristic approaches: Variable Neighborhood Search (VNS), Expectation Maximization (EM) and Iterated Local Search (ILS). For each of them, two sets of experiments were carried out: in the first, the heuristics were run alone, and in the second, a constructive procedure was introduced to improve the quality of the initial solutions. Finally, the metrics were compared with the state-of-the-art MEME algorithm, which is widely used in biological motif discovery. The results suggest that the heuristics are more efficient when used together and that the constructive procedure is very promising, improving the performance metrics of the evaluated heuristics in most experiments. In addition, the combination of the constructive procedure with EM proved quite competitive, outperforming the MEME algorithm on several datasets.
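To make the problem statement concrete, the following is a minimal Python sketch of an Iterated Local Search over motif start positions. It is an illustration under stated assumptions, not the authors' implementation: the consensus score used as the objective, the function names, and the random initialization (standing in for the constructive procedure, which the abstract does not specify) are all hypothetical choices.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, w):
    """Sum, over the w columns, of the count of the most common letter.

    Assumed objective for illustration; the paper may use a different one.
    """
    score = 0
    for col in range(w):
        column = [s[p + col] for s, p in zip(seqs, starts)]
        score += Counter(column).most_common(1)[0][1]
    return score


def local_search(seqs, starts, w):
    """Move one window at a time to a better position until no move helps."""
    starts = list(starts)
    best = consensus_score(seqs, starts, w)
    improved = True
    while improved:
        improved = False
        for i, s in enumerate(seqs):
            for p in range(len(s) - w + 1):
                candidate = starts[:i] + [p] + starts[i + 1:]
                score = consensus_score(seqs, candidate, w)
                if score > best:
                    starts, best, improved = candidate, score, True
    return starts, best


def iterated_local_search(seqs, w, iterations=50, seed=0):
    """ILS: perturb the incumbent, re-run local search, keep improvements."""
    rng = random.Random(seed)
    # Random initial solution (assumption: replaces the constructive procedure).
    start = [rng.randrange(len(s) - w + 1) for s in seqs]
    best_starts, best = local_search(seqs, start, w)
    for _ in range(iterations):
        trial = list(best_starts)
        i = rng.randrange(len(seqs))
        trial[i] = rng.randrange(len(seqs[i]) - w + 1)  # perturbation step
        trial, score = local_search(seqs, trial, w)
        if score > best:
            best_starts, best = trial, score
    return best_starts, best


if __name__ == "__main__":
    # Toy sequences made up for this example.
    toy = ["ACGTTGACCA", "TTGACGGTAC", "GGCATTGACA", "CATTGACTTG"]
    positions, value = iterated_local_search(toy, w=5)
    print(positions, value)
```

Each candidate solution is one start position per sequence, so the search space grows exponentially with the number of sequences, which is why heuristics of this kind are used instead of exhaustive enumeration.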

Notes

  1. Text segment of size w.

Author information

Corresponding author

Correspondence to Jader M. Caldonazzo Garbelini.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Garbelini, J.M.C., Sanches, D.S., Pozo, A.T.R. (2022). Towards a Better Understanding of Heuristic Approaches Applied to the Biological Motif Discovery. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_13

  • DOI: https://doi.org/10.1007/978-3-031-21686-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21685-5

  • Online ISBN: 978-3-031-21686-2

  • eBook Packages: Computer Science, Computer Science (R0)
