Abstract
DNA motif discovery is the task of finding short, similar sequence elements within a set of nucleotide sequences. It has become an essential task in bioinformatics because of its applications in areas such as compression, summarization, and clustering. Motif discovery is an NP-hard problem, and no exact algorithm is known that solves it in polynomial time. Many optimization algorithms have been proposed for this problem, but none has proved superior across all of the difficulties it presents. Chemical Reaction Optimization (CRO) is a population-based metaheuristic that adapts readily to optimization problems. Here, we propose a CRO-based algorithm for the DNA motif discovery problem. The four basic operators of CRO have been redesigned for this problem so that the solution space is searched both locally and globally. Two additional operators (repair functions) are proposed to improve the quality of the solutions; they are applied to the final solution after the iteration stage of CRO to obtain a better one. Combining the flexible elementary operators of CRO with these repair functions makes it possible to determine motifs more precisely. The proposed method is compared with traditional algorithms, namely the Gibbs sampler, AlignACE (Aligns Nucleic Acid Conserved Elements), MEME (Multiple Expectation Maximization for Motif Elicitation), and ACRI (Ant-Colony-Regulatory-Identification), on real-world datasets. The experimental results show that the proposed algorithm can give better-quality results than these algorithms in less running time. In addition, statistical tests have been performed to show the superiority of the proposed algorithm over other state-of-the-art methods in this area.
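To make the search problem concrete, the following minimal Python sketch scores one candidate solution, represented as one start position per sequence, by how strongly the selected windows agree with their consensus. The function names, the solution encoding, and the consensus-count score are illustrative assumptions; the abstract does not specify the paper's objective function. The `neighbour` helper is a generic single-position perturbation of the kind a local search step might use, not one of the paper's four redesigned CRO operators.

```python
import random
from collections import Counter


def consensus_and_score(sequences, starts, width):
    """Return the consensus string and a simple agreement score.

    Assumed scoring (not taken from the paper): for each motif column,
    count how many windows carry the most frequent base, then sum
    those counts over all columns.
    """
    # One window of `width` nucleotides per sequence.
    windows = [seq[s:s + width] for seq, s in zip(sequences, starts)]
    consensus = []
    score = 0
    for column in zip(*windows):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        score += count
    return "".join(consensus), score


def neighbour(starts, sequences, width, rng):
    """Shift one randomly chosen start position by one base.

    Hypothetical local move for illustration only; the result is
    clamped so the window stays inside its sequence.
    """
    new_starts = list(starts)
    i = rng.randrange(len(new_starts))
    upper = len(sequences[i]) - width
    shifted = new_starts[i] + rng.choice((-1, 1))
    new_starts[i] = min(max(shifted, 0), upper)
    return new_starts


if __name__ == "__main__":
    seqs = ["GGTACGTAA", "TACGTCCCC", "ATACCTGG"]
    starts = [2, 0, 1]
    print(consensus_and_score(seqs, starts, 5))
    print(neighbour(starts, seqs, 5, random.Random(0)))
```

A metaheuristic such as CRO would repeatedly generate candidates like the one returned by `neighbour`, evaluate them with a score of this kind, and keep or discard them according to its own acceptance rules.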
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Saha, S.K., Islam, M.R. & Hasan, M. DNA motif discovery using chemical reaction optimization. Evol. Intel. 14, 1707–1726 (2021). https://doi.org/10.1007/s12065-020-00444-2
DOI: https://doi.org/10.1007/s12065-020-00444-2