Comparing multiobjective swarm intelligence metaheuristics for DNA motif discovery
Introduction
Many optimization problems of practical interest require a huge amount of computational time, and some cannot be solved to optimality with existing computers once the instances become large. A prominent class of such problems is the NP-hard class. All currently known exact algorithms for NP-hard problems require, in the worst case, time that grows exponentially with the input size, and it is unknown whether substantially faster algorithms exist. Therefore, to tackle NP-hard problems of arbitrary size it is common to use techniques such as metaheuristics (Glover and Kochenberger, 2003). In computer science, a metaheuristic can be defined as an optimization method that applies an iterative process to improve the quality of candidate solutions, guided by a given fitness function. Metaheuristics are also easy to adapt to different problems. These techniques do not guarantee finding the optimal solution; instead, they find quasi-optimal solutions in a reasonable time. Within the broad field of metaheuristics lies the concept of swarm intelligence, which has gained considerable momentum in recent years. Swarm intelligence is the discipline that deals with systems composed of decentralized, self-organized individuals, normally inspired by natural phenomena. In particular, it takes advantage of the collective behavior of individuals that interact with each other and with their environment. Swarm intelligence algorithms can be divided into two groups: those based on animal behavior and those based on physical or other natural phenomena.
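The iterative process described above can be illustrated with a minimal sketch. It is not any of the algorithms compared in this paper: it is a plain single-solution improvement loop, and every name in it (iterative_improvement, toy_fitness, toy_random, toy_perturb) is hypothetical and introduced only for illustration. The toy fitness function is likewise an assumption, chosen so that its maximum is known.

```python
import random


def iterative_improvement(fitness, random_solution, perturb, iterations=1000, seed=0):
    """Generic improvement loop: perturb the current solution and keep the
    candidate whenever its fitness is at least as good (maximization)."""
    rng = random.Random(seed)
    best = random_solution(rng)
    best_fit = fitness(best)
    for _ in range(iterations):
        candidate = perturb(best, rng)
        candidate_fit = fitness(candidate)
        if candidate_fit >= best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit


# Toy problem (assumption for illustration): maximize -(x - 3)^2, optimum at x = 3.
def toy_fitness(x):
    return -(x - 3.0) ** 2


def toy_random(rng):
    return rng.uniform(-10.0, 10.0)


def toy_perturb(x, rng):
    return x + rng.gauss(0.0, 0.5)


if __name__ == "__main__":
    solution, value = iterative_improvement(toy_fitness, toy_random, toy_perturb)
    print(f"solution={solution:.4f} fitness={value:.6f}")
```

As the text notes, a loop of this kind offers no optimality guarantee; it only returns the best candidate found within the iteration budget.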
In recent years, many algorithms based on these collective behaviors have been successfully applied to problems in several fields. For this reason, in this work we analyze the behavior of swarm intelligence algorithms by comparing two novel algorithms: the Artificial Bee Colony (ABC) algorithm (Karaboga, 2005), an optimization algorithm based on the intelligent foraging behavior of honey bee swarms, and the Gravitational Search Algorithm (GSA) (Rashedi et al., 2009), an optimization algorithm based on the law of gravity and mass interactions. In this way, we compare one algorithm from each group: one based on animal behavior (ABC) and the other based on physical and natural phenomena (GSA).
The main objective of this work is to analyze which kind of swarm intelligence algorithm solves the Motif Discovery Problem (MDP) better. The MDP is an NP-hard optimization problem as defined in the literature (Maier, 1978; Rivière et al., 2008), applied here to the specific task of discovering novel Transcription Factor Binding Sites (TFBS) in DNA sequences (D'haeseleer, 2006). Predicting common patterns (motifs) is one of the most important sequence analysis problems, and it has not yet been solved efficiently. In addition, in this work we have extended the formulation of the problem with several constraints that adapt the solution search to real biology. In biology, finding and decoding the true meaning of these DNA patterns can help to explain the complexity and development of living organisms. The MDP maximizes three conflicting objectives: motif length, support, and similarity. Consequently, multiobjective techniques are needed to solve it, and our algorithms must be adapted to the multiobjective context. The swarm intelligence based algorithms compared in this paper are therefore the Multiobjective ABC (MOABC) algorithm (González-Álvarez et al., 2011a) and the Multiobjective GSA (MO-GSA) (González-Álvarez et al., 2011b), based on the single-objective ABC and GSA algorithms, respectively. To present the results, we use typical multiobjective indicators such as the hypervolume (Zitzler and Thiele, 1999) and the coverage relation (Zitzler et al., 2000), which facilitates future comparisons. To ensure that the solutions found by our proposals are biologically relevant, we have also carried out several analyses using biological indicators: the Sensitivity, the Positive Predictive Value, the Performance Coefficient, and the Correlation Coefficient.
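To make the multiobjective setting concrete, the following minimal sketch shows Pareto dominance over the three maximized objectives (motif length, support, similarity) and the coverage relation between two sets of solutions, computed as the fraction of one set that is weakly dominated by the other. It is not the authors' implementation: the function names and the numeric objective vectors are hypothetical and invented purely for illustration.

```python
from typing import List, Sequence, Tuple

# (motif length, support, similarity); all three are maximized in the MDP.
Objectives = Tuple[float, float, float]


def weakly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective."""
    return all(x >= y for x, y in zip(a, b))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return weakly_dominates(a, b) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def coverage(set_a: List[Objectives], set_b: List[Objectives]) -> float:
    """Coverage C(A, B): fraction of B weakly dominated by some point of A."""
    if not set_b:
        return 0.0
    covered = sum(1 for b in set_b if any(weakly_dominates(a, b) for a in set_a))
    return covered / len(set_b)


if __name__ == "__main__":
    # Hypothetical objective vectors, not results from the paper.
    front_a = [(12, 5, 0.80), (10, 6, 0.85)]
    front_b = [(12, 5, 0.75), (9, 6, 0.90)]
    print(nondominated(front_a + front_b))
    print(coverage(front_a, front_b), coverage(front_b, front_a))
```

Coverage is not symmetric, so both C(A, B) and C(B, A) are normally reported when two algorithms are compared.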
On the whole, our main objectives in this work are to compare novel swarm intelligence based techniques on a well-known bioinformatics problem that has not yet been solved efficiently, to incorporate new rules and constraints that adapt the MDP to the real biological world, and to obtain good, biologically relevant results.
The remainder of the paper is organized as follows. Section 2 briefly reviews a number of existing works on motif discovery. Section 3 describes the MDP in detail. Section 4 presents the metaheuristics compared and explains how they operate. Section 5 covers the experimental methodology, the instances used, and the best configuration of each algorithm. Section 6 analyzes the behavior of our proposals, comparing them with each other and with two standard multiobjective evolutionary algorithms. Section 7 compares the two proposed swarm-based algorithms with other previously proposed metaheuristics. Section 8 compares the motifs discovered by our algorithms with those predicted by 14 other well-known biological methods. Finally, we outline the conclusions of this paper.
Section snippets
Related work
In this section we present some of the research literature related to the MDP. First, we describe some of the latest studies that apply evolutionary computation to discover motifs in DNA sequences. Next, we organize and analyze the biological methods most commonly used to solve this problem.
There are many proposals based on evolutionary techniques for finding DNA motifs. One example is FMGA (Liu et al., 2004), a genetic algorithm based on the SAGA operators (Notredame
Motif discovery problem
Gene expression is the process by which a gene is transcribed to form an RNA sequence, which is then used to produce the corresponding protein sequence. This process starts when a macromolecule called a Transcription Factor (TF) binds to a short subsequence in the promoter region of the gene, called a TFBS (Zare-Mirakabad et al., 2009). Finding TFBSs in DNA sequences (the problem known as the MDP) is important for uncovering the underlying regulatory relationship and understanding the
Description of the algorithms
In this section we detail the representation of the individuals, and we include a brief description of the algorithms compared in this work.
Methodology
In this section we explain the methodology followed to configure each algorithm, detail the data sets used in our experiments, and show the results obtained by our algorithms. In the following sections we will compare these results with those obtained by several standard multiobjective algorithms and by 14 other well-known biological methods. We will also include a statistical analysis of the results to ensure their statistical
Algorithm comparisons
As mentioned above, we have structured the comparisons into three groups: the first uses “Generic” instances, the second uses “Real” instances, and the last uses “Markov” instances. We now analyze the behavior of the algorithms on the “Generic” instances. The first analysis uses the hypervolume indicator. The hypervolume indicator (While et al., 2006) is a measure of the quality of a set of n nondominated objective vectors produced in
Comparison with previous works
In this section we compare the results obtained by our two swarm-based algorithms with those achieved by our previously proposed metaheuristics. The first techniques that we applied to solve the MDP as a multiobjective optimization problem were Differential Evolution with Pareto Tournaments (DEPT, González-Álvarez et al., 2010a) and Multiobjective Variable Neighbourhood Search (MO-VNS, González-Álvarez et al., 2010b). At present, after better understanding the biological aspects of the
Comparison with other biological methods
In this section we analyze the motifs obtained by our two proposals, the MOABC and MO-GSA algorithms. To that end, we have compared the best motifs (nondominated solutions) discovered by both heuristics with the best solutions predicted by 14 well-known biological methods for motif discovery. In this way, we show that the motifs predicted by MOABC and MO-GSA are biologically relevant. The biological methods compared in this section are AlignACE (Roth et al., 1998), ANN_Spec (Workman and
Conclusions and future work
In this paper, we have compared two novel swarm intelligence based algorithms, ABC and GSA, for solving the MDP. Moreover, we have adapted these algorithms to the multiobjective context, resulting in two new algorithms named MOABC and MO-GSA. This work differs from previous approaches to the MDP because our new constraints focus on real-world aspects of biology, e.g., the motif complexity concept. In this work, we have also combined computational and biological aspects, demonstrating through several
Acknowledgments
This work was partially funded by the Spanish Ministry of Science and Innovation and ERDF (the European Regional Development Fund), under the contract TIN2008-06491-C04-04 (the M⁎ project). Thanks also to the Fundación Valhondo for the economic support offered to David L. González-Álvarez to carry out this research.
References (49)
MOGAMOD: Multi-objective genetic algorithm for motif discovery. Expert Syst. Appl. (2009)
Variable neighborhood search. Comput. Oper. Res. (1997)
GSA: a gravitational search algorithm. Inf. Sci. (2009)
Shuffling biological sequences with motif constraints. J. Discrete Algorithms (2008)
Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. (1998)
Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science (2004)
Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learn. (1995)
Che, Y., Song, D., Rashedd, K., 2005. MDGA: Motif discovery using a genetic algorithm. In: Proceedings of the 2005...
Multi-objective Optimization Using Evolutionary Algorithms (2001)
A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. (2002)
What are DNA sequence motifs? Nat. Biotechnol.
Finding composite regulatory patterns in DNA sequences. Bioinformatics
A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics
Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Nucl. Acids Res.
Evolutionary computation for discovery of composite transcription factor binding sites. Nucl. Acids Res.
Finding functional sequence elements by multiple local alignment. Nucl. Acids Res.
Handbook of Metaheuristics
Predicting DNA motifs by using evolutionary multiobjective optimization. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev.
Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics
Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci.