The Selective Fixing Algorithm for the closest string problem
Introduction
The Closest String Problem (CSP), sometimes called the Center String Problem, is an NP-hard combinatorial optimization problem that arises in fields such as computational biology and coding theory. It consists in finding a string $s$ of length $l$ that minimizes the maximum Hamming distance from a set of strings $S = \{s_1, \dots, s_n\}$, each of the same length $l$. In other words, the aim of a CSP instance is to find the geometric center of the given set of strings.
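The objective just described can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the exhaustive search enumerates every string over the alphabet, so it is usable on toy instances only and is not the method proposed in this paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def max_distance(candidate: str, strings: list[str]) -> int:
    """CSP objective: largest Hamming distance from candidate to any input string."""
    return max(hamming(candidate, s) for s in strings)


def closest_string_bruteforce(strings: list[str], alphabet: str) -> tuple[str, int]:
    """Exhaustive search over all |A|^l candidates (toy instances only)."""
    length = len(strings[0])
    best, best_d = "", length + 1
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        d = max_distance(candidate, strings)
        if d < best_d:
            best, best_d = candidate, d
    return best, best_d
```

For instance, `closest_string_bruteforce(["000", "111"], "01")` cannot do better than distance 2, because every binary string of length 3 differs from one of the two inputs in at least two positions.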
Let us define a finite set of characters $A$. We consider $A$ as the alphabet over which each string $s_i$ is constructed, and we denote by $A^l$ the set of all strings of length $l$ that can be constructed over $A$. Let $s_i^j$ denote the $j$-th character of the string $s_i$. The set of strings $S = \{s_1, \dots, s_n\}$, with $s_i \in A^l$, specifies a CSP instance, whose aim is to find a string $s \in A^l$ that minimizes the maximum Hamming distance from $S$. In [2], several ILP formulations of CSP are presented and polyhedrally studied. Among these, we consider the following formulation, first proposed in [5], [6], which performs best in practice:

\begin{align}
\min \quad & d \tag{1} \\
\text{s.t.} \quad & \sum_{k \in A} x_{k,j} = 1, && j = 1, \dots, l \tag{2} \\
& l - \sum_{j=1}^{l} x_{s_i^j,\,j} \le d, && i = 1, \dots, n \tag{3} \\
& d \in \mathbb{N} \tag{4} \\
& x_{k,j} \in \{0, 1\}, && k \in A,\ j = 1, \dots, l \tag{5}
\end{align}

where $x_{k,j} = 1$ if character $k$ is assigned to position $j$ of $s$ and $x_{k,j} = 0$ otherwise. Constraint (2) indicates that only one symbol can be assigned to each position of $s$, while constraint (3) ensures that the Hamming distance from each string $s_i$ to $s$ does not exceed $d$.

Many heuristic approaches have been introduced in the last few years. In [7], Liu et al. applied a parallel genetic algorithm (herewith denoted GA) and a parallel simulated annealing algorithm (herewith denoted SA), but the computational experiments reported there only cover small instances. Another heuristic, proposed by Liu et al. in [8] (herewith denoted LDDA_LSS), uses a hybrid approach combining a polynomial time approximation algorithm and a local search strategy. An ant colony optimization algorithm (herewith denoted ACO) was presented by Faro and Pappalardo in [4] and turned out to be superior to the non-parallel implementations of GA and SA. In addition, several matheuristics that use either a Lagrangian relaxation or a continuous relaxation of the ILP model (1)-(5) have been proposed. The algorithm proposed by Tanaka in [10] (herewith denoted TA) combines a Lagrangian multiplier adjustment procedure and a tabu search.
In [1], a heuristic algorithm (herewith denoted IRA) is introduced that iteratively solves the continuous relaxation of the ILP model above, rounding up, one at a time, the variables with the highest value. In [3], an instant algorithm (herewith denoted RA) and two core-based procedures (herewith denoted BCPA and ECPA) are presented. RA simply rounds up the result of the continuous relaxation, while BCPA and ECPA fix a subset of the integer variables at their value in the continuous solution and let the solver run on the remaining problem. In [3] it is also shown that ECPA outperforms BCPA, which in turn outperforms RA, and that RA dominates the other heuristics not based on mathematical programming, both in solution quality and in CPU time.
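As a rough illustration of the rounding idea behind RA, the sketch below turns a fractional solution into a string by keeping, for each position, the symbol with the largest relaxed value. The exact rounding rule of [3] is not spelled out here, so the per-position maximum, the tie-breaking rule, and the names `round_relaxation` and `x_frac` are all assumptions. The fractional values would come from an LP solver, which is not shown.

```python
def round_relaxation(x_frac: dict[tuple[str, int], float],
                     alphabet: str, length: int) -> str:
    """Round a fractional assignment to a string, one position at a time.

    x_frac[(k, j)] is the relaxed value of "symbol k at position j";
    missing entries are treated as 0.0 (an assumption of this sketch).
    """
    chars = []
    for j in range(length):
        # max() returns the first maximal element, so ties are broken
        # in favour of the symbol that appears earlier in the alphabet.
        chars.append(max(alphabet, key=lambda k: x_frac.get((k, j), 0.0)))
    return "".join(chars)
```

Because exactly one symbol is chosen per position, the rounded string is always a feasible candidate. Its quality depends on how fractional the relaxed solution is.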
In all the above-mentioned papers, the computational experiments consider only rectangular instances ($n < l$), while no results are reported for instances with $n \geq l$. In [3], it is mentioned that such instances are inherently harder to solve because of the higher number of constraints imposed by the strings, which makes the continuous solution of the problem much more fractional. The purpose of this work is to introduce an efficient heuristic algorithm that behaves robustly on any type of instance.
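To make constraints (2) and (3) concrete, the following sketch encodes a candidate string as 0-1 values and checks both constraint families. It assumes the assignment-style reading of the model, in which `x[k, j]` equals 1 when symbol `k` is placed at position `j`; the function names are hypothetical, and every character of the input strings is assumed to belong to `alphabet`. The last helper counts the constraints, which shows that the number of constraints of type (3) grows with the number of strings $n$.

```python
def encode(candidate: str, alphabet: str) -> dict[tuple[str, int], int]:
    """Build x[k, j] = 1 if symbol k sits at position j of candidate, else 0."""
    return {(k, j): int(candidate[j] == k)
            for j in range(len(candidate)) for k in alphabet}


def is_feasible(x: dict[tuple[str, int], int], d: int,
                strings: list[str], alphabet: str) -> bool:
    """Check constraints (2) and (3) for a 0-1 assignment x and a bound d."""
    length = len(strings[0])
    # Constraint (2): exactly one symbol is assigned to each position.
    one_symbol = all(sum(x[k, j] for k in alphabet) == 1
                     for j in range(length))
    # Constraint (3): one inequality per input string; the sum counts the
    # positions where s agrees with the solution, so length - sum is the
    # Hamming distance.
    within_d = all(length - sum(x[s[j], j] for j in range(length)) <= d
                   for s in strings)
    return one_symbol and within_d


def constraint_counts(strings: list[str]) -> tuple[int, int]:
    """Return (number of type-(2) constraints, number of type-(3) constraints)."""
    return len(strings[0]), len(strings)
```

With `strings = ["ACGT", "ACGA", "TCGT"]`, the encoding of `"ACGT"` is feasible for `d = 1` but not for `d = 0`, since the candidate differs from two of the strings in one position.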
Description of the proposed algorithm
The proposed algorithm takes as input a rough feasible string and the related solution x′, computed by means of a fast algorithm such as RA. It then starts a phase, called the “selection phase”, whose aim is to iteratively select the positions that are worth keeping at their current value. At each iteration several positions are selected and the symbols they hold in the current solution are fixed; in other words, for such positions the new solution keeps the symbols used by the
Computational experiments
All algorithms discussed above have been implemented in C, with CPLEX 12.1 as the LP and ILP solver. The algorithms have been tested on a Celeron Dual Core CPU at 2.1 GHz with 3 GB of RAM. The instances considered are of three different types: rectangular instances (where $n < l$), square instances (where $n = l$) and rectangular inverse instances (where $n > l$). We considered the instances tackled in [8] as rectangular instances, while we randomly generated the other ones. All
Conclusions
The algorithm proposed in this paper provides a robust and efficient method for solving CSP instances. Its performance is better overall than that of the state-of-the-art heuristics, as described above. Increasing the parameter IT, namely the number of iterations of the algorithm, yields a further improvement in solution quality.
The method introduced can also be adapted to other combinatorial optimization problems with some adjustments. The issues that have to
References (10)
- et al. Improved LP-based algorithms for the closest string problem. Computers and Operations Research (2012).
- et al. Exact algorithm and heuristic for the closest string problem. Computers and Operations Research (2011).
- A heuristic algorithm based on Lagrangian relaxation for the closest string problem. Computers and Operations Research (2012).
- Chen J. Iterative rounding for the closest string problem. In: Proceedings of the fifth conference on computability in...
- Chimani M, Woste M, Bocker S. A closer look at the closest string and closest substring problem. In: Proceedings of the...