The Selective Fixing Algorithm for the closest string problem

doi:10.1016/j.cor.2013.07.017

Computers & Operations Research

Volume 41, January 2014, Pages 24-30

https://doi.org/10.1016/j.cor.2013.07.017

Abstract

A hybrid heuristic algorithm based on integer linear programming is proposed for the closest string problem (CSP). The algorithm takes a rough feasible solution as input and iteratively selects variables to be fixed at their initial value until the number of free variables is small enough for the remaining problem to be solved to optimality by an ILP solver. The new solution can then be used as input for another iteration of the algorithm, and this approach is repeated a predefined number of times. The procedure is denoted as Selective Fixing Algorithm (SFA). SFA has first been tested on standard instances available from the literature, which are denoted as rectangular (having string length larger than the number of strings). Then, this approach has also been tested on the so-called square instances (having string length equal to the number of strings) and rectangular inverse instances (having string length smaller than the number of strings). Computational experiments indicate that SFA globally outperforms the state-of-the-art heuristics.

Introduction

The Closest String Problem (CSP), sometimes called the Center String Problem, is an NP-hard combinatorial optimization problem which arises in different fields, such as computational biology and coding theory. It consists of finding a string s of length l that minimizes the maximum Hamming distance from a set of strings $\{s_{i}\}_{i = 1}^{n}$, each of the same length l. In other words, the aim of a CSP instance is to find the geometric center of the given set of strings.
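The objective described above can be illustrated with a minimal sketch. The function names `hamming` and `csp_objective` and the toy strings are illustrative assumptions, not taken from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def csp_objective(candidate: str, strings: list[str]) -> int:
    """Maximum Hamming distance from the candidate to any input string."""
    return max(hamming(candidate, s) for s in strings)


# Toy instance (hypothetical data): n = 3 strings of length l = 4.
strings = ["ACGT", "ACGA", "TCGT"]
print(csp_objective("ACGT", strings))
```

A CSP solver looks for the candidate that makes `csp_objective` as small as possible over all strings of length l.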

Let us define a finite set of characters $A = \{a_{i}\}_{i = 1}^{k}$. We consider set A as the alphabet over which each string s_i $(\forall i = 1, \dots, n)$ is constructed, and we denote by A^l the set of all strings of length l that can be constructed over the alphabet A. Let $s_{i}[j] \in A$ denote the j-th character of the string s_i. The set $S = \{s_{i}\}_{i = 1}^{n}$ of strings with the same length l specifies a CSP instance, whose aim is to find a string $s \in A^{l}$ that minimizes the maximum Hamming distance from the strings in S. In [2], several ILP formulations of CSP are presented and polyhedrally studied. Among these formulations, we consider the following one, first proposed in [5], [6], which provides the best performance in practice:

$\min d \qquad (1)$

$\sum_{c \in A} x(c, j) = 1 \quad \forall j = 1, \dots, l \qquad (2)$

$l - \sum_{j = 1}^{l} x(s_{i}[j], j) \leq d \quad \forall i = 1, \dots, n \qquad (3)$

$d \geq 0, \; d \in \mathbb{N} \qquad (4)$

$x(c, j) \in \{0, 1\} \quad \forall c \in A, \; \forall j = 1, \dots, l \qquad (5)$

where $x(c, j) = 1$ if $s[j] = c$ and $x(c, j) = 0$ otherwise. Constraint (2) indicates that exactly one symbol is assigned to each position of s, while constraint (3) ensures that the Hamming distance from each string s_i to s does not exceed d.

Many heuristic approaches have been introduced in the last few years. In [7], Liu et al. applied a parallel genetic algorithm (herewith denoted GA) and a parallel simulated annealing (herewith denoted SA), but the computational experiments only refer to small instances. Another heuristic proposed by Liu et al. in [8] (herewith denoted LDDA_LSS) uses a hybrid approach combining a polynomial time approximation algorithm and a local search strategy. An ant colony optimization (herewith denoted ACO) algorithm was presented by Faro and Pappalardo in [4] and turned out to be superior to the non-parallel implementations of GA and SA.
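To make the role of the variables $x(c, j)$ in the ILP formulation concrete, the following sketch encodes a candidate string as 0-1 variables, checks constraint (2), and computes the smallest d allowed by constraint (3). The helper names `encode` and `min_feasible_d` and the toy data are assumptions introduced for illustration; no ILP solver is involved.

```python
def encode(s: str, alphabet: str) -> dict[tuple[str, int], int]:
    """x[(c, j)] = 1 if s[j] == c, and 0 otherwise."""
    return {(c, j): int(s[j] == c) for c in alphabet for j in range(len(s))}


def min_feasible_d(x, strings: list[str], alphabet: str) -> int:
    """Smallest d satisfying constraint (3) for a 0-1 assignment x."""
    l = len(strings[0])
    # Constraint (2): exactly one symbol per position.
    for j in range(l):
        if sum(x[(c, j)] for c in alphabet) != 1:
            raise ValueError(f"position {j} violates constraint (2)")
    # Constraint (3): l minus the number of matches is the Hamming distance.
    return max(l - sum(x[(s[j], j)] for j in range(l)) for s in strings)


# Toy instance (hypothetical data).
alphabet = "ACGT"
strings = ["ACGT", "ACGA", "TCGT"]
x = encode("ACGT", alphabet)
print(min_feasible_d(x, strings, alphabet))
```

For an integral x, the left-hand side of constraint (3) is exactly the Hamming distance between s and s_i, so the returned value coincides with the CSP objective of the encoded string.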
In addition, several matheuristics which use either a Lagrangian relaxation or a continuous relaxation of the ILP model (1)–(5) have been proposed. The algorithm proposed by Tanaka (herewith denoted TA) in [10] combines a Lagrangian multiplier adjustment procedure and a tabu search. In [1], a heuristic algorithm (herewith denoted IRA) is introduced that iteratively solves the continuous relaxation of the ILP model above, rounding up, one at a time, the variables with the highest value. In [3], an instant algorithm (herewith denoted RA) and two core based procedures (herewith denoted BCPA and ECPA) are presented. RA simply rounds up the result of the continuous relaxation, while BCPA and ECPA fix a subset of the integer variables in the continuous solution at the current value and let the solver run on the remaining problem. In [3] it is also shown that ECPA outperforms BCPA, which in turn outperforms RA, while RA dominates the other heuristics not based on mathematical programming, both in terms of quality and CPU time.
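The rounding idea behind RA can be sketched as follows, assuming a fractional solution of the continuous relaxation is already available. The rule used here (pick, for each position, the symbol with the largest fractional value, breaking ties alphabetically) is an assumption made for illustration; the excerpt does not specify the exact rounding rule used in [3]. The function name `round_relaxation` and the fractional values are hypothetical.

```python
def round_relaxation(frac: list[dict[str, float]]) -> str:
    """Round a fractional solution to a string.

    frac[j] maps each symbol c to the fractional value of x(c, j).
    Assumed rule: take the symbol with the largest value at each
    position; ties go to the alphabetically smallest symbol.
    """
    return "".join(max(sorted(col), key=col.get) for col in frac)


# Hypothetical fractional solution for l = 3 positions.
frac = [
    {"A": 0.7, "T": 0.3},
    {"C": 1.0},
    {"G": 0.5, "T": 0.5},
]
print(round_relaxation(frac))
```

Because each position receives exactly one symbol, the rounded string always satisfies constraint (2) and is therefore a feasible, if rough, solution.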

In all the above-mentioned papers, the instances considered in computational experiments are rectangular instances ($n \ll l$), while there are no results on tests performed on instances for which $n \geq l$. In [3], it is mentioned that such instances are inherently harder to solve due to the higher number of constraints imposed by the strings, which makes the continuous solution of the problem much more fractional. The purpose of this work is to introduce an efficient heuristic algorithm that has a robust behaviour on any type of instance.

Section snippets

Description of the proposed algorithm

The proposed algorithm takes as input a rough feasible string $s'$ and the related solution $x'$, computed by means of a fast algorithm such as RA. Then, it starts a phase, called the “selection phase”, whose aim is to iteratively select which positions are convenient to be maintained at the current value. At each iteration several positions are selected and the related symbols present in the current solution are fixed; in other words, for such positions the new solution keeps the symbols used by the
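The general fix-and-solve pattern described above can be sketched as follows. This is not the paper's SFA: the selection rule is not given in this excerpt, so the sketch assumes a hypothetical rule (keep free the positions where the current symbol agrees with the fewest input strings), and it replaces the ILP solver with brute-force enumeration over the free positions. The names `fix_and_solve`, `n_free` and `iterations` are illustrative.

```python
from itertools import product


def max_hamming(candidate: str, strings: list[str]) -> int:
    """Maximum Hamming distance from candidate to the input strings."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)


def fix_and_solve(strings, start, alphabet, n_free=3, iterations=2):
    """Fix most positions of the current string, re-optimize the rest."""
    current = start
    l = len(start)
    for _ in range(iterations):
        # Hypothetical selection rule (an assumption, not the paper's):
        # count how many input strings agree with the current symbol.
        agreement = [sum(s[j] == current[j] for s in strings) for j in range(l)]
        # Positions with the lowest agreement stay free; the rest are fixed.
        free = sorted(range(l), key=lambda j: agreement[j])[:n_free]
        best, best_d = current, max_hamming(current, strings)
        # Brute force over the free positions stands in for the ILP solver.
        for combo in product(alphabet, repeat=len(free)):
            trial = list(current)
            for j, c in zip(free, combo):
                trial[j] = c
            trial = "".join(trial)
            d = max_hamming(trial, strings)
            if d < best_d:
                best, best_d = trial, d
        current = best  # the new solution feeds the next iteration
    return current


# Toy instance (hypothetical data).
strings = ["AAAA", "AATT", "TTAA"]
result = fix_and_solve(strings, "TTTT", "AT")
print(result, max_hamming(result, strings))
```

Since the starting string is always among the candidates of the reduced problem, each iteration returns a solution that is at least as good as its input.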

Computational experiments

All algorithms discussed above have been implemented in C, with CPLEX 12.1 as LP and ILP solver. The algorithms have been tested on a Celeron Dual Core CPU at 2.1 GHz with 3 GB of RAM. The instances considered are of three different types: rectangular instances (where $n \ll l$), square instances (where $n = l$) and rectangular inverse instances (where $l \ll n$). We considered the instances tackled in [8] as rectangular instances, while we randomly generated the other ones. All
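The three instance shapes can be illustrated with a small generator. The uniform sampling, the default alphabet, the seed handling and the names `random_instance` and `instance_type` are assumptions; the excerpt does not describe how the random instances were generated. The classifier also simplifies $n \ll l$ to $n < l$.

```python
import random


def random_instance(n: int, l: int, alphabet: str = "ACGT", seed: int = 0):
    """Draw n strings of length l uniformly at random (assumed scheme)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(l)) for _ in range(n)]


def instance_type(n: int, l: int) -> str:
    """Classify an instance by shape; n << l is simplified to n < l."""
    if n == l:
        return "square"
    return "rectangular" if n < l else "rectangular inverse"


# Hypothetical sizes, one per instance type.
for n, l in [(10, 500), (50, 50), (500, 10)]:
    instance = random_instance(n, l)
    print(instance_type(n, l), len(instance), len(instance[0]))
```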

Conclusions

The algorithm proposed in this paper provides a robust and efficient method to solve CSP instances. Its performance is globally better than that of the state-of-the-art heuristics, as described above. By increasing parameter IT, namely the number of iterations of the algorithm, it is possible to obtain a further improvement in the solution quality.

The method introduced can also be adapted to other combinatorial optimization problems with some adjustments. The issues that have to

References (10)

F. Della Croce et al. Improved LP-based algorithms for the closest string problem. Computers and Operations Research (2012).
X. Liu et al. Exact algorithm and heuristic for the closest string problem. Computers and Operations Research (2011).
S. Tanaka. A heuristic algorithm based on Lagrangian relaxation for the closest string problem. Computers and Operations Research (2012).
J. Chen. Iterative rounding for the closest string problem. In: Proceedings of the fifth conference on computability in...
M. Chimani, M. Woste, S. Bocker. A closer look at the closest string and closest substring problem. In: Proceedings of the...

There are more references available in the full text version of this article.
