Elsevier

Computers & Operations Research

Volume 41, January 2014, Pages 24-30
The Selective Fixing Algorithm for the closest string problem

https://doi.org/10.1016/j.cor.2013.07.017

Abstract

A hybrid heuristic algorithm based on integer linear programming is proposed for the closest string problem (CSP). The algorithm takes a rough feasible solution as input and iteratively selects variables to be fixed at their initial value until the number of free variables is small enough for the remaining problem to be solved to optimality by an ILP solver. The new solution can then be used as input for another iteration of the algorithm, and this approach is repeated a predefined number of times. The procedure is denoted as the Selective Fixing Algorithm (SFA). SFA was first tested on standard instances available from the literature, which are denoted as rectangular (string length larger than the number of strings). It was then also tested on the so-called square instances (string length equal to the number of strings) and rectangular inverse instances (string length smaller than the number of strings). Computational experiments indicate that SFA globally outperforms the state-of-the-art heuristics.

Introduction

The Closest String Problem (CSP), sometimes called the Center String Problem, is an NP-hard combinatorial optimization problem that arises in different fields, such as computational biology and coding theory. It consists of finding a string $s$ of length $l$ that minimizes the maximum Hamming distance from a set of strings $\{s_i\}_{i=1}^{n}$, each of the same length $l$. In other words, the aim of a CSP instance is to find the geometric center of the given set of strings.
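As a small illustration of the objective just defined, the following Python sketch computes the maximum Hamming distance of a candidate string and finds an optimal string by exhaustive enumeration. The function names and the toy instance are assumptions made for this example only; the enumeration visits every string over the alphabet, so it is usable only on very small instances and is not the method proposed in the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def csp_objective(candidate: str, strings: list[str]) -> int:
    """Maximum Hamming distance from the candidate to any input string."""
    return max(hamming(candidate, s) for s in strings)


def brute_force_closest(strings: list[str], alphabet: str) -> tuple[str, int]:
    """Enumerate all |A|^l strings and keep one with the smallest objective.

    Exponential in the string length: for toy instances only.
    """
    length = len(strings[0])
    best = min(
        ("".join(chars) for chars in product(alphabet, repeat=length)),
        key=lambda cand: csp_objective(cand, strings),
    )
    return best, csp_objective(best, strings)


# Toy instance (hypothetical data): three strings of length 4 over {A, C, G, T}.
strings = ["ACGT", "ACGA", "TCGT"]
best, d = brute_force_closest(strings, "ACGT")
print(best, d)
```

On this toy instance the only string within distance 1 of all three inputs is the first input itself, which is why the search returns it.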

Let us define a finite set of characters $A = \{a_i\}_{i=1}^{k}$. We consider set $A$ as the alphabet over which each string $s_i$ ($i = 1, \dots, n$) is constructed, and we indicate with $A^l$ the set of all strings of length $l$ that can be constructed over $A$. Let $s_i[j] \in A$ denote the $j$-th character of string $s_i$. A set $S = \{s_i\}_{i=1}^{n}$ of strings with the same length $l$ specifies a CSP instance, whose aim is to find a string $s \in A^l$ that minimizes the maximum Hamming distance from the strings in $S$. In [2], several ILP formulations of CSP are presented and polyhedrally studied. Among these formulations, we consider the following one, first proposed in [5], [6], which provides the best performance in practice:

$$
\begin{aligned}
\min\ & d && && (1)\\
\text{s.t.}\ & \sum_{c \in A} x_{c,j} = 1, && j = 1, \dots, l && (2)\\
& l - \sum_{j=1}^{l} x_{s_i[j],\,j} \le d, && i = 1, \dots, n && (3)\\
& d \ge 0,\ d \in \mathbb{N} && && (4)\\
& x_{c,j} \in \{0, 1\}, && c \in A,\ j = 1, \dots, l && (5)
\end{aligned}
$$

where $x_{c,j} = 1$ if $s[j] = c$ and $x_{c,j} = 0$ otherwise. Constraint (2) indicates that only one symbol can be assigned to each position of $s$, while constraint (3) ensures that the Hamming distance from each string $s_i$ to $s$ does not exceed $d$.

Many heuristic approaches have been introduced in the last few years. In [7], Liu et al. applied a parallel genetic algorithm (herewith denoted GA) and a parallel simulated annealing (herewith denoted SA), but their computational experiments only refer to small instances. Another heuristic proposed by Liu et al. in [8] (herewith denoted LDDA_LSS) uses a hybrid approach combining a polynomial time approximation algorithm and a local search strategy. An ant colony optimization (herewith denoted ACO) algorithm was presented by Faro and Pappalardo in [4] and turned out to be superior to the non-parallel implementations of GA and SA. In addition, several matheuristics that use either a Lagrangian relaxation or a continuous relaxation of the ILP model (1)-(5) have been proposed. The algorithm proposed by Tanaka (herewith denoted TA) in [10] combines a Lagrangian multiplier adjustment procedure and a tabu search.
In [1], a heuristic algorithm (herewith denoted IRA) is introduced that iteratively solves the continuous relaxation of the ILP model above, rounding up one at a time the variables with the highest value. In [3], an instant algorithm (herewith denoted RA) and two core based procedures (herewith denoted BCPA and ECPA) are presented. RA simply rounds up the result of the continuous relaxation, while BCPA and ECPA fix a subset of the integer variables in the continuous solution at their current value and let the solver run on the remaining problem. In [3] it is also shown that ECPA outperforms BCPA, which in turn outperforms RA, and that RA dominates the other heuristics not based on mathematical programming, both in terms of quality and CPU time.
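To make the role of the binary variables and of constraints (2) and (3) concrete, here is a minimal Python sketch that maps a string to its 0/1 assignment, checks the one-symbol-per-position constraint, and computes the smallest value of d compatible with the distance constraints. The helper names (`encode`, `decode`, `satisfies_assignment`, `smallest_feasible_d`) and the toy data are assumptions for illustration; no LP or ILP solver is involved, so this only evaluates a given assignment and does not reproduce any of the cited heuristics.

```python
def encode(s: str, alphabet: str) -> dict[tuple[str, int], int]:
    """Binary assignment: x[(c, j)] = 1 if s[j] == c, else 0."""
    return {(c, j): int(s[j] == c) for j in range(len(s)) for c in alphabet}


def decode(x: dict[tuple[str, int], int], alphabet: str, length: int) -> str:
    """Recover the string from an assignment that satisfies constraint (2)."""
    return "".join(
        next(c for c in alphabet if x[(c, j)] == 1) for j in range(length)
    )


def satisfies_assignment(x, alphabet: str, length: int) -> bool:
    """Constraint (2): exactly one symbol is selected at every position."""
    return all(sum(x[(c, j)] for c in alphabet) == 1 for j in range(length))


def smallest_feasible_d(x, strings: list[str]) -> int:
    """Smallest d satisfying constraint (3) for every input string.

    For string s_i, the sum of x[(s_i[j], j)] counts matching positions,
    so length minus that sum is the Hamming distance to the encoded string.
    """
    length = len(strings[0])
    return max(
        length - sum(x[(s_i[j], j)] for j in range(length)) for s_i in strings
    )


# Toy instance (hypothetical data).
alphabet = "ACGT"
strings = ["ACGT", "ACGA", "TCGT"]
x = encode("ACGT", alphabet)
print(satisfies_assignment(x, alphabet, 4), smallest_feasible_d(x, strings))
```

Because the objective minimizes d, an optimal solution always sets d to the value returned by `smallest_feasible_d` for its own assignment.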

In all the above-mentioned papers, the instances considered in computational experiments are rectangular instances ($n < l$), while there are no results on tests performed on instances for which $n \ge l$. In [3], it is mentioned that such instances are inherently harder to solve due to the higher number of constraints imposed by the strings, which makes the continuous solution of the problem much more fractional. The purpose of this work is to introduce an efficient heuristic algorithm that has a robust behaviour on any type of instance.

Section snippets

Description of the proposed algorithm

The proposed algorithm takes as input a rough feasible string s and the related solution x′, computed by means of a fast algorithm such as RA. Then, it starts a phase, called the “selection phase”, whose aim is to iteratively select which positions are convenient to keep at their current value. At each iteration several positions are selected and the related symbols present in the current solution are fixed; in other words, for such positions the new solution keeps the symbols used by the
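The general fix-and-solve pattern described in the abstract (keep most positions at their current symbol, re-optimise the few remaining free positions exactly, then repeat) can be sketched in Python as follows. This is not the paper's SFA: the rule that decides which positions stay fixed is not available in this excerpt, so the sketch uses a purely hypothetical random choice, and exhaustive enumeration stands in for the ILP solver. All function and parameter names (`fix_and_solve`, `max_free`, `iterations`, `seed`) are assumptions.

```python
import random
from itertools import product


def max_distance(candidate, strings) -> int:
    """Maximum Hamming distance from the candidate to the input strings."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)


def solve_free_positions(current: str, free: list[int], strings, alphabet: str):
    """Re-optimise only the free positions, keeping all others fixed.

    Exhaustive enumeration is a stand-in for the exact ILP solve.
    """
    best, best_d = current, max_distance(current, strings)
    chars = list(current)
    for combo in product(alphabet, repeat=len(free)):
        for pos, c in zip(free, combo):
            chars[pos] = c
        d = max_distance(chars, strings)
        if d < best_d:
            best, best_d = "".join(chars), d
    return best, best_d


def fix_and_solve(start: str, strings, alphabet: str,
                  max_free: int = 4, iterations: int = 3, seed: int = 0):
    """Repeat the fix-then-solve step a predefined number of times."""
    rng = random.Random(seed)
    current, current_d = start, max_distance(start, strings)
    for _ in range(iterations):
        # HYPOTHETICAL selection rule: free positions are drawn at random.
        # The paper's actual selection criterion is not reproduced here.
        free = sorted(rng.sample(range(len(current)), min(max_free, len(current))))
        current, current_d = solve_free_positions(current, free, strings, alphabet)
    return current, current_d


# Toy instance (hypothetical data) with a deliberately poor starting string.
strings = ["ACGT", "ACGA", "TCGT"]
result, result_d = fix_and_solve("TTTT", strings, "ACGT", max_free=4, iterations=1)
print(result, result_d)
```

Since each step only accepts a strictly better string, the objective never worsens from one iteration to the next; with `max_free` equal to the string length the step degenerates into a full exact search.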

Computational experiments

All algorithms discussed above have been implemented in C, with CPLEX 12.1 as the LP and ILP solver. The algorithms have been tested on a Celeron Dual Core CPU at 2.1 GHz with 3 GB of RAM. The instances considered are of three different types: rectangular instances (where $n < l$), square instances (where $n = l$) and rectangular inverse instances (where $l < n$). We considered the instances tackled in [8] as rectangular instances, while we randomly generated the other ones. All

Conclusions

The algorithm proposed in this paper provides a robust and efficient method to solve CSP instances. Its performance is globally better than that of the state-of-the-art heuristics, as described above. By increasing the parameter IT, namely the number of iterations of the algorithm, it is possible to obtain a further improvement in solution quality.

The method introduced can also be adapted to other combinatorial optimization problems with some adjustments. The issues that have to
