Abstract
Computational molecular biology has emerged as one of the most exciting interdisciplinary fields. It has benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biochemistry, and computer science. In the past few years it has been shown that a large number of molecular biology problems can be formulated as combinatorial optimization problems, including sequence alignment, genome rearrangement, string selection and comparison, and protein structure prediction and recognition. This paper provides a detailed description of string selection and string comparison problems. To find good-quality solutions for a particular class of string comparison problems in molecular biology, known as the far from most string problem, we propose new heuristics, including a Greedy Randomized Adaptive Search Procedure (GRASP) and a Genetic Algorithm (GA). Computational results indicate that these randomized heuristics find better-quality solutions than the best state-of-the-art heuristic approach.
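The abstract does not state the objective of the far from most string problem, so the following Python sketch assumes the formulation usually given in the string selection literature: given equal-length strings and a threshold t, find a string of the same length that maximizes the number of input strings at Hamming distance at least t from it. The function names and the toy DNA strings are hypothetical and chosen only for illustration; this is the objective a heuristic would evaluate, not the GRASP or GA proposed in the paper.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def ffmsp_objective(candidate: str, strings: Sequence[str], threshold: int) -> int:
    """Count the input strings that are 'far' from the candidate.

    Assumed definition: a string is far from the candidate when their
    Hamming distance is at least `threshold`.
    """
    return sum(hamming(candidate, s) >= threshold for s in strings)


# Toy instance (hypothetical data over the alphabet {A, C, G, T}).
strings = ["ACGT", "ACGA", "TTTT"]
print(ffmsp_objective("GGAC", strings, threshold=4))  # far from all three strings
print(ffmsp_objective("ACGT", strings, threshold=3))  # far only from "TTTT"
```

A heuristic such as a GRASP or a GA would call an objective of this kind repeatedly to compare candidate strings, so in practice it is the part worth keeping cheap to evaluate.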
Acknowledgements
The authors gratefully acknowledge Daniele Ferone for his help in the implementation and experimentation phase, and the anonymous referees for their comments and suggestions, which proved useful in improving both the quality and the readability of the manuscript.
Cite this article
Festa, P., Pardalos, P.M. Efficient solutions for the far from most string problem. Ann Oper Res 196, 663–682 (2012). https://doi.org/10.1007/s10479-011-1028-7
DOI: https://doi.org/10.1007/s10479-011-1028-7