Abstract
Multiple sequence alignment (MSA) is characterized as a very high computational complex problem. Therefore, MSA problem cannot be solved by exhaustive methods. Nowadays, MSA is being solved by optimizing more than one objective simultaneously. In this paper, we propose a new genetic algorithm based alignment technique, named bi-objective sequence alignment using genetic algorithm (BSAGA). The novelty of this approach is its selection process. One part of the population is selected based on the Sum of Pair, and rest is selected based on Total Conserve Columns. We applied integer-based chromosomal coding to represent only the gap positions in an alignment. Such representation improves the search technique to reach an optimum even for longer sequences. We tested and compared the alignment score of BSAGA with other relevant alignment techniques on BAliBASE and SABmark. The BSAGA shows better performance than others do, which was further proved by the Wilcoxon sign test.
Similar content being viewed by others
References
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucl Acids Res 28:235–242
Bradley RK et al (2009) Fast statistical alignment. PLoS Comput Biol 5:e1000392
Chuong BD, Kazutaka K (2008) Protein multiple sequence alignment. Methods Mol Biol 484:379–413
Corder GW (2009) Foreman DI: nonparametric statistics for non-statisticians: a step-by-step approach. Wiley, New York
Deb K et al (2002) A fast and elitist multiobjective genetic algorithm: Nsga-II. IEEE Trans Evol Comput 6:182–197
DeRonne KW, Karypis G (2013) Pareto optimal pairwise sequence alignment. IEEE/ACM Trans Comput Biol Bioinform 10:481–493
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15:330–340
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32:1792–1797
Ehrgott M (2005) Multicriteria optimization. Springer, Berlin
Eusuff M, Lansey K, Pasha F (2006) Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization. Eng Optim 38:129–154
Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–360
Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Boston
Gondro C, Kinghorn BP (2007) A simple genetic algorithm for multiple sequence alignment. Genet Mol Res 6:964–982
Gotoh O (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol 264:823–838
Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci 89:10915–10919
Heringa J, Taylor WR (1997) Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol 7:416–421
Hogeweg P, Hesper B (1984) The alignment of sets of sequences and the construction of phylogenetic trees: an integrated method. J Mol Evol 20:175–186
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical report-tr06. Erciyes University, Engineering Faculty, Computer Engineering Department
Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res 30:3059–3066
Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 33:511–518
Kaya M, Sarhan A, Alhajj R (2014) Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. Comput Methods Prog Biomed 114:38–49
Kemena C, Taly JF, Kleinjung J, Notredame C (2011) STRIKE: evaluation of protein MSAs using a single 3D structure. Bioinformatics 27:3385–3391
Lam AY, Li VO (2010) Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans Evol Comput 14:381–399
Lassmann T, Frings O, Sonnhammer ELL (2009) Kalign2: high performance multiple alignment of protein and nucleotide sequences allowing external features. Nucl Acids Res 37:858–865
Lee ZH, Su SF, Chuang CC, Liu KH (2008) Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Appl Soft Comput 8:55–78
Lipman DJ, Altschul SF, Kececioglu JD (1989) A tool for multiple sequence alignment. Proc Natl Acad Sci 86:4412–4415
Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26:1958–1964
Loytynoja A, Goldman N (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci 102:10557–10562
Miller BL, Golberg DE (1995) Genetic algorithms, tournament selection, and the effects of noise. Complex Syst 9:193–212
Mount DW (2004) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor
Narimani Z, Hamid B, Hassan A (2012) A new genetic algorithm for multiple sequence alignment. Int J Comput Intell Appl. https://doi.org/10.1142/S146902681250023X
Naznin F, Sarker R, Essam D (2011) Vertical decomposition with genetic algorithm for multiple sequence alignment. BMC Bioinform 12:353
Naznin F, Sarker R, Essam D (2012) Progressive alignment method using genetic algorithm for multiple sequence alignment. IEEE Trans Evol Comput 16:615–631
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3:131–144
Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24:1515–1524
Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217
Ortuño FM, Valenzuela O, Rojas F, Pomares H, Florido JP, Urquiza JM, Rojas I (2013) Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 29:2112–2121
Pei J, Grishin NV (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucl Acids Res 34:4364–4374
Pei J, Grishin NV (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23:802–808
Rahman RA, Ramli R, Jamari Z, Ku-Mahamud KR (2016) Evolutionary algorithm with roulette-tournament selection for solving aquaculture diet formulation. Math Probl Eng 2016:1–10
Roshan U, Livesay DR (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22:2715–2721
Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016a) Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 41:157–168
Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016b) A hybrid multiobjective memetic metaheuristic for multiple sequence alignment. IEEE Trans Evol Comput 20:499–514
Sean RE (2002) A memory-efficient dynamic programming algorithm for optimal alignment of sequence to an RNA secondary structure. BMC Bioinform 3:13
Shyu C, Sheneman L, Foster JA (2004) Multiple sequence alignment with evolutionary computation. Genet Progr Evolvable Mach 5:121–144
Sievers F et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539
Smith TF, Waterman MS (1981) Identification of common molecular sequences. J Mol Biol 147:195–197
Subramanian AR, Kaufmann M, Morgenstern B (2008) DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3:1–11
Taheri J, Zomaya AY (2009) RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem. BMC Genom 10(Suppl 1):S10
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucl Acids Res 22:4673–4680
Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15:87–88
Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61:127–136
Thompson JD, Linard B, Lecompte D, Poch O (2011) A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS ONE 6:1–14
Van Walle I, Lasters I, Wyns L (2005) SABmark—a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21:1267–1268
Wadud MS, Islam MR, Kundu N, Kabir MR (2018) Multiple sequence alignment using chemical reaction optimization algorithm. Int Conf Intell Syst Des Appl 941:1065–1074
Wang L, Jiang T (1994) On the complexity of multiple sequence alignment. J Comput Biol 1:337–348
Yamada S, Gotoh O, Yamana H (2006) Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinform 7:524
Zhou A, Qu BY, Li H, Zhao SZ, Suganthan PN, Zhang Q (2011) Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol Comput 1:32–49
Acknowledgements
The authors would like to sincerely thank the reviewers for helpful and constructive suggestions to improve the quality of the paper. The authors would also like to thank Prof. Dr. Ansuman Lahiri of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, for his helpful suggestions and ideas while doing this work.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal participants
This article does not contain any studies with human participants or animals performed by any of the authors
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Chowdhury, B., Garai, G. A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm. Soft Comput 24, 15871–15888 (2020). https://doi.org/10.1007/s00500-020-04917-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00500-020-04917-5