A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm

  • Methodologies and Application
Soft Computing

Abstract

Multiple sequence alignment (MSA) is a problem of very high computational complexity, so it cannot be solved by exhaustive methods. MSA is now commonly approached by optimizing more than one objective simultaneously. In this paper, we propose a new genetic-algorithm-based alignment technique, named bi-objective sequence alignment using genetic algorithm (BSAGA). The novelty of this approach lies in its selection process: one part of the population is selected based on the Sum of Pair score, and the rest is selected based on Total Conserve Columns. We apply integer-based chromosomal coding that represents only the gap positions in an alignment. This representation helps the search reach an optimum even for longer sequences. We tested BSAGA on BAliBASE and SABmark and compared its alignment scores with those of other relevant alignment techniques. BSAGA performs better than the other techniques, a result further supported by the Wilcoxon sign test.
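The two ideas named in the abstract, a chromosome that stores only gap positions and a selection step split between two objectives, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the simple match/mismatch/gap scores (a real scorer would more likely use a substitution matrix), the 50/50 split, and the use of plain truncation selection are all assumptions made for illustration.

```python
from itertools import combinations


def decode(seq, gap_positions):
    """Rebuild one aligned row from a sequence and its gap columns.

    gap_positions holds distinct column indices of the aligned row
    that contain '-' (hypothetical encoding, assumed for this sketch).
    """
    gaps = set(gap_positions)
    residues = iter(seq)
    return "".join("-" if col in gaps else next(residues)
                   for col in range(len(seq) + len(gaps)))


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum pairwise scores over every column (toy scoring, an assumption)."""
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing here
            if a == "-" or b == "-":
                score += gap
            else:
                score += match if a == b else mismatch
    return score


def conserved_columns(rows):
    """Count gap-free columns in which every row has the same residue."""
    return sum(1 for column in zip(*rows)
               if "-" not in column and len(set(column)) == 1)


def select(population, seqs, n_keep, sp_fraction=0.5):
    """Keep n_keep chromosomes: one part by sum of pairs, the rest by
    conserved columns. Truncation selection is an assumption."""
    alignments = [[decode(s, g) for s, g in zip(seqs, chrom)]
                  for chrom in population]
    indices = range(len(population))
    n_sp = int(n_keep * sp_fraction)
    by_sp = sorted(indices, key=lambda i: sum_of_pairs(alignments[i]),
                   reverse=True)
    chosen = by_sp[:n_sp]
    rest = [i for i in indices if i not in chosen]
    by_tcc = sorted(rest, key=lambda i: conserved_columns(alignments[i]),
                    reverse=True)
    chosen += by_tcc[:n_keep - n_sp]
    return [population[i] for i in chosen]


# Each chromosome lists the gap columns for every sequence.
seqs = ["ACGT", "AGT", "ACT"]
population = [
    [[], [1], [2]],  # ACGT / A-GT / AC-T
    [[], [0], [0]],  # ACGT / -AGT / -ACT
    [[], [3], [3]],  # ACGT / AGT- / ACT-
]
survivors = select(population, seqs, n_keep=2)
print(survivors)
```

Storing only gap columns keeps each chromosome short relative to the full alignment, which is one plausible reading of why the abstract reports that this representation helps with longer sequences.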

Acknowledgements

The authors sincerely thank the reviewers for their helpful and constructive suggestions to improve the quality of the paper. The authors also thank Prof. Dr. Ansuman Lahiri of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, for his helpful suggestions and ideas during this work.

Author information

Corresponding author

Correspondence to Biswanath Chowdhury.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal participants

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 266 kb)

About this article

Cite this article

Chowdhury, B., Garai, G. A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm. Soft Comput 24, 15871–15888 (2020). https://doi.org/10.1007/s00500-020-04917-5

  • DOI: https://doi.org/10.1007/s00500-020-04917-5
