Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 378)

Abstract

Multiple sequence alignment (MSA) is a fundamental step in almost all aspects of biological sequence analysis, and the reliability and accuracy of such analyses depend on the quality of the MSA. Incorporating anchor points into the sequences to be aligned has proven to be a good way to increase MSA quality. In this paper, we apply the Shared Near Neighbors method to construct anchor points as partial alignment columns that will be aligned in the final output. These anchor points can be used as a guide for the DIALIGN-TX method to overcome its limitation and increase the accuracy of the final MSA. The results show a 4-8% improvement in CS score over DIALIGN-TX on the six reference sets of the BAliBASE 3.0 benchmark. In addition, the method achieved the highest overall mean Q-score and CS score among the compared MSA methods on the IRMBASE 2.0 benchmark.
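The abstract does not spell out how the Shared Near Neighbors method is applied to sequences, so the following is only a minimal sketch of the general Jarvis-Patrick idea on a toy distance matrix. The function names (`k_nearest`, `shared_near_neighbors`), the parameters `k` and `min_shared`, and the one-dimensional toy data are all assumptions made for illustration; they are not taken from the paper, and the items being linked here are abstract points rather than residues or segments.

```python
from itertools import combinations


def k_nearest(dist, k):
    """For each item i, return the set of its k nearest other items.

    Ties are broken by item index so the result is deterministic.
    Simplification: an item is not counted as its own neighbor.
    """
    n = len(dist)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (dist[i][j], j))
        neighbors.append(set(others[:k]))
    return neighbors


def shared_near_neighbors(dist, k, min_shared):
    """Link items i and j when each is in the other's k-nearest list
    and the two lists have at least min_shared items in common."""
    nn = k_nearest(dist, k)
    links = []
    for i, j in combinations(range(len(dist)), 2):
        if j in nn[i] and i in nn[j]:
            shared = len(nn[i] & nn[j])
            if shared >= min_shared:
                links.append((i, j, shared))
    return links


# Hypothetical toy data: five points on a line, forming two groups.
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]

links = shared_near_neighbors(dist, k=2, min_shared=1)
print(links)  # [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1)]
```

In this toy run, items 0, 1 and 2 are linked to each other and items 3 and 4 form a separate linked pair. In an anchoring setting, groups like these could in principle be candidates for partial alignment columns, but how the paper derives distances and turns groups into anchors is not stated in the abstract.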



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boraik, A.N., Abdullah, R., Venkat, I. (2013). Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_15

  • DOI: https://doi.org/10.1007/978-3-642-40567-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40566-2

  • Online ISBN: 978-3-642-40567-9

  • eBook Packages: Computer Science (R0)
