Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment

Boraik, Aziz Nasser; Abdullah, Rosni; Venkat, Ibrahim

doi:10.1007/978-3-642-40567-9_15


Conference paper


Part of the book series: Communications in Computer and Information Science (CCIS, volume 378)

Abstract

Multiple sequence alignment (MSA) is a fundamental step in almost all aspects of biological sequence analysis, and the reliability and accuracy of those analyses depend on the quality of the MSA. Incorporating anchor points into the sequences to be aligned has proved to be a good way to increase MSA quality. In this paper, we apply the Shared Near Neighbors method to construct anchor points as partial alignment columns that are carried into the final alignment. These anchor points can be used as a guide for DIALIGN-TX, overcoming the limitation of that method and increasing the accuracy of the final MSA. The results show a 4-8% improvement in CS score over DIALIGN-TX on the six reference sets of the BAliBASE 3.0 benchmark. In addition, the method achieved the highest overall mean Q-score and CS score among the compared MSA methods on the IRMBASE 2.0 benchmark.
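To make the Shared Near Neighbors idea concrete, the following is a minimal Python sketch of a common form of the shared-near-neighbor similarity: two items are scored by how many of their k nearest neighbors they have in common, and the score is zero unless each appears in the other's neighbor list. It is an illustration only, not the authors' implementation. The similarity matrix, the value of k, and the function names `k_nearest` and `snn_similarity` are all assumptions introduced here, and how the paper derives anchor columns from such scores is not described in the abstract.

```python
from typing import List, Set


def k_nearest(sim: List[List[float]], k: int) -> List[Set[int]]:
    """For each item i, return the indices of its k most similar other items."""
    n = len(sim)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort by descending similarity; ties are broken by index for determinism.
        others.sort(key=lambda j: (-sim[i][j], j))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors: List[Set[int]], i: int, j: int) -> int:
    """Count shared near neighbors; zero unless i and j list each other."""
    if j not in neighbors[i] or i not in neighbors[j]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Hypothetical pairwise similarities between four items (e.g. sequence segments).
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
neighbors = k_nearest(sim, k=2)
print(snn_similarity(neighbors, 0, 1))  # items 0 and 1 share neighbor 2
print(snn_similarity(neighbors, 0, 3))  # item 3 is not a near neighbor of item 0
```

In this toy input, items 0, 1, and 2 are mutually close, so any pair of them gets a positive score, while item 3 is an outlier and scores zero against all of them. Pairs with high scores are the kind of mutually supported matches that could plausibly serve as anchor candidates.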



Author information

Authors and Affiliations

School of Computer Science, Universiti Sains Malaysia, Malaysia
Aziz Nasser Boraik, Rosni Abdullah & Ibrahim Venkat


Editor information

Editors and Affiliations

Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Shahrul Azman Noah
Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor D. E, Malaysia
Azizi Abdullah
Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor Darul Ehsan, Malaysia
Haslina Arshad, Zulaiha Ali Othman & Zalinda Othman
School of Computer Science, FTSM, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor Darul Ehsan, Malaysia
Azuraliza Abu Bakar
Pattern Recognition Research Group, CAIT, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Shahnorbanun Sahran
Faculty of Information Science & IT, National University of Malaysia, 43600, Bangi, Selangor, Malaysia
Nazlia Omar


Cite this paper

Boraik, A.N., Abdullah, R., Venkat, I. (2013). Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_15


DOI: https://doi.org/10.1007/978-3-642-40567-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science, Computer Science (R0)
