Abstract
The BLOSUM matrices estimate the likelihood that one amino acid is substituted by another, and are commonly used in sequence alignment. Each BLOSUM matrix is associated with a parameter x: its elements are computed from the diversity among sequences that are no more than x% similar. In earlier work, Song et al. observed that eigendecompositions of the BLOSUM matrices produce nearly identical sets of eigenvectors and that, for each eigenvector, the corresponding eigenvalues follow a nearly linear trend across the matrices. This property allowed Song et al. to devise an iterative alignment and matrix selection process that produces more accurate matrices. In this paper, we investigate the reasons behind this property of the BLOSUM matrices. Using this knowledge, we analyze the situations under which the property holds, and hence clarify the extent of the earlier method's validity.
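The property described above can be checked numerically. The following is a minimal sketch on synthetic data, not the procedure of Song et al. and not real BLOSUM matrices: it builds a toy family of symmetric 20 × 20 matrices that share eigenvectors by construction, then measures how well the eigenvectors agree and how linear each tracked eigenvalue is. The function names, the x values, the noise level, and the choice to regress eigenvalues on x are all assumptions made for illustration.

```python
import numpy as np


def make_toy_family(xs, n=20, noise=1e-4, seed=0):
    """Synthetic stand-in for a matrix family (NOT real BLOSUM data).

    Every matrix shares the orthonormal eigenvectors in q, and its
    eigenvalues vary linearly with x, plus small symmetric noise.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthonormal basis
    intercepts = np.linspace(-5.0, 5.0, n)        # well-separated eigenvalues
    slopes = np.linspace(0.001, 0.003, n)         # hypothetical linear trends
    family = {}
    for x in xs:
        lam = intercepts + slopes * x
        e = rng.normal(scale=noise, size=(n, n))
        family[x] = q @ np.diag(lam) @ q.T + (e + e.T) / 2.0
    return family


def eigen_trends(family):
    """Track eigenvalues across the family by matching eigenvectors.

    Eigenvectors of each matrix are matched to those of the first matrix
    by absolute cosine similarity (the sign of an eigenvector is arbitrary).
    Returns the x values, the tracked eigenvalues (one column per reference
    eigenvector), and the smallest best-match cosine seen.
    """
    xs = sorted(family)
    _, ref_vecs = np.linalg.eigh(family[xs[0]])
    tracked = np.empty((len(xs), ref_vecs.shape[1]))
    min_cos = 1.0
    for row, x in enumerate(xs):
        vals, vecs = np.linalg.eigh(family[x])
        cos = np.abs(ref_vecs.T @ vecs)  # cos[i, j] = |<ref_i, vec_j>|
        match = cos.argmax(axis=1)
        tracked[row] = vals[match]
        min_cos = min(min_cos, float(cos.max(axis=1).min()))
    return np.array(xs, dtype=float), tracked, min_cos


def linear_fit_r2(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    resid = ys - (slope * xs + intercept)
    ss_tot = np.sum((ys - ys.mean()) ** 2)
    return 1.0 - float(resid @ resid) / float(ss_tot)


family = make_toy_family(xs=[30, 40, 50, 60, 70, 80, 90])
xs, tracked, min_cos = eigen_trends(family)
r2 = np.array([linear_fit_r2(xs, tracked[:, i]) for i in range(tracked.shape[1])])

print(f"smallest eigenvector cosine: {min_cos:.6f}")
print(f"smallest R^2 of linear fit:  {r2.min():.6f}")
```

To try this on actual substitution matrices, one would replace `make_toy_family` with a loader that returns a dictionary mapping each x to its symmetric matrix. Whether the eigenvectors then match and the fits remain close to linear is exactly the question the paper examines, so the toy result should not be read as evidence either way.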
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Chapelle, O., Do, C.B., Teo, C.H., Le, Q.V., Smola, A.J.: Tighter bounds for structured estimation. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 21, pp. 281–288. Curran Associates, Inc. (2009)
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. Nat. Biomed. Res. Found. 5(3), 345–358 (1978)
Dewey, C.N., Huggins, P., Woods, K., Sturmfels, B., Pachter, L.: Parametric alignment of Drosophila genomes. PLoS Comput. Biol. 2(6) (2006)
Do, C.B., Mahabhashyam, M.S., Brudno, M., Batzoglou, S.: PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005)
Do, C.B., Woods, D.A., Batzoglou, S.: CONTRAfold: RNA secondary structure prediction without physics-based models. In: ISMB (Supplement of Bioinformatics), vol. 22, pp. 90–98 (2006)
Edgar, R.C.: Optimizing substitution matrix choice and gap parameters for sequence alignment. BMC Bioinformatics 10(396) (2009)
Flannick, J., Novak, A., Do, C.B., Srinivasan, B.S., Batzoglou, S.: Automatic parameter learning for multiple local network alignment. J. Comput. Biol. 16(8), 1001–1022 (2009)
Gonnet, G.H., Cohen, M.A., Benner, S.A.: Exhaustive matching of the entire protein sequence database. Science 256(5062), 1443–1445 (1992)
Gusfield, D., Balasubramanian, K., Naor, D.: Parametric optimization of sequence alignment. In: Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1992, pp. 432–439. Society for Industrial and Applied Mathematics, Philadelphia (1992)
Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89(22), 10915–10919 (1992)
Katoh, K., Kuma, K., Toh, H., Miyata, T.: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl. Acids Res. 33(2), 511–518 (2005)
Kim, E., Kececioglu, J.: Learning scoring schemes for sequence alignment from partial examples. IEEE/ACM Trans. Comput. Biol. Bioinformatics 5(4), 546–556 (2008)
Kosiol, C., Goldman, N.: Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22(2), 193–199 (2005)
Kuznetsov, I.: Protein sequence alignment with family-specific amino acid similarity matrices. BMC Research Notes 4(1), 296 (2011)
Lassmann, T., Sonnhammer, E.: Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6(1), 298 (2005)
Song, D., Chen, J., Chen, G., Li, N., Li, J., Fan, J., Bu, D., Li, S.C.: Parameterized BLOSUM matrices for protein alignment. IEEE/ACM Trans. Comput. Biol. Bioinformatics PP(99), 1 (2014)
Wang, H.-C., Susko, E., Roger, A.J.: An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation. Mol. Biol. Evol. 31(4), 779–792 (2014)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, J., Ng, Y.K., Liu, X., Li, S.C. (2015). On the Near-Linear Correlation of the Eigenvalues Across BLOSUM Matrices. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_17
DOI: https://doi.org/10.1007/978-3-319-19048-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19047-1
Online ISBN: 978-3-319-19048-8
eBook Packages: Computer Science (R0)