On the Near-Linear Correlation of the Eigenvalues Across BLOSUM Matrices

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9096)

Abstract

The BLOSUM matrices estimate the likelihood that one amino acid is substituted with another, and are commonly used in sequence alignment. Each BLOSUM matrix is associated with a parameter x: the matrix elements are computed from the diversity among sequences that are no more than x% similar. In an earlier work, Song et al. observed a property of the BLOSUM matrices: eigendecompositions of the matrices produce nearly identical sets of eigenvectors. Furthermore, for each eigenvector, the associated eigenvalues follow a nearly linear trend across the matrices. This property allowed Song et al. to devise an iterative alignment and matrix selection process to produce more accurate matrices. In this paper, we investigate the reasons behind this property of the BLOSUM matrices. Using this knowledge, we analyze the situations under which the property holds, and hence clarify the extent of the earlier method's validity.
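
The property described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' procedure: the helper name `eigen_trend` is hypothetical, and the matrices are synthetic symmetric stand-ins built to share eigenvectors and to have eigenvalues linear in x, with a little noise added. No real BLOSUM data is used; the x values merely mimic common BLOSUM levels. Real matrices would have to be loaded separately.

```python
import numpy as np

def eigen_trend(matrices, xs):
    """Test the shared-eigenvector / linear-eigenvalue property on symmetric matrices."""
    xs = np.asarray(xs, dtype=float)
    # Eigenvectors of the first matrix serve as the common reference basis.
    _, basis = np.linalg.eigh(matrices[0])
    # Express every matrix in that basis; the result is diagonal only if the basis is shared.
    rotated = np.array([basis.T @ m @ basis for m in matrices])
    eigvals = np.diagonal(rotated, axis1=1, axis2=2)  # shape (n_matrices, n)
    # Fraction of squared entries left off the diagonal (0 means identical eigenvectors).
    total = np.sum(rotated ** 2, axis=(1, 2))
    off_diag = 1.0 - np.sum(eigvals ** 2, axis=1) / total
    # Least-squares line eigenvalue_i(x) ~ slope_i * x + intercept_i, one per eigenvector.
    slopes, intercepts = np.polyfit(xs, eigvals, 1)
    fitted = np.outer(xs, slopes) + intercepts
    max_residual = np.max(np.abs(eigvals - fitted))
    return off_diag, slopes, intercepts, max_residual

# Synthetic stand-ins (assumption: NOT real BLOSUM matrices).
rng = np.random.default_rng(0)
n = 20                                        # one row/column per amino acid
q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # shared orthonormal eigenvectors
a = np.linspace(-1.0, 1.0, n)                 # assumed slopes of the eigenvalue lines
b = np.linspace(5.0, -5.0, n)                 # assumed intercepts
xs = [45, 50, 62, 80, 90]                     # values of the parameter x

mats = []
for x in xs:
    noise = 1e-3 * rng.normal(size=(n, n))
    # Symmetrize the noise so each matrix stays symmetric.
    mats.append(q @ np.diag(a * x + b) @ q.T + (noise + noise.T) / 2)

off_diag, slopes, intercepts, max_residual = eigen_trend(mats, xs)
print("largest off-diagonal fraction:", off_diag.max())
print("largest deviation from the fitted lines:", max_residual)
```

A small off-diagonal fraction indicates that one set of eigenvectors nearly diagonalizes every matrix, and a small residual indicates that each eigenvalue is close to linear in x. Projecting onto a single reference basis avoids having to match eigenvectors between separate decompositions, at the cost of depending on which matrix is chosen as the reference.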

References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  2. Chapelle, O., Do, C.B., Teo, C.H., Le, Q.V., Smola, A.J.: Tighter bounds for structured estimation. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 21, pp. 281–288. Curran Associates, Inc. (2009)

  3. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. Nat. Biomed. Res. Found. 5(3), 345–358 (1978)

  4. Dewey, C.N., Huggins, P., Woods, K., Sturmfels, B., Pachter, L.: Parametric alignment of Drosophila genomes. PLoS Comput. Biol. 2(6) (2006)

  5. Do, C.B., Mahabhashyam, M.S., Brudno, M., Batzoglou, S.: PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005)

  6. Do, C.B., Woods, D.A., Batzoglou, S.: CONTRAfold: RNA secondary structure prediction without physics-based models. In: ISMB (Supplement of Bioinformatics), vol. 22, pp. 90–98 (2006)

  7. Edgar, R.C.: Optimizing substitution matrix choice and gap parameters for sequence alignment. BMC Bioinformatics 10(396) (2009)

  8. Flannick, J., Novak, A., Do, C.B., Srinivasan, B.S., Batzoglou, S.: Automatic parameter learning for multiple local network alignment. J. Comput. Biol. 16(8), 1001–1022 (2006)

  9. Gonnet, G.H., Cohen, M.A., Benner, S.A.: Exhaustive matching of the entire protein sequence database. Science 256(5062), 1443–1445 (1992)

  10. Gusfield, D., Balasubramanian, K., Naor, D.: Parametric optimization of sequence alignment. In: Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1992, pp. 432–439. Society for Industrial and Applied Mathematics, Philadelphia (1992)

  11. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89(22), 10915–10919 (1992)

  12. Katoh, K., Kuma, K., Toh, H., Miyata, T.: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl. Acids Res. 33(2), 511–518 (2005)

  13. Kim, E., Kececioglu, J.: Learning scoring schemes for sequence alignment from partial examples. IEEE/ACM Trans. Comput. Biol. Bioinformatics 5(4), 546–556 (2008)

  14. Kosiol, C., Goldman, N.: Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22(2), 193–199 (2005)

  15. Kuznetsov, I.: Protein sequence alignment with family-specific amino acid similarity matrices. BMC Research Notes 4(1), 296 (2011)

  16. Lassmann, T., Sonnhammer, E.: Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6(1), 298 (2005)

  17. Song, D., Chen, J., Chen, G., Li, N., Li, J., Fan, J., Bu, D., Li, S.C.: Parameterized BLOSUM matrices for protein alignment. IEEE/ACM Trans. Comput. Biol. Bioinformatics PP(99), 1 (2014)

  18. Wang, H.-C., Susko, E., Roger, A.J.: An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation. Mol. Biol. Evol. 31(4), 779–792 (2014)

Author information

Corresponding author

Correspondence to Jin Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, J., Ng, Y.K., Liu, X., Li, S.C. (2015). On the Near-Linear Correlation of the Eigenvalues Across BLOSUM Matrices. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_17

  • DOI: https://doi.org/10.1007/978-3-319-19048-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19047-1

  • Online ISBN: 978-3-319-19048-8

  • eBook Packages: Computer Science, Computer Science (R0)
