A Class of Evolution-Based Kernels for Protein Homology Analysis: A Generalization of the PAM Model

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5542)

Abstract

There are two desirable properties that a pairwise similarity measure between amino acid sequences should possess in order to perform well in protein homology analysis. First, it should have kernel properties, which allow the use of popular and well-performing computational tools designed for linear spaces, such as SVM and k-means. Second, it should take into account the common evolutionary descent of homologous proteins. However, none of the existing similarity measures possesses both of these properties at once. In this paper, we propose a simple probabilistic evolution model of amino acid sequences that is built as a straightforward generalization of the PAM evolution model of single amino acids. This model produces a class of kernel functions, each of which is computed as the likelihood of the hypothesis that both sequences result from two independent evolutionary transformations of a hidden common ancestor, under specific assumptions on the evolution mechanism. The proposed class of kernels is rather wide: it contains as particular subclasses not only the family of J.-P. Vert's local alignment kernels, whose algebraic structure was introduced without any evolutionary motivation, but also some other families of local and global kernels. We demonstrate, via k-means clustering of a set of amino acid sequences from the VIDA database, that the global kernel can be useful in bringing together otherwise very different protein families.
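The central idea of the abstract, a kernel equal to the likelihood that two sequences descend independently from a hidden common ancestor, can be illustrated in its simplest setting. The sketch below is not the authors' model. It assumes a hypothetical three-letter alphabet, hypothetical background frequencies `P_STAT`, a hypothetical PAM-style one-step substitution matrix `TRANS`, and gap-free comparison of equal-length sequences; the paper's class of local and global kernels rests on an evolution mechanism that is not reproduced here. For single residues the kernel is K(a, b) = Σ_c p(c) P^s(a | c) P^s(b | c), which is symmetric and positive semidefinite because it factors as QᵀDQ with D diagonal and non-negative.

```python
# Toy stand-in for the 20-letter amino acid alphabet (hypothetical).
ALPHABET = "ABC"

# Hypothetical stationary (background) residue frequencies p(c).
P_STAT = {"A": 0.5, "B": 0.3, "C": 0.2}

# Hypothetical one-step substitution probabilities P(b | a), PAM-style:
# keep the residue with prob. 1 - MUT, otherwise redraw it from P_STAT.
# This chain is time-reversible: p(a) P(b | a) == p(b) P(a | b).
MUT = 0.1
TRANS = {a: {b: (1 - MUT) * (a == b) + MUT * P_STAT[b] for b in ALPHABET}
         for a in ALPHABET}


def mat_mul(x, y):
    """Product of two square matrices stored as nested dicts."""
    return {a: {b: sum(x[a][c] * y[c][b] for c in ALPHABET) for b in ALPHABET}
            for a in ALPHABET}


def mat_pow(m, steps):
    """m raised to a non-negative integer power (number of evolution steps)."""
    result = {a: {b: float(a == b) for b in ALPHABET} for a in ALPHABET}
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


def residue_kernel_table(steps):
    """K(a, b) = sum_c p(c) P^s(a | c) P^s(b | c): likelihood that residues a
    and b arose by two independent s-step evolutions of a hidden ancestor c."""
    ps = mat_pow(TRANS, steps)  # ps[c][a] = P^s(a | c)
    return {a: {b: sum(P_STAT[c] * ps[c][a] * ps[c][b] for c in ALPHABET)
                for b in ALPHABET}
            for a in ALPHABET}


def gapfree_sequence_kernel(x, y, table):
    """Product of residue kernels over aligned positions (equal lengths, no indels)."""
    if len(x) != len(y):
        raise ValueError("this gap-free sketch needs equal-length sequences")
    k = 1.0
    for a, b in zip(x, y):
        k *= table[a][b]
    return k


table = residue_kernel_table(steps=5)
print(gapfree_sequence_kernel("ABCA", "ABBA", table))
```

Because the toy chain is time-reversible, the same table equals p(a) P^{2s}(b | a), a single evolutionary path of length 2s from a to b. Taking the product over positions keeps the kernel positive semidefinite on fixed-length strings, but handling insertions and deletions requires the fuller model described in the paper.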

References

  1. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino-acid sequence of two proteins. J. of Molecular Biology 48, 443–453 (1970)

  2. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. of Molecular Biology 147, 195–197 (1981)

  3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997)

  4. Pearson, W.R.: Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132, 185–219 (2000)

  5. Mirkin, B., Camargo, R., Fenner, T., Loizou, G., Kellam, P.: Aggregating homologous protein families in evolutionary reconstructions of herpesviruses. In: Ashlock, D. (ed.) Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 255–262 (2006)

  6. Rocha, J., Rossello, F., Segura, J.: The Universal Similarity Metric does not detect domain similarity. Technical Report, Quantitative Methods, Q-bio QM (2006), http://arxiv.org/abs/q-bio/0603007

  7. Vinga, S., Almeida, J.: Alignment-free sequence comparison – A review. Bioinformatics 19, 513–523 (2003)

  8. Vert, J.-P., Saigo, H., Akutsu, T.: Local alignment kernels for biological sequences. In: Schölkopf, B., Tsuda, K., Vert, J.-P. (eds.) Kernel Methods in Computational Biology. MIT Press, Cambridge (2004)

  9. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)

  10. Rangwala, H., Karypis, G.: Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics 21(23), 4239–4247 (2005)

  11. Qiu, J., Hue, M., Ben-Hur, A., Vert, J.-P., Noble, W.S.: A structural alignment kernel for protein structures. Bioinformatics 23(9), 1090–1098 (2007)

  12. Sun, L., Ji, S., Ye, J.: Adaptive diffusion kernel learning from biological networks for protein function prediction. BMC Bioinformatics 9, 162 (2008)

  13. Leslie, C.S., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for discriminative protein classification. Bioinformatics 20(4), 467–476 (2004)

  14. Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz (1999)

  15. Cuturi, M., Vert, J.-P.: A mutual information kernel for sequences. In: Proc. of IEEE Int. Joint Conference on Neural Networks, vol. 3, pp. 1905–1910 (2004)

  16. Thorne, J.L., Kishino, H., Felsenstein, J.: An evolutionary model for maximum likelihood alignment of DNA sequences. Journal of Molecular Evolution 33, 114–124 (1991)

  17. Miklos, I., Lunter, G.A., Holmes, I.: A “long indel” model for evolutionary sequence alignment. Molecular Biology and Evolution 21(3), 529–540 (2004)

  18. Miklos, I., Novak, A., Satija, R., Lyngso, R., Hein, J.: Stochastic models of sequence evolution including insertion-deletion events. Statistical methods in medical research 29 (2008)

  19. Metzler, D.: Statistical alignment based on fragment insertion and deletion models. Bioinformatics 19, 490–499 (2003)

  20. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5(suppl. 3), 345–352 (1978)

  21. Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89(22), 10915–10919 (1992)

  22. Sulimova, V., Mottl, V., Kulikowski, C., Muchnik, I.: Probabilistic evolutionary model for substitution matrices of PAM and BLOSUM families. DIMACS Technical Report 2008-16, Rutgers University, 17 p. (2008), ftp://dimacs.rutgers.edu/pub/dimacs/TechicalReports/TechReports/2008/2008-16.pdf

  23. Mercer, J.: Functions of positive and negative type, and their connection with the theory of integral equations. Philos. Trans. R. Soc. London A 209, 415–446 (1909)

  24. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data. Wiley, New York (1990)

  25. Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, Boca Raton (2005)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sulimova, V., Mottl, V., Mirkin, B., Muchnik, I., Kulikowski, C. (2009). A Class of Evolution-Based Kernels for Protein Homology Analysis: A Generalization of the PAM Model. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds) Bioinformatics Research and Applications. ISBRA 2009. Lecture Notes in Computer Science, vol 5542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01551-9_28

  • DOI: https://doi.org/10.1007/978-3-642-01551-9_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01550-2

  • Online ISBN: 978-3-642-01551-9

  • eBook Packages: Computer Science, Computer Science (R0)