Classification of Biological Sequences with Kernel Methods

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4201)

Abstract

We survey the foundations of kernel methods and the recent developments of kernels for variable-length strings, in the context of biological sequence analysis.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vert, JP. (2006). Classification of Biological Sequences with Kernel Methods. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_2

  • DOI: https://doi.org/10.1007/11872436_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45264-5

  • Online ISBN: 978-3-540-45265-2

  • eBook Packages: Computer Science (R0)
