Abstract
We survey the foundations of kernel methods and recent developments in kernels for variable-length strings, in the context of biological sequence analysis.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vert, JP. (2006). Classification of Biological Sequences with Kernel Methods. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_2
DOI: https://doi.org/10.1007/11872436_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science (R0)