The Gapped Spectrum Kernel for Support Vector Machines

Onodera, Taku; Shibuya, Tetsuo

doi:10.1007/978-3-642-39712-7_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

We consider the problem of classifying string data faster and more accurately. This problem arises naturally in fields that involve the analysis of huge numbers of strings, such as computational biology. Our solution, a new string kernel we call the gapped spectrum kernel, yields a sequence of kernels that interpolates between faster but less accurate string kernels, such as the spectrum kernel, and slower but more accurate ones, such as the wildcard kernel. As a result, we obtain an algorithm to compute the wildcard kernel that is provably faster than the state-of-the-art method. The recently introduced b-suffix array data structure plays an important role here. Another result is a better trade-off between the speed and accuracy of classification, which we demonstrate in a protein classification experiment.
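
To make the two endpoints of this interpolation concrete, the following is a minimal, naive sketch of the k-spectrum kernel and the (k, m)-wildcard kernel as they are commonly defined in the string kernel literature. It is not the authors' algorithm: it uses neither the gapped spectrum kernel nor the b-suffix array, and the function names, the `*` wildcard symbol, and the `lam` weighting parameter are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def spectrum_features(s, k):
    """Count every contiguous length-k substring (k-mer) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def wildcard_features(s, k, m, lam=1.0):
    """Map s to weighted counts of k-mer patterns with at most m wildcards.

    Each k-mer contributes lam**j to every pattern obtained by replacing
    j of its positions (0 <= j <= m) with the wildcard symbol '*'.
    This enumeration is exponential in m and is meant only as a reference.
    """
    feats = Counter()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        for j in range(m + 1):
            for hidden in combinations(range(k), j):
                pattern = "".join(
                    "*" if p in hidden else c for p, c in enumerate(kmer)
                )
                feats[pattern] += lam ** j
    return feats


def inner_product(fs, ft):
    """Sparse inner product of two feature-count maps."""
    if len(fs) > len(ft):
        fs, ft = ft, fs  # iterate over the smaller map
    return sum(v * ft[key] for key, v in fs.items())


def spectrum_kernel(s, t, k):
    return inner_product(spectrum_features(s, k), spectrum_features(t, k))


def wildcard_kernel(s, t, k, m, lam=1.0):
    return inner_product(
        wildcard_features(s, k, m, lam), wildcard_features(t, k, m, lam)
    )
```

With `m = 0` the wildcard sketch reduces to the spectrum kernel, which illustrates why the spectrum kernel is the cheaper and less tolerant endpoint: it rewards only exact k-mer matches, while the wildcard kernel also rewards k-mers that agree everywhere except at up to `m` masked positions.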

Author information

Authors and Affiliations

Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan
Taku Onodera & Tetsuo Shibuya

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Onodera, T., Shibuya, T. (2013). The Gapped Spectrum Kernel for Support Vector Machines. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_1

DOI: https://doi.org/10.1007/978-3-642-39712-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39711-0
Online ISBN: 978-3-642-39712-7
eBook Packages: Computer Science, Computer Science (R0)