Abstract
In this paper, we present a new family of gapped string kernels, called length-weighted kernels, which comprises the p-length-weighted and all-length-weighted kernels. We also propose a dynamic programming algorithm, based on the suffix kernel, to compute them. Given strings s and t and a gap penalty λ, the all-length-weighted kernel can be computed in O(|s||t|) time with our algorithm. Using the relationship between the all-length and p-length kernels, the p-length-weighted kernel can be computed in O(p|s||t|) time. Furthermore, a bit-parallel technique reduces the complexity from O(p|s||t|) to O(⌈pk/w⌉|s||t|), where w is the machine word size (e.g. 32 or 64 in practice) and k is determined by the longest matching subsequence of s and t. Empirical results suggest that combining the bit-parallel technique with dynamic programming and the suffix kernel outperforms the other approaches in cases where the necessary condition for applying the bit-parallel technique is satisfied.
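To make the O(p|s||t|) dynamic-programming idea concrete, the following is a minimal sketch of the standard gap-weighted subsequences kernel computed through a suffix-style recurrence. It is an illustration only: the abstract does not define the length-weighted kernels, so this is not the authors' kernel, and it does not include the bit-parallel step. The function name `gap_weighted_kernel` and the parameter name `lam` (standing for λ) are assumptions made for this example.

```python
def gap_weighted_kernel(s, t, p, lam):
    """Standard gap-weighted subsequences kernel of length p (illustrative sketch).

    Each common subsequence u of length p contributes lam ** (span of u in s
    + span of u in t). Runs in O(p * len(s) * len(t)) time.
    """
    n, m = len(s), len(t)

    # suffix[i][j]: total weight of common subsequences of the current length
    # that end exactly at s[i-1] and t[j-1] (the "suffix" version of the kernel).
    suffix = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                suffix[i][j] = lam * lam  # length-1 match: span 1 in each string

    for _ in range(p - 1):
        # acc[i][j]: gap-discounted sum of suffix[k][l] over all k <= i, l <= j,
        # built with a 2D inclusion-exclusion recurrence.
        acc = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i][j] = (suffix[i][j]
                             + lam * acc[i - 1][j]
                             + lam * acc[i][j - 1]
                             - lam * lam * acc[i - 1][j - 1])

        # Extend every shorter subsequence by one matching symbol pair.
        longer = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if s[i - 1] == t[j - 1]:
                    longer[i][j] = lam * lam * acc[i - 1][j - 1]
        suffix = longer

    # The kernel value is the sum of the suffix kernel over all end positions.
    return sum(sum(row) for row in suffix)


if __name__ == "__main__":
    print(gap_weighted_kernel("cat", "car", 2, 0.5))
```

For "cat" and "car" with p = 2, the only common subsequence of length 2 is "ca", which spans two symbols in each string, so the value is λ⁴. Each of the p − 1 passes over the |s| × |t| table accounts for the factor p in the running time.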
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, C., Tian, S., Mu, S. (2006). A Fast Bit-Parallel Algorithm for Gapped String Kernels. In: King, I., Wang, J., Chan, LW., Wang, D. (eds) Neural Information Processing. ICONIP 2006. Lecture Notes in Computer Science, vol 4232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893028_71
DOI: https://doi.org/10.1007/11893028_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46479-2
Online ISBN: 978-3-540-46480-8
eBook Packages: Computer Science (R0)