Abstract
Computing unordered tree kernels based on exhaustive counts of subtrees is known to be #P-complete. In this paper, we develop an efficient and general unordered tree kernel based on bifoliate q-grams, which are unordered trees with at most two leaves and exactly q nodes. First, we introduce the bifoliate q-gram profile of a tree as the sequence of frequencies of all bifoliate q-grams embedded in that tree. Then, we formulate the bifoliate tree kernel as the inner product of the bifoliate q-gram profiles of two trees. Next, we design an efficient algorithm for computing the bifoliate tree kernel. Finally, we apply the bifoliate tree kernel to the classification of glycan structures.
This work is partly supported by Grant-in-Aid for Scientific Research No. 17700138 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
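To make the "kernel as an inner product of q-gram profiles" idea in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical tree representation, a (label, children) pair, and it counts only the single-leaf case, that is, downward label paths with exactly q nodes. The two-leaf q-grams and the paper's efficient computation are not reproduced here. The names path_profile and profile_kernel are illustrative only.

```python
from collections import Counter

# Assumption: a tree is a (label, children) pair, where children is a list
# of trees. This representation is illustrative, not taken from the paper.


def path_profile(tree, q):
    """Count label sequences of downward paths with exactly q nodes.

    This covers only the one-leaf q-grams; two-leaf q-grams are omitted.
    """
    profile = Counter()

    def walk(node, suffix):
        label, children = node
        # Keep only the last q labels on the path from the root to this node.
        suffix = (suffix + (label,))[-q:]
        if len(suffix) == q:
            profile[suffix] += 1
        for child in children:
            walk(child, suffix)

    walk(tree, ())
    return profile


def profile_kernel(p1, p2):
    """Inner product of two sparse frequency profiles."""
    if len(p1) > len(p2):
        p1, p2 = p2, p1  # iterate over the smaller profile
    return sum(count * p2[gram] for gram, count in p1.items())


# Usage with two small labeled trees (labels are placeholders).
t1 = ("a", [("b", [("c", [])]), ("b", [("d", [])])])
t2 = ("a", [("b", [("d", [])])])
k12 = profile_kernel(path_profile(t1, 2), path_profile(t2, 2))
```

Because each path is determined by ancestry alone, the profile does not depend on the order of siblings, which is the property an unordered tree kernel needs. Storing the profile as a sparse map avoids enumerating every possible q-gram over the label alphabet.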
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F. (2008). An Efficient Unordered Tree Kernel and Its Application to Glycan Classification. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_18
DOI: https://doi.org/10.1007/978-3-540-68125-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)