Abstract
Feature engineering is one of the most complex aspects of system design in machine learning. Fortunately, kernel methods provide the designer with formidable tools to tackle such complexity. Among others, tree kernels (TKs) have been successfully applied to represent structured data in diverse domains, ranging from bioinformatics and data mining to natural language processing. One drawback of such methods is that learning with them typically requires a large number of kernel computations between training examples (quadratic in the number of examples). In practice, however, substructures often repeat in the data, which makes it possible to avoid many redundant kernel evaluations. In this paper, we propose the use of Directed Acyclic Graphs (DAGs) to compactly represent trees in the training algorithm of Support Vector Machines. In particular, we use DAGs at each iteration of the cutting plane algorithm (CPA) to encode the model, which is composed of a set of trees. This enables DAG kernels to efficiently evaluate TKs between the current model and a given training tree. Consequently, the total amount of computation is reduced by avoiding redundant evaluations over shared substructures. We provide theory and algorithms that formally characterize this idea, and we test them on several datasets. The empirical results confirm the benefits of the approach, showing significant speedups over previous state-of-the-art methods. In addition, we propose an alternative sampling strategy within the CPA to address the class-imbalance problem; coupled with fast learning methods, it provides a viable TK learning framework for a large class of real-world applications.
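The idea of storing repeated substructures once can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple tree encoding, the function names `add_tree` and `build_dag`, and the per-tree `weight` argument are all assumptions made for illustration. Each distinct subtree becomes one DAG node, and a counter accumulates the weighted frequency of that subtree across all inserted trees.

```python
from collections import Counter

def add_tree(tree, dag, counts, weight=1.0):
    """Insert a tree given as (label, child, child, ...) and return its node id.

    Hypothetical helper: identical subtrees map to the same DAG node, and
    counts[node_id] accumulates the weight of every occurrence.
    """
    label, *children = tree
    # Children are inserted first, so a node is identified by its label
    # plus the ids of its (already shared) child nodes.
    child_ids = tuple(add_tree(c, dag, counts, weight) for c in children)
    key = (label, child_ids)
    node_id = dag.setdefault(key, len(dag))  # reuse the node if it exists
    counts[node_id] += weight
    return node_id

def build_dag(weighted_trees):
    """Merge (tree, weight) pairs into one DAG with weighted node frequencies."""
    dag, counts = {}, Counter()
    for tree, weight in weighted_trees:
        add_tree(tree, dag, counts, weight)
    return dag, counts

def tree_size(tree):
    """Number of nodes in a nested-tuple tree."""
    return 1 + sum(tree_size(c) for c in tree[1:])

# Two toy parse trees that share the subtree NP -> (D, N).
np_subtree = ("NP", ("D",), ("N",))
t1 = ("S", np_subtree, ("VP", ("V",), np_subtree))
t2 = ("S", np_subtree, ("VP", ("V",)))

dag, counts = build_dag([(t1, 1.0), (t2, 1.0)])
total_nodes = tree_size(t1) + tree_size(t2)
np_id = dag[("NP", (dag[("D", ())], dag[("N", ())]))]
```

In this toy case the two trees contain more nodes in total than the DAG has distinct nodes, and the shared `NP` subtree is stored once with a frequency that reflects all of its occurrences. A kernel evaluated against such a DAG would visit each shared node once and scale its contribution by the stored frequency, which is the source of the savings described above.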
Additional information
Responsible editor: Dimitrios Gunopulos, Donato Malerba, Michalis Vazirgiannis.
Cite this article
Severyn, A., Moschitti, A. Fast support vector machines for convolution tree kernels. Data Min Knowl Disc 25, 325–357 (2012). https://doi.org/10.1007/s10618-012-0276-8
DOI: https://doi.org/10.1007/s10618-012-0276-8