
Fast support vector machines for convolution tree kernels

Data Mining and Knowledge Discovery

Abstract

Feature engineering is one of the most complex aspects of system design in machine learning. Fortunately, kernel methods provide the designer with formidable tools to tackle such complexity. Among these, tree kernels (TKs) have been successfully applied to represent structured data in diverse domains, ranging from bioinformatics and data mining to natural language processing. One drawback of such methods is that learning with them typically requires a number of kernel computations between training examples that is quadratic in the size of the training set. In practice, however, substructures often repeat in the data, which makes it possible to avoid a large number of redundant kernel evaluations. In this paper, we propose the use of directed acyclic graphs (DAGs) to compactly represent trees in the training algorithm of support vector machines. In particular, at each iteration of the cutting plane algorithm (CPA) we use DAGs to encode the model, which is composed of a set of trees. This enables DAG kernels to efficiently evaluate TKs between the current model and a given training tree. Consequently, the total amount of computation is reduced by avoiding redundant evaluations over shared substructures. We provide theory and algorithms that formally characterize this idea, and we test it on several datasets. The empirical results confirm the benefits of the approach, showing significant speedups over previous state-of-the-art methods. In addition, we propose an alternative sampling strategy within the CPA to address the class-imbalance problem; coupled with fast learning methods, it provides a viable TK learning framework for a large class of real-world applications.
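The idea of sharing substructures can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the simplest tree kernel, which counts pairs of nodes that root identical complete subtrees, and it stands in for the DAG with a dictionary that stores each distinct subtree once together with its accumulated weight. The tuple encoding of trees and the names `subtrees`, `st_kernel`, `build_dag` and `dag_kernel` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf is the one-element tuple (label,).

def subtrees(tree):
    """Yield the complete subtree rooted at every node of `tree`."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def st_kernel(t1, t2):
    """Naive subtree kernel: count node pairs that root identical subtrees."""
    return sum(1 for a in subtrees(t1) for b in subtrees(t2) if a == b)

def build_dag(weighted_trees):
    """Merge a weighted forest so that each distinct subtree is stored once.

    The dictionary keys play the role of DAG vertices; the values are the
    summed weights of all occurrences of that subtree in the forest.
    """
    dag = defaultdict(float)
    for weight, tree in weighted_trees:
        for s in subtrees(tree):
            dag[s] += weight
    return dag

def dag_kernel(dag, tree):
    """Evaluate the kernel between the whole merged model and one tree."""
    return sum(dag.get(s, 0.0) for s in subtrees(tree))

# Toy model: the first tree appears twice, so its subtrees are shared.
t1 = ("S", ("NP", ("D",), ("N",)), ("VP", ("V",), ("NP", ("D",), ("N",))))
t2 = ("S", ("NP", ("N",)), ("VP", ("V",)))
model = [(0.5, t1), (-0.25, t2), (1.0, t1)]
query = ("VP", ("V",), ("NP", ("D",), ("N",)))

naive = sum(w * st_kernel(t, query) for w, t in model)  # one kernel per model tree
compact = dag_kernel(build_dag(model), query)           # one pass over the query tree
```

Under these assumptions `naive` and `compact` agree, but the compact version visits each node of the query tree once, regardless of how many trees in the model share the same substructure.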



Author information

Corresponding author

Correspondence to Aliaksei Severyn.

Additional information

Responsible editor: Dimitrios Gunopulos, Donato Malerba, Michalis Vazirgiannis.

About this article

Cite this article

Severyn, A., Moschitti, A. Fast support vector machines for convolution tree kernels. Data Min Knowl Disc 25, 325–357 (2012). https://doi.org/10.1007/s10618-012-0276-8

  • DOI: https://doi.org/10.1007/s10618-012-0276-8
