Fast support vector machines for convolution tree kernels

Severyn, Aliaksei; Moschitti, Alessandro

doi:10.1007/s10618-012-0276-8

Published: 21 June 2012

Volume 25, pages 325–357, (2012)

Data Mining and Knowledge Discovery

Aliaksei Severyn¹ &
Alessandro Moschitti¹

Abstract

Feature engineering is one of the most complex aspects of system design in machine learning. Fortunately, kernel methods provide the designer with formidable tools to tackle such complexity. Among others, tree kernels (TKs) have been successfully applied for representing structured data in diverse domains, ranging from bioinformatics and data mining to natural language processing. One drawback of such methods is that learning with them typically requires a large number of kernel computations between training examples (quadratic in the number of training examples). However, in practice substructures often repeat in the data, which makes it possible to avoid a large number of redundant kernel evaluations. In this paper, we propose the use of Directed Acyclic Graphs (DAGs) to compactly represent trees in the training algorithm of Support Vector Machines. In particular, we use DAGs for each iteration of the cutting plane algorithm (CPA) to encode the model composed of a set of trees. This enables DAG kernels to efficiently evaluate TKs between the current model and a given training tree. Consequently, the total amount of computation is reduced by avoiding redundant evaluations over shared substructures. We provide theory and algorithms to formally characterize the above idea, which we tested on several datasets. The empirical results confirm the benefits of the approach in terms of significant speedups over previous state-of-the-art methods. In addition, we propose an alternative sampling strategy within the CPA to address the class-imbalance problem, which, coupled with fast learning methods, provides a viable TK learning framework for a large class of real-world applications.
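The idea of sharing repeated substructures can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Collins and Duffy style subset-tree recursion as the tree kernel, represents trees as nested tuples, and stands in for the DAG with a dictionary that maps each distinct subtree to its accumulated weight. All names (`LAMBDA`, `delta`, `build_dag`, `dag_kernel`) and the example weights are hypothetical.

```python
from collections import defaultdict

LAMBDA = 0.4  # decay factor; the value is an assumption for illustration

def nodes(tree):
    """Yield every internal node (a subtree with at least one child)."""
    if len(tree) > 1:
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """A node's label together with the labels of its children."""
    return (node[0], tuple(child[0] for child in node[1:]))

def delta(n1, n2):
    """Weighted count of common fragments rooted at n1 and n2."""
    if len(n1) == 1 or len(n2) == 1 or production(n1) != production(n2):
        return 0.0
    result = LAMBDA
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1.0 + delta(c1, c2)
    return result

def tree_kernel(t1, t2):
    """Plain tree kernel: sum delta over all pairs of internal nodes."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

def build_dag(weighted_trees):
    """Merge a weighted forest: identical subtrees share a single entry."""
    dag = defaultdict(float)
    for alpha, tree in weighted_trees:
        for node in nodes(tree):
            dag[node] += alpha
    return dag

def dag_kernel(dag, tree):
    """Kernel between the merged model and one tree.

    Each distinct model subtree is compared once, scaled by its weight.
    """
    tree_nodes = list(nodes(tree))
    return sum(w * delta(n, m) for n, w in dag.items() for m in tree_nodes)

# Two model trees that share the D, VP and V subtrees.
t1 = ("S", ("NP", ("D", ("the",)), ("N", ("dog",))), ("VP", ("V", ("barks",))))
t2 = ("S", ("NP", ("D", ("the",)), ("N", ("cat",))), ("VP", ("V", ("barks",))))
model = [(0.5, t1), (-0.25, t2)]
query = ("S", ("NP", ("D", ("the",)), ("N", ("dog",))), ("VP", ("V", ("sleeps",))))

dag = build_dag(model)
naive = sum(alpha * tree_kernel(t, query) for alpha, t in model)
shared = dag_kernel(dag, query)
```

Under these assumptions `shared` agrees with `naive` up to floating-point rounding, while the merged model holds fewer entries than the two trees have internal nodes in total, so fewer `delta` evaluations are needed per training tree.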

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Trento, Via Sommarive 5, 38123, Povo, TN, Italy
Aliaksei Severyn & Alessandro Moschitti

Corresponding author

Correspondence to Aliaksei Severyn.

Additional information

Responsible editor: Dimitrios Gunopulos, Donato Malerba, Michalis Vazirgiannis.

About this article

Cite this article

Severyn, A., Moschitti, A. Fast support vector machines for convolution tree kernels. Data Min Knowl Disc 25, 325–357 (2012). https://doi.org/10.1007/s10618-012-0276-8

Received: 02 November 2011
Accepted: 05 June 2012
Published: 21 June 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s10618-012-0276-8
