
Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods

Conference paper

Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series: European Conference on Information Retrieval (ECIR)

Abstract

Kernel methods have been used in many Machine Learning paradigms, achieving state-of-the-art performance in many Language Learning tasks. One drawback of expressive kernel functions, such as Sequence or Tree kernels, is the time and space complexity required both in learning and classification. In this paper, the Nyström methodology is studied as a viable solution to these scalability issues. By mapping data into low-dimensional spaces that approximate the kernel space, the proposed methodology improves scalability through compact linear representations of highly structured data. Computation can also be distributed over several machines by adopting the so-called Ensemble Nyström method. Experimental results show that an accuracy comparable with state-of-the-art kernel-based methods can be obtained while reducing the required operations by orders of magnitude, enabling the adoption of datasets containing more than one million examples.
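At its core, the Nyström method approximates the full Gram matrix \(K\) as \(K \approx C W^{+} C^{\top}\), where \(C\) holds kernel evaluations between all \(n\) examples and \(m\) sampled landmarks and \(W\) is the landmark Gram matrix; the Ensemble Nyström method [17] combines several such approximations built from independent landmark samples. The NumPy sketch below only illustrates this construction under those assumptions (the paper's own implementation is the Java-based KeLP platform, see note 3); the function names are ours, not the authors'.

```python
import numpy as np

def nystrom_features(K_nm, K_mm, eps=1e-12):
    """Project n examples into a low-dimensional space whose inner
    products approximate the original kernel: F @ F.T ~= K.

    K_nm: (n, m) kernel values between all examples and m landmarks.
    K_mm: (m, m) kernel values among the landmarks.
    """
    # Eigendecompose the small (m x m) landmark Gram matrix.
    lam, U = np.linalg.eigh(K_mm)
    keep = lam > eps  # drop near-null directions for numerical stability
    # F = K_nm U diag(lambda^{-1/2}) realizes C W^{+1/2}.
    return K_nm @ U[:, keep] / np.sqrt(lam[keep])

def ensemble_nystrom_features(K_nm_list, K_mm_list, weights=None):
    """Ensemble Nystrom [17]: combine p approximations from independent
    landmark samples. Concatenating sqrt(mu_i)-scaled feature maps gives
    a Gram matrix equal to the weighted sum of the p approximations."""
    p = len(K_nm_list)
    weights = weights if weights is not None else [1.0 / p] * p  # uniform ensemble
    blocks = [np.sqrt(mu) * nystrom_features(K_nm, K_mm)
              for mu, K_nm, K_mm in zip(weights, K_nm_list, K_mm_list)]
    return np.hstack(blocks)
```

A fast linear learner (e.g., the PA-II update sketched after the notes below) can then be trained on these compact representations, which is where the orders-of-magnitude saving over direct kernel computation originates.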


Notes

  1. We are referring to the PA-II version in [5].

  2. In this work we will consider the hinge loss \(H(\varvec{w}; (\varvec{\tilde{x}}_t, y_t)) = \max(0, 1 - y_t \varvec{w}^\top \varvec{\tilde{x}}_t)\); a minimal sketch of the corresponding update follows this list.

  3. http://sag.art.uniroma2.it/demo-software/kelp/.

  4. http://cogcomp.cs.illinois.edu/Data/QA/QC/.

  5. C-SVM [3] adopts a caching policy, ignored here for comparative purposes; in large-scale applications the cache may impose prohibitive space requirements.

  6. Only sentences whose lexical unit corresponds to a verb are adopted in our tests.
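As a concrete companion to notes 1 and 2, the sketch below shows a single PA-II update from [5] driven by that hinge loss. It is a minimal NumPy illustration, not the paper's KeLP implementation; the function name `pa2_update` is our own.

```python
import numpy as np

def pa2_update(w, x, y, C=1.0):
    """One PA-II step [5]: suffer the hinge loss H(w; (x, y)) from
    note 2, then move w just enough to reduce it, with the update
    magnitude softened by the aggressiveness parameter C."""
    loss = max(0.0, 1.0 - y * (w @ x))          # hinge loss H(w; (x, y))
    if loss > 0.0:
        tau = loss / (x @ x + 1.0 / (2.0 * C))  # PA-II closed-form step size
        w = w + tau * y * x
    return w

# Example: a mistake on a positive example pulls w toward x.
w = np.zeros(3)
w = pa2_update(w, np.array([1.0, -0.5, 2.0]), y=+1)
```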

References

  1. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of COLING-ACL, Montreal, Canada (1998)

  2. Cancedda, N., Gaussier, É., Goutte, C., Renders, J.M.: Word-sequence kernels. J. Mach. Learn. Res. 3, 1059–1082 (2003)

  3. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)

  4. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of Neural Information Processing Systems (NIPS 2001), pp. 625–632 (2001)

  5. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)

  6. Croce, D., Moschitti, A., Basili, R.: Structured lexical similarity via convolution kernels on dependency trees. In: Proceedings of EMNLP (2011)

  7. Culotta, A., Sorensen, J.: Dependency tree kernels for relation extraction. In: Proceedings of ACL 2004, Stroudsburg, PA, USA (2004)

  8. Dredze, M., Crammer, K., Pereira, F.: Confidence-weighted linear classification. In: Proceedings of ICML 2008. ACM, New York (2008)

  9. Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)

  10. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  11. Filice, S., Castellucci, G., Croce, D., Basili, R.: Effective kernelized online learning in language processing tasks. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 347–358. Springer, Heidelberg (2014)

  12. Filice, S., Castellucci, G., Croce, D., Basili, R.: KeLP: a kernel-based learning platform for natural language processing. In: Proceedings of ACL: System Demonstrations, Beijing, China (2015)

  13. Filice, S., Croce, D., Basili, R.: A stratified strategy for efficient kernel-based learning. In: AAAI Conference on Artificial Intelligence (2015)

  14. Filice, S., Croce, D., Basili, R., Zanzotto, F.M.: Linear online learning over structured data with distributed tree kernels. In: Proceedings of ICMLA 2013 (2013)

  15. Hsieh, C.J., Chang, K.W., Lin, C.J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of ICML 2008, pp. 408–415. ACM, New York (2008)

  16. Joachims, T., Finley, T., Yu, C.N.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1), 27–59 (2009)

  17. Kumar, S., Mohri, M., Talwalkar, A.: Ensemble Nyström method. In: NIPS, pp. 1060–1068. Curran Associates, Inc. (2009)

  18. Kumar, S., Mohri, M., Talwalkar, A.: Sampling methods for the Nyström method. J. Mach. Learn. Res. 13, 981–1006 (2012)

  19. Li, X., Roth, D.: Learning question classifiers: the role of semantic information. Nat. Lang. Eng. 12(3), 229–249 (2006)

  20. Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Comput. Linguist. 34, 193–224 (2008)

  21. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)

  22. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of ICML. ACM, New York (2007)

  23. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  24. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)

  25. Vapnik, V.N.: Statistical Learning Theory. Wiley-Interscience, New York (1998)

  26. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)

  27. Wang, J., Zhao, P., Hoi, S.C.: Exact soft confidence-weighted learning. In: Proceedings of ICML 2012. ACM, New York (2012)

  28. Wang, Z., Vucetic, S.: Online passive-aggressive algorithms on a budget. J. Mach. Learn. Res. Proc. Track 9, 908–915 (2010)

  29. Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Proceedings of NIPS 2000 (2001)

  30. Zanzotto, F.M., Dell'Arciprete, L.: Distributed tree kernels. In: Proceedings of ICML 2012 (2012)

  31. Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of SIGIR 2003, pp. 26–32. ACM, New York (2003)


Author information

Correspondence to Danilo Croce.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Croce, D., Basili, R. (2016). Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods. In: Ferro, N., et al. (eds.) Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol. 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_8


  • DOI: https://doi.org/10.1007/978-3-319-30671-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science (R0)
