
Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods

Conference paper

Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series: European Conference on Information Retrieval (ECIR)

Abstract

Kernel methods have been used in many Machine Learning paradigms, achieving state-of-the-art performance in many Language Learning tasks. One drawback of expressive kernel functions, such as Sequence or Tree kernels, is the time and space complexity required both in learning and classification. In this paper, the Nyström methodology is studied as a viable solution to these scalability issues. By mapping data into low-dimensional spaces that approximate the kernel space, the proposed methodology improves scalability through compact linear representations of highly structured data. Computation can also be distributed over several machines by adopting the so-called Ensemble Nyström method. Experimental results show that an accuracy comparable with state-of-the-art kernel-based methods can be obtained while reducing the required operations by orders of magnitude, enabling the adoption of datasets containing more than one million examples.
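At its core, the Nyström method approximates the full Gram matrix \(K\) as \(K \approx C W^{+} C^{\top}\), where \(C\) holds kernel evaluations between all \(n\) examples and \(m\) sampled landmarks and \(W\) is the landmark Gram matrix; the Ensemble Nyström method [17] combines several such approximations built from independent landmark samples. The NumPy sketch below only illustrates this construction under those assumptions (the paper's own implementation is the Java-based KeLP platform, see note 3); the function names are ours, not the authors'.

```python
import numpy as np

def nystrom_features(K_nm, K_mm, eps=1e-12):
    """Project n examples into a low-dimensional space whose inner
    products approximate the original kernel: F @ F.T ~= K.

    K_nm: (n, m) kernel values between all examples and m landmarks.
    K_mm: (m, m) kernel values among the landmarks.
    """
    # Eigendecompose the small (m x m) landmark Gram matrix.
    lam, U = np.linalg.eigh(K_mm)
    keep = lam > eps  # drop near-null directions for numerical stability
    # F = K_nm U diag(lambda^{-1/2}) realizes C W^{+1/2}.
    return K_nm @ U[:, keep] / np.sqrt(lam[keep])

def ensemble_nystrom_features(K_nm_list, K_mm_list, weights=None):
    """Ensemble Nystrom [17]: combine p approximations from independent
    landmark samples. Concatenating sqrt(mu_i)-scaled feature maps gives
    a Gram matrix equal to the weighted sum of the p approximations."""
    p = len(K_nm_list)
    weights = weights if weights is not None else [1.0 / p] * p  # uniform ensemble
    blocks = [np.sqrt(mu) * nystrom_features(K_nm, K_mm)
              for mu, K_nm, K_mm in zip(weights, K_nm_list, K_mm_list)]
    return np.hstack(blocks)
```

A fast linear learner (e.g., the PA-II update sketched after the notes below) can then be trained on these compact representations, which is where the orders-of-magnitude saving over direct kernel computation originates.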


Notes

  1. We are referring to the PA-II version in [5].

  2. In this work we will consider the hinge loss \(H(\varvec{w}; (\varvec{\tilde{x}}_t, y_t)) = \max(0, 1 - y_t \varvec{w}^\top \varvec{\tilde{x}}_t)\); a minimal sketch of the corresponding update follows this list.

  3. http://sag.art.uniroma2.it/demo-software/kelp/.

  4. http://cogcomp.cs.illinois.edu/Data/QA/QC/.

  5. C-SVM [3] adopts a caching policy, ignored here for comparative purposes; in large-scale applications the cache may impose prohibitive space requirements.

  6. Only sentences whose lexical unit corresponds to a verb are adopted in our tests.
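As a concrete companion to notes 1 and 2, the sketch below shows a single PA-II update from [5] driven by that hinge loss. It is a minimal NumPy illustration, not the paper's KeLP implementation; the function name `pa2_update` is our own.

```python
import numpy as np

def pa2_update(w, x, y, C=1.0):
    """One PA-II step [5]: suffer the hinge loss H(w; (x, y)) from
    note 2, then move w just enough to reduce it, with the update
    magnitude softened by the aggressiveness parameter C."""
    loss = max(0.0, 1.0 - y * (w @ x))          # hinge loss H(w; (x, y))
    if loss > 0.0:
        tau = loss / (x @ x + 1.0 / (2.0 * C))  # PA-II closed-form step size
        w = w + tau * y * x
    return w

# Example: a mistake on a positive example pulls w toward x.
w = np.zeros(3)
w = pa2_update(w, np.array([1.0, -0.5, 2.0]), y=+1)
```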

References

  1. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of COLING-ACL, Montreal, Canada (1998)

  2. Cancedda, N., Gaussier, É., Goutte, C., Renders, J.M.: Word-sequence kernels. J. Mach. Learn. Res. 3, 1059–1082 (2003)

  3. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)

  4. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of Neural Information Processing Systems (NIPS 2001), pp. 625–632 (2001)

  5. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)

  6. Croce, D., Moschitti, A., Basili, R.: Structured lexical similarity via convolution kernels on dependency trees. In: Proceedings of EMNLP (2011)

  7. Culotta, A., Sorensen, J.: Dependency tree kernels for relation extraction. In: Proceedings of ACL 2004, Stroudsburg, PA, USA (2004)

  8. Dredze, M., Crammer, K., Pereira, F.: Confidence-weighted linear classification. In: Proceedings of ICML 2008. ACM, New York (2008)

  9. Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)

  10. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  11. Filice, S., Castellucci, G., Croce, D., Basili, R.: Effective kernelized online learning in language processing tasks. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 347–358. Springer, Heidelberg (2014)

  12. Filice, S., Castellucci, G., Croce, D., Basili, R.: KeLP: a kernel-based learning platform for natural language processing. In: Proceedings of ACL: System Demonstrations, Beijing, China (2015)

  13. Filice, S., Croce, D., Basili, R.: A stratified strategy for efficient kernel-based learning. In: AAAI Conference on Artificial Intelligence (2015)

  14. Filice, S., Croce, D., Basili, R., Zanzotto, F.M.: Linear online learning over structured data with distributed tree kernels. In: Proceedings of ICMLA 2013 (2013)

  15. Hsieh, C.J., Chang, K.W., Lin, C.J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of ICML 2008, pp. 408–415. ACM, New York (2008)

  16. Joachims, T., Finley, T., Yu, C.N.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1), 27–59 (2009)

  17. Kumar, S., Mohri, M., Talwalkar, A.: Ensemble Nyström method. In: NIPS, pp. 1060–1068. Curran Associates, Inc. (2009)

  18. Kumar, S., Mohri, M., Talwalkar, A.: Sampling methods for the Nyström method. J. Mach. Learn. Res. 13, 981–1006 (2012)

  19. Li, X., Roth, D.: Learning question classifiers: the role of semantic information. Nat. Lang. Eng. 12(3), 229–249 (2006)

  20. Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Comput. Linguist. 34, 193–224 (2008)

  21. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)

  22. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of ICML. ACM, New York (2007)

  23. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  24. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)

  25. Vapnik, V.N.: Statistical Learning Theory. Wiley-Interscience, New York (1998)

  26. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)

  27. Wang, J., Zhao, P., Hoi, S.C.: Exact soft confidence-weighted learning. In: Proceedings of ICML 2012. ACM, New York (2012)

  28. Wang, Z., Vucetic, S.: Online passive-aggressive algorithms on a budget. J. Mach. Learn. Res. Proc. Track 9, 908–915 (2010)

  29. Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Proceedings of NIPS 2000 (2001)

  30. Zanzotto, F.M., Dell'Arciprete, L.: Distributed tree kernels. In: Proceedings of ICML 2012 (2012)

  31. Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of SIGIR 2003, pp. 26–32. ACM, New York (2003)


Author information

Correspondence to Danilo Croce.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Croce, D., Basili, R. (2016). Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods. In: Ferro, N., et al. (eds.) Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol. 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_8


  • DOI: https://doi.org/10.1007/978-3-319-30671-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science (R0)
