Abstract
The high computational cost of training kernel methods to solve nonlinear tasks limits their applicability. Recently, however, several fast training methods have been introduced for linear learning tasks. These can be used to solve nonlinear tasks by mapping the input data nonlinearly into a low-dimensional feature space. In this work, we consider the mapping induced by decomposing the Nyström approximation of the kernel matrix. We collect prior results and derive new ones to show how, given an efficient linear solver, one can efficiently train, make predictions with, and cross-validate reduced set approximations of learning algorithms. Specifically, we present an efficient method for removing basis vectors from the mapping, which we show to be important when performing cross-validation.
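The mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian (RBF) kernel, and the function names (`rbf_kernel`, `nystrom_feature_map`) are hypothetical. Given the kernel evaluations between all points and a reduced set of basis vectors, decomposing the basis kernel matrix yields an explicit low-dimensional feature map whose inner products reproduce the Nyström approximation, so any fast linear solver can be applied to the mapped data.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise Gaussian kernel values between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_feature_map(X, basis, gamma=1.0, tol=1e-12):
    # K_nb: kernel evaluations between all points and the basis vectors.
    # K_bb: kernel matrix among the basis vectors themselves.
    K_nb = rbf_kernel(X, basis, gamma)
    K_bb = rbf_kernel(basis, basis, gamma)
    # Eigendecompose K_bb = V diag(s) V^T and form its inverse square root,
    # so that Phi @ Phi.T = K_nb K_bb^{-1} K_nb^T, the Nystrom approximation
    # of the full kernel matrix.
    s, V = np.linalg.eigh(K_bb)
    keep = s > tol                       # drop numerically zero directions
    inv_sqrt = V[:, keep] / np.sqrt(s[keep])
    return K_nb @ inv_sqrt               # explicit features for a linear solver

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
basis = X[:10]                           # reduced set of basis vectors
Phi = nystrom_feature_map(X, basis)
K_approx = Phi @ Phi.T                   # approximates the full kernel matrix
```

Note that the approximation is exact on the basis block: restricting `K_approx` to the basis vectors recovers their kernel matrix, which is why the choice (and removal) of basis vectors matters for cross-validation.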
Cite this article
Airola, A., Pahikkala, T. & Salakoski, T. On Learning and Cross-Validation with Decomposed Nyström Approximation of Kernel Matrix. Neural Process Lett 33, 17–30 (2011). https://doi.org/10.1007/s11063-010-9159-4