Abstract
The high computational cost of training kernel methods to solve nonlinear tasks limits their applicability. Recently, however, several fast training methods have been introduced for linear learning tasks. These can be used to solve nonlinear tasks by mapping the input data nonlinearly into a low-dimensional feature space. In this work, we consider the mapping induced by decomposing the Nyström approximation of the kernel matrix. We collect prior results and derive new ones to show how, given an efficient linear solver, one can efficiently train, make predictions with, and cross-validate reduced set approximations of learning algorithms. Specifically, we present an efficient method for removing basis vectors from the mapping, which we show to be important when performing cross-validation.
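The mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian (RBF) kernel, and the function names (`rbf_kernel`, `nystrom_feature_map`) are hypothetical. Given the kernel evaluations between all points and a reduced set of basis vectors, decomposing the basis kernel matrix yields an explicit low-dimensional feature map whose inner products reproduce the Nyström approximation, so any fast linear solver can be applied to the mapped data.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise Gaussian kernel values between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_feature_map(X, basis, gamma=1.0, tol=1e-12):
    # K_nb: kernel evaluations between all points and the basis vectors.
    # K_bb: kernel matrix among the basis vectors themselves.
    K_nb = rbf_kernel(X, basis, gamma)
    K_bb = rbf_kernel(basis, basis, gamma)
    # Eigendecompose K_bb = V diag(s) V^T and form its inverse square root,
    # so that Phi @ Phi.T = K_nb K_bb^{-1} K_nb^T, the Nystrom approximation
    # of the full kernel matrix.
    s, V = np.linalg.eigh(K_bb)
    keep = s > tol                       # drop numerically zero directions
    inv_sqrt = V[:, keep] / np.sqrt(s[keep])
    return K_nb @ inv_sqrt               # explicit features for a linear solver

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
basis = X[:10]                           # reduced set of basis vectors
Phi = nystrom_feature_map(X, basis)
K_approx = Phi @ Phi.T                   # approximates the full kernel matrix
```

Note that the approximation is exact on the basis block: restricting `K_approx` to the basis vectors recovers their kernel matrix, which is why the choice (and removal) of basis vectors matters for cross-validation.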
Cite this article
Airola, A., Pahikkala, T. & Salakoski, T. On Learning and Cross-Validation with Decomposed Nyström Approximation of Kernel Matrix. Neural Process Lett 33, 17–30 (2011). https://doi.org/10.1007/s11063-010-9159-4