Abstract
Recently, there has been considerable work on analyzing learning algorithms with pairwise loss functions in the batch setting. By contrast, there is relatively little theoretical work on their online counterparts, despite their popularity in practice owing to their scalability to big data. In this paper, we consider online learning algorithms with pairwise loss functions based on regularization schemes in reproducing kernel Hilbert spaces. In particular, we establish the convergence of the last iterate of the online algorithm under a very weak assumption on the step sizes, and we derive satisfactory convergence rates for polynomially decaying step sizes. Our technique uses Rademacher complexities to handle function classes associated with pairwise loss functions. Since pairwise learning involves pairs of examples, which are no longer i.i.d., standard techniques do not directly apply to such pairwise learning algorithms. Hence, our results are a non-trivial extension of those in the setting of univariate loss functions to the pairwise setting.
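To make the setting concrete, the following is a minimal sketch of an online regularized pairwise learning update, assuming a linear kernel and an AUC-style pairwise hinge loss as a stand-in for the paper's general RKHS formulation. The function name, the loss, and all parameter values are hypothetical illustrations, not the paper's exact algorithm: at step t the new example is paired with all earlier examples (so the pairs are not i.i.d.), a subgradient of the averaged pairwise loss plus the regularization term is formed, and the iterate is updated with a polynomially decaying step size eta_t = eta1 * t^(-theta).

```python
import numpy as np

def online_pairwise_learning(data, labels, lam=0.01, eta1=0.5, theta=0.5):
    """Online regularized pairwise learning, linear-kernel sketch.

    data   : (n, d) array of examples x_1, ..., x_n, seen one at a time
    labels : (n,) array of labels in {-1, +1}
    lam    : regularization parameter lambda
    eta1, theta : step size eta_t = eta1 * t**(-theta), theta in (0, 1)
    """
    n, d = data.shape
    w = np.zeros(d)  # current iterate in the (linear) hypothesis space
    for t in range(1, n):
        x_t, y_t = data[t], labels[t]
        eta_t = eta1 * t ** (-theta)  # polynomially decaying step size
        grad = np.zeros(d)
        num_pairs = 0
        # Pair the new example with every previously seen example.
        for j in range(t):
            x_j, y_j = data[j], labels[j]
            if y_t == y_j:
                continue  # pairwise ranking loss only penalizes mixed pairs
            s = np.sign(y_t - y_j)
            # Pairwise hinge loss max(0, 1 - s * <w, x_t - x_j>):
            # subgradient is -s * (x_t - x_j) when the margin is violated.
            if s * np.dot(w, x_t - x_j) < 1.0:
                grad -= s * (x_t - x_j)
            num_pairs += 1
        if num_pairs > 0:
            grad /= num_pairs
        # Gradient step on averaged pairwise loss + (lam/2)*||w||^2.
        w -= eta_t * (grad + lam * w)
    return w
```

On a linearly separable toy set, the learned iterate scores positive examples above negative ones, illustrating the ranking behavior the pairwise loss encourages.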
Communicated by: Karsten Urban
Cite this article
Guo, ZC., Ying, Y. & Zhou, DX. Online regularized learning with pairwise loss functions. Adv Comput Math 43, 127–150 (2017). https://doi.org/10.1007/s10444-016-9479-7