This paper presents a detailed performance analysis of the kernel-based regularized pairwise learning model associated with a strongly convex loss. The robustness of the model is analyzed by applying an improved convex analysis method. The results show that the regularized pairwise learning model is qualitatively more robust with respect to the probability measure. Some new comparison inequalities are provided, from which the convergence rates are derived. In particular, an explicit learning rate is obtained in the case where the loss is the least squares loss.
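To make the object of study concrete, the following is a minimal sketch of a kernel-based regularized pairwise learning scheme for the least squares case discussed in the abstract. It assumes a Gaussian kernel, writes the estimator in representer-theorem form f = K·alpha, and minimizes the empirical pairwise least squares risk plus a norm penalty by gradient descent; all function names and parameter values here are illustrative, not the paper's own construction.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def pairwise_ls_risk(g, y):
    """Empirical pairwise least squares risk:
    (1/n^2) * sum_{i,j} ((g_i - g_j) - (y_i - y_j))^2."""
    r = g - y
    return ((r[:, None] - r[None, :]) ** 2).mean()

def fit_pairwise_krr(X, y, lam=0.05, sigma=1.0, lr=0.05, steps=1000):
    """Gradient descent on the regularized pairwise objective
    pairwise_ls_risk(K @ alpha, y) + lam * alpha' K alpha."""
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(n)
    for _ in range(steps):
        r = K @ alpha - y
        # Gradient of the pairwise risk w.r.t. the predictions g = K @ alpha
        # works out to (4/n) * (r - mean(r)); note the risk only sees
        # differences g_i - g_j, so the mean is projected out.
        grad_g = (4.0 / n) * (r - r.mean())
        grad = K @ grad_g + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad
    return K, alpha

# Toy data: noisy samples of a smooth target on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.standard_normal(20)

K, alpha = fit_pairwise_krr(X, y)
risk_init = pairwise_ls_risk(np.zeros(20), y)
risk_fit = pairwise_ls_risk(K @ alpha, y)
print(risk_init, risk_fit)
```

Because the pairwise loss depends only on differences of predictions, the minimizer is determined up to an additive constant; the regularization term is what pins down a unique solution in the RKHS.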