Error analysis of distributed least squares ranking
Introduction
The divide-and-conquer strategy has been used successfully for kernel methods with large-scale data [1], [2], [3]. Compared with other techniques for large-scale data, e.g., Markov sampling [4], [5], [6] and path algorithms [7], [8], the divide-and-conquer strategy processes all the data in small batches simultaneously. In general, kernel methods under the divide-and-conquer strategy involve three key steps: dividing the whole dataset into manageable subsets, training an individual learning machine on each subset, and obtaining the final predictor by combining these individual machines [2], [3], [9]. In this way, distributed learning provides an effective strategy to conquer big-data challenges [1], [2], [10] and to realize privacy-preserving data mining [3].
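As an illustration of the three steps, here is a minimal divide-and-conquer sketch for kernel ridge regression in the spirit of [2]; the Gaussian kernel, the toy sine data, and all function names are our own illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Pairwise Gaussian kernel matrix between the rows of A and B
    # (an illustrative kernel choice, not prescribed by the paper).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam=1e-3):
    # Step 2: train one regularized least-squares machine on a subset.
    n = len(X)
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return X, alpha

def dc_krr(X, y, m=4, lam=1e-3):
    # Step 1: divide the data into m subsets.
    # Step 2: fit an individual machine on each subset.
    # Step 3: combine the machines by averaging their predictions.
    machines = [krr_fit(Xj, yj, lam)
                for Xj, yj in zip(np.array_split(X, m), np.array_split(y, m))]
    def predict(Xt):
        return np.mean([gaussian_kernel(Xt, Xj) @ aj for Xj, aj in machines],
                       axis=0)
    return predict

# Toy data: noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.normal(size=200)
f = dc_krr(X, y, m=4)
```

Each subset only solves a linear system of its own size, which is the source of the computational savings the strategy is known for.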
Along with its practical success, the theoretical foundations of distributed kernel methods have been investigated recently. For distributed regularized least squares, optimal learning rates in expectation are provided in [2], [9]. For distributed spectral algorithms, the convergence rate is well understood [11]. In [12], the error analysis for distributed multi-penalty regularization is derived via a Neumann expansion and a second-order decomposition of the difference of operator inverses. In [13], the generalization ability of a distributed gradient descent algorithm is characterized. A novel analysis framework is developed for distributed semi-supervised regression [3], which shows that unlabeled data play an important role in reducing the distributed error and relaxing the restriction on the number of data subsets. In [14], bias correction is applied to further improve the learning performance of regularized kernel networks, and optimal learning rates are attained in the distributed regression setting.
Despite rapid progress on distributed learning theory, all the above results are restricted to pointwise kernel methods (i.e., kernel methods with pointwise losses). The learning theory of distributed kernel methods with pairwise losses (e.g., pairwise ranking [15], [16], [17], [18] and similarity/metric learning [19], [20], [21]) remains unclear. Meanwhile, the computational complexity of pairwise learning machines is usually higher than that of the corresponding pointwise learning, especially in the big-data setting. This motivates us to explore the theoretical foundations of pairwise kernel methods under the divide-and-conquer strategy.
By applying the divide-and-conquer strategy to regularized least squares ranking (RLSRank) [15], [16], [22], we formulate a distributed ranking algorithm, called distributed regularized least squares ranking (DRLSRank). Under this distributed learning strategy, the proposed ranking model has a clear advantage in computational feasibility and can handle ranking tasks with big data. The main contribution of this paper is to establish generalization bounds for DRLSRank based on the solution characteristics of RLSRank [16], [17], [23], operator approximation techniques [24], and the error decomposition strategy in [3]. The error bounds show that the proposed DRLSRank achieves satisfactory learning rates under mild conditions, which provides learning theory guarantees for the distributed pairwise approach.
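A minimal sketch of how DRLSRank can operate follows, under our own notation and assumptions (Gaussian kernel, toy data, hypothetical function names — none of this is the paper's actual implementation). The closed form used here comes from the first-order condition of the empirical pairwise least squares objective with the centering matrix C = I − 11ᵀ/n; it is our derivation, consistent with the standard RLSRank formulation in [15], [16].

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Pairwise Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rlsrank_fit(X, y, lam=1e-3):
    # RLSRank on one subset: minimize the empirical pairwise loss
    #   (1/n^2) * sum_{i,k} (y_i - y_k - (f(x_i) - f(x_k)))^2 + lam * ||f||_K^2
    # over f = sum_i alpha_i K(x_i, .).  Since the loss depends only on
    # centered residuals, stationarity reduces to the linear system
    #   ((2/n) C K + lam I) alpha = (2/n) C y,  with C = I - 11^T / n.
    n = len(X)
    K = gaussian_kernel(X, X)
    C = np.eye(n) - np.ones((n, n)) / n
    alpha = np.linalg.solve((2.0 / n) * C @ K + lam * np.eye(n),
                            (2.0 / n) * C @ y)
    return X, alpha

def drlsrank(X, y, m=4, lam=1e-3):
    # Divide the data into m subsets, fit RLSRank on each, and
    # average the resulting sub-ranking models.
    machines = [rlsrank_fit(Xj, yj, lam)
                for Xj, yj in zip(np.array_split(X, m), np.array_split(y, m))]
    def predict(Xt):
        return np.mean([gaussian_kernel(Xt, Xj) @ aj for Xj, aj in machines],
                       axis=0)
    return predict

# Toy ranking data: a monotone score x^3 plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = X[:, 0] ** 3 + 0.05 * rng.normal(size=200)
f = drlsrank(X, y, m=4)
```

Note that the pairwise loss is invariant to adding a constant to f, so only predicted score differences (i.e., the induced ordering) are meaningful.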
The remainder of this paper covers the algorithm framework and its theoretical analysis. In Section 2, we briefly review related work on regularized least squares ranking and ranking learning theory. Section 3 formulates distributed regularized ranking, and Section 4 provides the upper bound on the excess ranking risk. Finally, Section 5 concludes the paper.
Section snippets
Related works
In this section, we recall related work on regularized least squares ranking. In [15], magnitude-preserving least squares ranking is proposed and its concentration estimate is established via a stability analysis technique. In [16], a novel solution expression and the convergence rate of RLSRank are derived from the properties of integral operators. In [23], a multi-scale kernel is incorporated into RLSRank to better approximate non-flat functions. Furthermore, a stochastic gradient
Preliminaries
Let us revisit the background on the ranking problem of learning a real-valued function [15], [26].
Let X be a compact input space and Y ⊆ [−M, M] the output set for some M > 0. Each sample z = (x, y) is drawn independently from an unknown distribution ρ on X × Y, where ρ(·|x) is the conditional probability given x and ρ_X is the corresponding marginal distribution. In machine learning problems, we usually only know the empirical information of the intrinsic distribution ρ through i.i.d
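The setup above leads to the least squares ranking risk. As context for the truncated preliminaries, the standard formulation following [15], [16] reads (the symbols E and E_z are our notation and may differ from the paper's):

```latex
% Expected least squares ranking risk of f : X -> R
\mathcal{E}(f) = \int_{\mathcal{X}\times\mathcal{Y}} \int_{\mathcal{X}\times\mathcal{Y}}
  \bigl( y - y' - (f(x) - f(x')) \bigr)^2 \, d\rho(x,y) \, d\rho(x',y'),

% and its empirical counterpart on n i.i.d. samples z = {(x_i, y_i)}_{i=1}^n:
\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n}
  \bigl( y_i - y_j - (f(x_i) - f(x_j)) \bigr)^2 .
```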
Generalization error analysis
In this paper, we assume that the labeled data in each D_j are drawn independently from an unknown probability distribution ρ, and that the unlabeled observations in each subset are obtained independently according to the marginal distribution ρ_X. Our theoretical concern is to bound the divergence between the distributed estimator and the target (i.e., the excess ranking risk) in expectation. For the feasibility of the analysis, we assume that |y| ≤ M almost surely for some M > 0.
Conclusion
In this paper, we investigate the learning theory foundations of distributed regularized ranking by developing the error decomposition in [3] and the operator approximation techniques in [16], [22]. Our learning theory analysis shows that DRLSRank achieves satisfactory learning rates in addition to computational feasibility. In particular, we observe that additional unlabeled data are crucial to reduce the distributed error and to relax the restriction on the number of sub-ranking models. There are
Declaration of competing interest
None.
Acknowledgment
This work was supported by National Natural Science Foundation of China (NSFC) under grant nos. 11671161 and 11801201 and the Fundamental Research Funds for the Central Universities (Project Nos. 2662019FW003, 2662018QD018 and 2662015PY138).
References (42)
- Kernelized elastic net regularization based on Markov selective sampling, Knowl.-Based Syst. (2019)
- Distributed learning with multi-penalty regularization, Appl. Comput. Harmon. Anal. (2019)
- The convergence rate of a regularized ranking algorithm, J. Approx. Theory (2012)
- Extreme learning machine for ranking: generalization analysis and applications, Neural Netw. (2014)
- On the convergence rate and some applications of regularized ranking algorithms, J. Complex. (2016)
- A linear functional strategy for regularized ranking, Neural Netw. (2016)
- Distributed pairwise algorithms with gradient descent methods, Neurocomputing (2019)
- Online pairwise learning algorithms with convex loss functions, Inf. Sci. (2017)
- Generalization performance of Gaussian kernels SVMC based on Markov sampling, Neural Netw. (2014)
- A divide-and-conquer solver for kernel support vector machines, Proceedings of the Thirty-First International Conference on Machine Learning (ICML) (2014)
- Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates, J. Mach. Learn. Res.
- Distributed semi-supervised learning with kernel ridge regression, J. Mach. Learn. Res.
- k-Times Markov sampling for SVMC, IEEE Trans. Neural Netw. Learn. Syst.
- New incremental learning algorithm with support vector machines, IEEE Trans. Syst. Man Cybern.: Syst.
- Groups-keeping solution path algorithm for sparse regression with automatic feature grouping, Proceedings of the International Conference on Knowledge Discovery & Data Mining (KDD)
- A new generalized error path algorithm for model selection, Proceedings of the International Conference on Machine Learning (ICML)
- Distributed learning with regularized least squares, J. Mach. Learn. Res.
- On the feasibility of distributed kernel regression for big data, IEEE Trans. Knowl. Data Eng.
- Learning theory for distributed spectral algorithm, Inverse Probl.
- Distributed kernel-based gradient descent algorithms, Constr. Approx.
- Learning theory of distributed regression with bias corrected regularization kernel network, J. Mach. Learn. Res.
Hong Chen received the B.Sc. and Ph.D. degrees from Hubei University, Wuhan, China, in 2003 and 2009, respectively. During Feb 2016–Aug 2017, he worked as a postdoc researcher in the Department of Computer Science and Engineering, University of Texas at Arlington, USA. Currently, he is a professor in the Department of Mathematics and Statistics, College of Science, Huazhong Agricultural University, Wuhan, China. His current research interests include machine learning, statistical learning theory and approximation theory.
Han Li received the B.S. degree in Mathematics and Applied Mathematics from the Faculty of Mathematics and Computer Science, Hubei University, in 2007. She received her Ph.D. degree from the School of Mathematics and Statistics at Beijing University of Aeronautics and Astronautics. She worked as a project assistant professor in the Department of Mechanical Engineering, Kyushu University. She is now an associate professor in the College of Informatics, Huazhong Agricultural University. Her research interests include neural networks, learning theory and pattern recognition.
Zhibin Pan received the M.Sc. degree in mathematics from Hubei University in 2004 and the Ph.D. degree in information and communication systems from Huazhong University of Science and Technology, China, in 2014. During Aug 2015–Aug 2016, he worked as a visiting researcher in the Department of Computer Science and Engineering, University of Texas at Arlington, USA. He is now an associate professor with the Department of Mathematics and Statistics, Huazhong Agricultural University, China. His research interests include machine learning and pattern recognition.