Error analysis of distributed least squares ranking
Introduction
The divide-and-conquer strategy has been used successfully for kernel methods with large-scale data [1], [2], [3]. Compared with other techniques for large-scale data, e.g., Markov sampling [4], [5], [6] and path algorithms [7], [8], the divide-and-conquer strategy processes all the data in small batches simultaneously. In general, kernel methods under the divide-and-conquer strategy involve three key steps: dividing the whole dataset into manageable subsets, training an individual learning machine on each subset, and obtaining the final predictor by combining these individual machines [2], [3], [9]. In this way, distributed learning provides an effective strategy to conquer big-data challenges [1], [2], [10] and to realize privacy-preserving data mining [3].
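As an illustration of the three steps, here is a minimal divide-and-conquer sketch for kernel ridge regression in the spirit of [2]; the Gaussian kernel, the toy sine data, and all function names are our own illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Pairwise Gaussian kernel matrix between the rows of A and B
    # (an illustrative kernel choice, not prescribed by the paper).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam=1e-3):
    # Step 2: train one regularized least-squares machine on a subset.
    n = len(X)
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return X, alpha

def dc_krr(X, y, m=4, lam=1e-3):
    # Step 1: divide the data into m subsets.
    # Step 2: fit an individual machine on each subset.
    # Step 3: combine the machines by averaging their predictions.
    machines = [krr_fit(Xj, yj, lam)
                for Xj, yj in zip(np.array_split(X, m), np.array_split(y, m))]
    def predict(Xt):
        return np.mean([gaussian_kernel(Xt, Xj) @ aj for Xj, aj in machines],
                       axis=0)
    return predict

# Toy data: noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.normal(size=200)
f = dc_krr(X, y, m=4)
```

Each subset only solves a linear system of its own size, which is the source of the computational savings the strategy is known for.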
Along with its practical success, the theoretical foundations of distributed kernel methods have been investigated recently. For distributed regularized least squares, optimal learning rates in expectation are provided in [2], [9]. For distributed spectral algorithms, the convergence rate is well understood [11]. In [12], the error analysis for distributed multi-penalty regularization is derived via a Neumann expansion and a second-order decomposition of the difference of operator inverses. In [13], the generalization ability of a distributed gradient descent algorithm is characterized. A novel analysis framework is developed for distributed semi-supervised regression [3], which shows that unlabeled data play an important role in reducing the distributed error and relaxing the restriction on the number of data subsets. In [14], bias correction is applied to further improve the learning performance of regularized kernel networks, and optimal learning rates are attained in the distributed regression setting.
Despite rapid progress on distributed learning theory, all the above results are restricted to pointwise kernel methods (i.e., kernel methods with pointwise losses). The learning theory of distributed kernel methods with pairwise losses (e.g., pairwise ranking [15], [16], [17], [18] and similarity/metric learning [19], [20], [21]) remains unclear. Meanwhile, the computational complexity of pairwise learning machines is usually higher than that of the corresponding pointwise learning, especially in the big-data setting. This motivates us to explore the theoretical foundations of pairwise kernel methods under the divide-and-conquer strategy.
By applying the divide-and-conquer strategy to regularized least squares ranking (RLSRank) [15], [16], [22], we formulate a distributed ranking algorithm, called distributed regularized least squares ranking (DRLSRank). Under this distributed learning strategy, the proposed ranking model has a clear advantage in computational feasibility and can handle ranking tasks with big data. The main contribution of this paper is to establish generalization bounds for DRLSRank based on the solution characteristics of RLSRank [16], [17], [23], operator approximation techniques [24], and the error decomposition strategy in [3]. The error bounds show that the proposed DRLSRank achieves satisfactory learning rates under mild conditions, which provides learning theory guarantees for the distributed pairwise approach.
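A minimal sketch of how DRLSRank can operate follows, under our own notation and assumptions (Gaussian kernel, toy data, hypothetical function names — none of this is the paper's actual implementation). The closed form used here comes from the first-order condition of the empirical pairwise least squares objective with the centering matrix C = I − 11ᵀ/n; it is our derivation, consistent with the standard RLSRank formulation in [15], [16].

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Pairwise Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rlsrank_fit(X, y, lam=1e-3):
    # RLSRank on one subset: minimize the empirical pairwise loss
    #   (1/n^2) * sum_{i,k} (y_i - y_k - (f(x_i) - f(x_k)))^2 + lam * ||f||_K^2
    # over f = sum_i alpha_i K(x_i, .).  Since the loss depends only on
    # centered residuals, stationarity reduces to the linear system
    #   ((2/n) C K + lam I) alpha = (2/n) C y,  with C = I - 11^T / n.
    n = len(X)
    K = gaussian_kernel(X, X)
    C = np.eye(n) - np.ones((n, n)) / n
    alpha = np.linalg.solve((2.0 / n) * C @ K + lam * np.eye(n),
                            (2.0 / n) * C @ y)
    return X, alpha

def drlsrank(X, y, m=4, lam=1e-3):
    # Divide the data into m subsets, fit RLSRank on each, and
    # average the resulting sub-ranking models.
    machines = [rlsrank_fit(Xj, yj, lam)
                for Xj, yj in zip(np.array_split(X, m), np.array_split(y, m))]
    def predict(Xt):
        return np.mean([gaussian_kernel(Xt, Xj) @ aj for Xj, aj in machines],
                       axis=0)
    return predict

# Toy ranking data: a monotone score x^3 plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = X[:, 0] ** 3 + 0.05 * rng.normal(size=200)
f = drlsrank(X, y, m=4)
```

Note that the pairwise loss is invariant to adding a constant to f, so only predicted score differences (i.e., the induced ordering) are meaningful.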
The remainder of this paper covers the algorithm framework and its theoretical analysis. In Section 2, we briefly review related work on regularized least squares ranking and ranking learning theory. Section 3 formulates distributed regularized ranking, and Section 4 provides the upper bound on the excess ranking risk. Finally, Section 5 concludes the paper.
Section snippets
Related works
In this section, we recall related work on regularized least squares ranking. In [15], magnitude-preserving least squares ranking is proposed and its concentration estimate is established via a stability analysis technique. In [16], a novel solution expression and the convergence rate of RLSRank are derived from the properties of integral operators. In [23], a multi-scale kernel is incorporated into RLSRank to better approximate non-flat functions. Furthermore, a stochastic gradient
Preliminaries
Let us revisit the background on the ranking problem of learning a real-valued function [15], [26].
Let X be a compact input space and Y ⊆ [−M, M] the output set for some M > 0. Each sample z = (x, y) is drawn independently from an unknown distribution ρ on X × Y, where ρ(·|x) is the conditional probability given x and ρ_X is the corresponding marginal distribution. In machine learning problems, we usually only know the empirical information of the intrinsic distribution ρ through i.i.d
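The setup above leads to the least squares ranking risk. As context for the truncated preliminaries, the standard formulation following [15], [16] reads (the symbols E and E_z are our notation and may differ from the paper's):

```latex
% Expected least squares ranking risk of f : X -> R
\mathcal{E}(f) = \int_{\mathcal{X}\times\mathcal{Y}} \int_{\mathcal{X}\times\mathcal{Y}}
  \bigl( y - y' - (f(x) - f(x')) \bigr)^2 \, d\rho(x,y) \, d\rho(x',y'),

% and its empirical counterpart on n i.i.d. samples z = {(x_i, y_i)}_{i=1}^n:
\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n}
  \bigl( y_i - y_j - (f(x_i) - f(x_j)) \bigr)^2 .
```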
Generalization error analysis
In this paper, we assume that the labeled data in each D_j are drawn independently from an unknown probability distribution ρ, and that the unlabeled observations in each subset are obtained independently according to the marginal distribution ρ_X. Our theoretical concern is to bound the divergence between the distributed estimator and the target (i.e., the excess ranking risk) in expectation. For the feasibility of the analysis, we assume that |y| ≤ M almost surely for some M > 0.
Conclusion
In this paper, we investigate the learning theory foundations of distributed regularized ranking by developing the error decomposition in [3] and the operator approximation techniques in [16], [22]. Our learning theory analysis shows that DRLSRank achieves satisfactory learning rates in addition to computational feasibility. In particular, we observe that additional unlabeled data are crucial to reduce the distributed error and to relax the restriction on the number of sub-ranking models. There are
Declaration of competing interest
None.
Acknowledgment
This work was supported by National Natural Science Foundation of China (NSFC) under grant nos. 11671161 and 11801201 and the Fundamental Research Funds for the Central Universities (Project Nos. 2662019FW003, 2662018QD018 and 2662015PY138).
References (42)
- Kernelized elastic net regularization based on Markov selective sampling, Knowl.-Based Syst. (2019)
- Distributed learning with multi-penalty regularization, Appl. Comput. Harmon. Anal. (2019)
- The convergence rate of a regularized ranking algorithm, J. Approx. Theory (2012)
- Extreme learning machine for ranking: generalization analysis and applications, Neural Netw. (2014)
- On the convergence rate and some applications of regularized ranking algorithms, J. Complex. (2016)
- A linear functional strategy for regularized ranking, Neural Netw. (2016)
- Distributed pairwise algorithms with gradient descent methods, Neurocomputing (2019)
- Online pairwise learning algorithms with convex loss functions, Inf. Sci. (2017)
- Generalization performance of Gaussian kernels SVMC based on Markov sampling, Neural Netw. (2014)
- A divide-and-conquer solver for kernel support vector machines, Proceedings of the Thirty-First International Conference on Machine Learning (ICML) (2014)
- Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates, J. Mach. Learn. Res.
- Distributed semi-supervised learning with kernel ridge regression, J. Mach. Learn. Res.
- k-Times Markov sampling for SVMC, IEEE Trans. Neural Netw. Learn. Syst.
- New incremental learning algorithm with support vector machines, IEEE Trans. Syst. Man Cybern.: Syst.
- Groups-keeping solution path algorithm for sparse regression with automatic feature grouping, Proceedings of the International Conference on Knowledge Discovery & Data Mining (KDD)
- A new generalized error path algorithm for model selection, Proceedings of the International Conference on Machine Learning (ICML)
- Distributed learning with regularized least squares, J. Mach. Learn. Res.
- On the feasibility of distributed kernel regression for big data, IEEE Trans. Knowl. Data Eng.
- Learning theory for distributed spectral algorithm, Inverse Probl.
- Distributed kernel-based gradient descent algorithms, Constr. Approx.
- Learning theory of distributed regression with bias corrected regularization kernel network, J. Mach. Learn. Res.
Hong Chen received the B.Sc. and Ph.D. degrees from Hubei University, Wuhan, China, in 2003 and 2009, respectively. During Feb 2016–Aug 2017, he worked as a postdoc researcher in the Department of Computer Science and Engineering, University of Texas at Arlington, USA. Currently, he is a professor in the Department of Mathematics and Statistics, College of Science, Huazhong Agricultural University, Wuhan, China. His current research interests include machine learning, statistical learning theory and approximation theory.
Han Li received the B.S. degree in Mathematics and Applied Mathematics from the Faculty of Mathematics and Computer Science, Hubei University, in 2007. She received her Ph.D. degree from the School of Mathematics and Statistics at Beijing University of Aeronautics and Astronautics. She worked as a project assistant professor in the Department of Mechanical Engineering, Kyushu University. She is now an associate professor in the College of Informatics, Huazhong Agricultural University. Her research interests include neural networks, learning theory and pattern recognition.
Zhibin Pan received the M.Sc. degree in mathematics from Hubei University in 2004 and the Ph.D. degree in information and communication systems from Huazhong University of Science and Technology, China, in 2014. During Aug 2015–Aug 2016, he worked as a visiting researcher in the Department of Computer Science and Engineering, University of Texas at Arlington, USA. He is now an associate professor with the Department of Mathematics and Statistics, Huazhong Agricultural University, China. His research interests include machine learning and pattern recognition.