Neural Networks, Volume 34, October 2012, Pages 65-71

Analysis of convergence performance of neural networks ranking algorithm

https://doi.org/10.1016/j.neunet.2012.06.012

Abstract

The ranking problem is to learn a real-valued function which gives rise to a ranking over an instance space; it has gained much attention in machine learning in recent years. This article analyzes the convergence performance of a neural networks ranking algorithm by means of the given samples and the approximation property of neural networks. The upper bounds on the convergence rate provided by our results can be considerably tight and are independent of the dimension of the input space when the target function satisfies some smoothness condition. The obtained results imply that neural networks are able to adapt to the ranking function in the instance space, and hence they circumvent the curse of dimensionality under a smoothness condition.

Introduction

The analysis of the convergence performance of learning algorithms is an important and active topic in machine learning research. To our knowledge, Vapnik and Chervonenkis (1971) were the first to study learning algorithms and to establish a convergence analysis for classification algorithms from a statistical viewpoint. Since then, many different tools have been used to study the convergence performance of learning algorithms, and they have been applied to both classification (learning of binary-valued functions) and regression (learning of real-valued functions). In many learning problems, however, the goal is not simply to classify objects into one of a fixed number of classes; instead, a ranking of objects is desired. For example, in information retrieval problems one would like to retrieve documents from some database that are ‘relevant’ to a given query or topic. In such problems, one needs a ranking of the documents so that relevant documents are ranked higher than irrelevant ones. Recently, the ranking problem has gained much attention in machine learning (see Agarwal and Niyogi, 2005, Agarwal and Niyogi, 2009, Clemencon et al., 2008, Cohen et al., 1999, Cortes et al., 2007, Cossock and Zhang, 2006, Cucker and Smale, 2001, Cucker and Smale, 2002). For the ranking problem, we learn a real-valued function which assigns scores to instances; the scores themselves do not matter, however, as we are only interested in the relative ranking of instances induced by these scores.

Ranking has been successfully applied to many fields, such as social choice theory (Kenneth, 1970), statistics (Lehmann, 1975) and mathematical economics (Chiang & Wainwright, 2005). In machine learning, Cohen et al. (1999) were the first to study ranking. Since then, many researchers have turned to this topic from a machine learning point of view; for example, Crammer and Singer (2002) and Herbrich, Graepel, and Obermayer (2000) considered the related but distinct problem of ordinal regression. Radlinski and Joachims (2005) developed an algorithmic framework for ranking in information retrieval applications. Agarwal and Niyogi (2005) and Freund, Iyer, Schapire, and Singer (2003) considered the convergence properties of ranking algorithms for the special setting of bipartite ranking. Clemencon et al. (2008) gave statistical convergence properties of ranking algorithms based on empirical and convex risk minimization by using the theory of U-statistics. Agarwal and Niyogi (2009) studied the convergence properties of ranking algorithms in a more general setting of the ranking problem that arises frequently in applications, deriving convergence bounds via algorithmic stability. Burges et al. (2005) developed a neural-network-based algorithm for the ranking problem. Although there have been several recent advances in developing algorithms for various settings of the ranking problem, the study of generalization properties of ranking algorithms has been largely limited to the special setting of bipartite ranking (see Agarwal & Niyogi, 2005, Freund et al., 2003). Similar to Agarwal and Niyogi (2009), we study the convergence properties of ranking learning algorithms in a more general setting of the ranking problem that arises frequently in applications and practice. Our convergence rates are derived by using the approximation property of neural networks and covering numbers, instead of the notion of algorithmic stability in a reproducing kernel Hilbert space used in Agarwal and Niyogi (2009).

As in both classification and regression, the ranking problem is studied over a hypothesis space that has good approximation properties for the ranking function. It is well known that feedforward neural networks (FNNs) have the universal approximation property for any continuous or integrable function defined on a compact set, and there are algorithms to carry out the approximation. In 1989, Cybenko (1989) first proved that if the activation function of an FNN is a continuous sigmoidal function and I = [0,1]^d is the unit cube in R^d, then any continuous function on I can be approximated by FNNs. Since then, several methods different from Cybenko (1989) have been designed. Meanwhile, a series of investigations into the conditions on the activation function ensuring the validity of the density theorem can be found in Chen and Chen, 1995a, Chen and Chen, 1995b, Chen, Chen, and Liu (1995), Hornik (1991), and Mhaskar and Micchelli (1992). The complexity of FNN approximation mainly describes the relationship among the topological structure of the hidden layer (such as the number of neurons and the values of the weights), the approximation ability and the approximation rate. The study of complexity has attracted much attention in recent years (Cao et al., 2009, Cao et al., 2009, Chui and Li, 1992, Maiorov and Meir, 1998, Xu and Cao, 2005). In machine learning, FNNs are often used as the hypothesis space when studying the convergence performance of learning algorithms. For example, Barron (1993) gave the convergence rate of a least squares regression learning algorithm by using the approximation property of FNNs. In 2006, Hamers and Kohler (2006) obtained nonasymptotic bounds on least squares regression estimates obtained by minimizing the empirical risk over suitable sets of FNNs. Recently, Kohler and Mehnert (2011) analyzed the rate of convergence of a least squares learning algorithm over FNNs for smooth regression functions. In this article, we study a ranking learning algorithm based on neural networks, where the hypothesis space is chosen as a class of FNNs with one hidden layer.
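To make the hypothesis space concrete, the following minimal sketch (Python/NumPy, with illustrative class and parameter names; the weight bounds that define the class F_{k,m} in the paper are not enforced here) shows a one-hidden-layer FNN with a sigmoidal activation of the kind covered by Cybenko's theorem.

```python
import numpy as np

def sigmoid(t):
    """Continuous sigmoidal activation, as in Cybenko (1989)."""
    return 1.0 / (1.0 + np.exp(-t))

class OneHiddenLayerFNN:
    """Minimal one-hidden-layer feedforward network

        f(x) = sum_{j=1}^{k} c_j * sigmoid(w_j . x + b_j),

    i.e. a member of a class of FNNs with k hidden units; the weight
    bounds that the paper uses to define F_{k,m} are omitted here.
    """

    def __init__(self, input_dim, k, rng=None):
        rng = np.random.default_rng(rng)
        self.W = rng.normal(size=(k, input_dim))   # hidden-layer weights w_j
        self.b = rng.normal(size=k)                # hidden-layer biases b_j
        self.c = rng.normal(size=k)                # output weights c_j

    def __call__(self, X):
        """Evaluate f on an (n, input_dim) array of instances."""
        X = np.atleast_2d(X)
        hidden = sigmoid(X @ self.W.T + self.b)    # (n, k) hidden activations
        return hidden @ self.c                     # (n,) real-valued scores

# Example: score a few random instances in [0, 1]^d.
if __name__ == "__main__":
    f = OneHiddenLayerFNN(input_dim=5, k=10, rng=0)
    X = np.random.default_rng(1).uniform(size=(4, 5))
    print(f(X))
```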

The article is organized into six sections. Following the introduction in the present section, we describe the general ranking problem and introduce neural networks in Section 2. In Section 3, we bound the approximation error of the ranking algorithm via the approximation property of neural networks. Section 4 estimates the sample error; the obtained upper bound, combined with the approximation error, leads to an estimate of the convergence rate of the neural networks ranking algorithm. In Section 5, we compare our results with related work. Finally, we conclude the article by summarizing the obtained results.

Section snippets

General ranking problem and neural networks

For the ranking problem, one is given samples of ordering relationships among instances in some instance space X, and the goal is to learn a real-valued function from these samples that ranks future instances. Ranking problems arise in many domains: in user-preference modeling, one wants to order movies or texts according to one's own preferences; in information retrieval, one is interested in retrieving documents from some database that are ‘relevant’ to a given query or
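Although the formal definitions follow in Section 2, the pairwise nature of the empirical ranking error can be illustrated with a short sketch. The loss used below is the commonly used pairwise misranking loss weighted by label differences, stated here as an assumption for illustration; the paper's exact loss and normalization may differ.

```python
import numpy as np

def empirical_ranking_error(scores, labels):
    """Pairwise empirical ranking error of a scoring function.

    For each pair (i, j) with i < j, the function is charged |y_i - y_j|
    when it orders the pair against the labels, and half that amount on
    score ties (a common convention; not necessarily the paper's exact loss).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            label_diff = labels[i] - labels[j]
            score_diff = scores[i] - scores[j]
            if label_diff * score_diff < 0:            # pair mis-ordered
                total += abs(label_diff)
            elif score_diff == 0 and label_diff != 0:  # tie in scores
                total += 0.5 * abs(label_diff)
    return 2.0 * total / (n * (n - 1))                 # average over pairs

# Example: a perfectly ordered sample has error 0, a reversed one does not.
if __name__ == "__main__":
    y = [3.0, 2.0, 1.0]
    print(empirical_ranking_error([0.9, 0.5, 0.1], y))  # 0.0
    print(empirical_ranking_error([0.1, 0.5, 0.9], y))  # > 0
```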

Estimating approximation error of ranking learning algorithm by neural networks

Here we would expect that the minimizer of the empirical error, i.e., $f_z$, is a good approximation of the minimizer $f_\rho$. This is actually true if $f_\rho$ can be approximated by functions from $\mathcal{F}_{k,m}$, measured by the decay of the approximation error defined as $\mathcal{D} = \inf_{f \in \mathcal{F}_{k,m}} \{\mathcal{E}(f) - \mathcal{E}(f_\rho)\}$. Thus, the excess error $\mathcal{E}(f_z) - \mathcal{E}(f_\rho)$ may be divided as $\mathcal{E}(f_z) - \mathcal{E}(f_\rho) \le \{\mathcal{E}(f_z) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_{k,m}) - \mathcal{E}(f_{k,m})\} + \mathcal{D}$, where the function $f_{k,m}$ is defined as $f_{k,m} = \arg\min_{f \in \mathcal{F}_{k,m}} \mathcal{E}(f)$. In fact, $\mathcal{E}(f_z) - \mathcal{E}(f_\rho) \le \mathcal{E}(f_z) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_z) - \mathcal{E}_z(f_{k,m}) + \mathcal{E}_z(f_{k,m})$ …
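For completeness, the inequality above can be justified in two lines, using only the definition of $f_z$ as the empirical-error minimizer over $\mathcal{F}_{k,m}$ and the definition of $f_{k,m}$ (a standard step, stated here as a sketch):

```latex
\begin{align*}
\mathcal{E}(f_z)-\mathcal{E}(f_\rho)
 &= \bigl[\mathcal{E}(f_z)-\mathcal{E}_z(f_z)\bigr]
  + \bigl[\mathcal{E}_z(f_z)-\mathcal{E}_z(f_{k,m})\bigr]
  + \bigl[\mathcal{E}_z(f_{k,m})-\mathcal{E}(f_{k,m})\bigr]
  + \bigl[\mathcal{E}(f_{k,m})-\mathcal{E}(f_\rho)\bigr] \\
 &\le \bigl\{\mathcal{E}(f_z)-\mathcal{E}_z(f_z)
  + \mathcal{E}_z(f_{k,m})-\mathcal{E}(f_{k,m})\bigr\} + \mathcal{D},
\end{align*}
```

since the second bracket is nonpositive ($f_z$ minimizes $\mathcal{E}_z$ over $\mathcal{F}_{k,m}$) and the last bracket equals $\mathcal{D}$ by the definition of $f_{k,m}$.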

Estimating the convergence rate of the ranking learning algorithm

In this section, we will estimate the sample error in (4). The obtained error, together with the approximation error in Section 3, will lead to an estimate of the excess error $\mathcal{E}(f_z) - \mathcal{E}(f_\rho)$. In fact, according to the first part of (4), we obtain
$$\mathcal{E}(f_z) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_{k,m}) - \mathcal{E}(f_{k,m}) = \bigl\{\bigl(\mathcal{E}_z(f_{k,m}) - \mathcal{E}_z(f_\rho)\bigr) - \bigl(\mathcal{E}(f_{k,m}) - \mathcal{E}(f_\rho)\bigr)\bigr\} + \bigl\{\bigl(\mathcal{E}(f_z) - \mathcal{E}(f_\rho)\bigr) - \bigl(\mathcal{E}_z(f_z) - \mathcal{E}_z(f_\rho)\bigr)\bigr\}.$$
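Each bracketed term is a deviation between an empirical and an expected excess ranking error: the first involves only the fixed functions $f_{k,m}$ and $f_\rho$, while the second must hold uniformly over $\mathcal{F}_{k,m}$ and is therefore controlled through the covering number of the class, consistent with the covering-number approach described above. A schematic form of such a uniform bound is sketched below; the constants and the exact dependence on the sample size are placeholders, not the paper's precise statement.

```latex
% Schematic uniform deviation bound via covering numbers (illustrative form;
% c_1, c_2 and the dependence on n are placeholders, not the paper's result).
\Pr\Bigl[\sup_{f\in\mathcal{F}_{k,m}}
  \bigl|\bigl(\mathcal{E}(f)-\mathcal{E}(f_\rho)\bigr)
       -\bigl(\mathcal{E}_z(f)-\mathcal{E}_z(f_\rho)\bigr)\bigr|
  >\varepsilon\Bigr]
  \;\le\; \mathcal{N}\bigl(\mathcal{F}_{k,m},c_1\varepsilon\bigr)\,
      \exp\bigl(-c_2\,n\,\varepsilon^{2}\bigr).
```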

We know that the main difference in the formulation of the ranking problem as compared with the problems of classification and regression is the performance

Comparisons with related work

In Section 4, we have studied the convergence performance of ranking learning algorithms in a setting that is more general than what has been considered previously. We have derived upper bounds for ranking learning algorithms by using the approximation property of neural networks and covering numbers. In this section we discuss how our results relate to other recent studies.

Conclusions

This article has studied the generalization performance of ranking algorithms based on a setting where ranking preferences among instances are indicated by real-valued labels on the instances. This setting of the ranking problem arises frequently in practice and is more general than the setting of bipartite ranking. Compared with analyses that derive the generalization properties of ranking learning algorithms via algorithmic stability in a reproducing kernel Hilbert space, we give the upper bound of uniform

References (35)

  • F.L. Cao et al., The lower estimation of approximation rate for neural networks, Science in China. Series F. Information Sciences (2009)
  • T.P. Chen et al., Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Transactions on Neural Networks (1995)
  • T.P. Chen et al., Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, IEEE Transactions on Neural Networks (1995)
  • T.P. Chen et al., Approximation capability in C(R^n) by multilayer feedforward networks and related problems, IEEE Transactions on Neural Networks (1995)
  • A.C. Chiang et al., Fundamental methods of mathematical economics (2005)
  • S. Clemencon et al., Ranking and empirical minimization of U-statistics, Annals of Statistics (2008)
  • W.W. Cohen et al., Learning to order things, Journal of Artificial Intelligence Research (1999)

This research was supported by the National Natural Science Foundation of China (No. 61101240) and the Zhejiang Provincial Natural Science Foundation of China (Nos. Y6110117, Q12A010096).
