Neural Networks

Volume 23, Issue 7, September 2010, Pages 812-818

Semi-supervised learning based on high density region estimation

https://doi.org/10.1016/j.neunet.2010.06.001

Abstract

In this paper, we consider local regression problems on high density regions. We propose a semi-supervised local empirical risk minimization algorithm and bound its generalization error. The theoretical analysis shows that our method can utilize unlabeled data effectively and achieve a fast learning rate.

Introduction

Semi-supervised learning, i.e., learning from both labeled and unlabeled data, has attracted increasing attention in recent years (Chapelle et al., 2006, Zhu, 2005). The key challenge of semi-supervised learning is how to improve learning performance using a few labeled examples together with a large amount of unlabeled data. There are two main approaches to semi-supervised learning. The first is to label part of the unlabeled data using a high-precision learner and then add the "automatically" labeled data to the training set. Examples include transductive inference (Joachims, 1999, Vapnik, 1998), co-training (Blum & Mitchell, 1998), and large-margin algorithms (Wang & Shen, 2007). The second approach assumes that the input data lie on a low-dimensional manifold; these algorithms mainly use unlabeled data to construct a data manifold on which suitable smooth function classes can be defined (Belkin et al., 2004, Belkin et al., 2006, Chen et al., 2009). While various ideas have been proposed based on different intuitions, only recently have theoretical studies tried to understand why these methods work (Belkin et al., 2004, Chen and Li, 2009, Johnson and Zhang, 2007, Johnson and Zhang, 2008, Rigollet, 2007, Wang and Shen, 2007). Despite this progress, many open problems remain. In particular, in error analysis, the precise relationship between supervised learning and semi-supervised learning remains unclear. To explore this relationship, we regard the unlabeled data as a tool for simplifying the learning task. In fact, an additional gain is possible if one can make a "smart" partition of the input space. The question arises: how do we partition the input space into subspaces to obtain a good solution to the problem of local function estimation? To answer this question, the local empirical risk minimization (ERM) method was proposed and its generalization error bounds were established (Vapnik, 1998).

However, to the best of our knowledge, an analysis of the choice of local regions has not yet been reported. To fill this gap, we consider using unlabeled data to determine the local regions. We assume that prediction accuracy on high density regions is far more important than on other regions; this assumption is reasonable in view of the definition of the generalization error. We then propose a semi-supervised local empirical risk minimization algorithm based on the estimation of high density regions. In essence, the algorithm uses unlabeled data to estimate the high density regions and then predicts the output values in these regions by means of the labeled data.
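
To make this two-stage idea concrete, the following Python sketch illustrates it on one-dimensional data held in NumPy arrays. It is only a minimal illustration under assumed choices (a Gaussian kernel density estimate, a constant predictor as the local assumption space, and hypothetical bandwidth and threshold parameters); the paper itself works with general assumption spaces and regions.

    import numpy as np

    def semi_supervised_local_erm(x_lab, y_lab, x_unlab, bandwidth=0.3, level=0.5):
        # Stage 1: estimate the input density from the unlabeled sample with a
        # Gaussian kernel density estimate (one-dimensional for simplicity).
        def density(t):
            return np.mean(np.exp(-0.5 * ((t - x_unlab) / bandwidth) ** 2)) \
                   / (bandwidth * np.sqrt(2.0 * np.pi))

        # Stage 2: empirical risk minimization restricted to the labeled points
        # falling in the estimated high density region {x : density(x) >= level}.
        # With squared loss and a constant assumption space, the minimizer is the mean.
        in_region = np.array([density(x) >= level for x in x_lab])
        local_value = y_lab[in_region].mean() if in_region.any() else y_lab.mean()

        def predictor(t):
            # Predict only on the estimated high density region; elsewhere the
            # local method makes no claim and we return NaN.
            return local_value if density(t) >= level else float("nan")

        return predictor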

The points below highlight several new features of the current paper:

  • Our method tries to bring together four distinct concepts that have recently received independent attention in machine learning: the local risk minimization method (Vapnik, 1998), error analysis of the ERM method (Cucker and Smale, 2002, DeVore et al., 2006), concentration inequalities (Bousquet, 2003, Giné and Koltchinskii, 2006), and density estimation (Rigollet, 2007). We show how these ideas can be brought together in a coherent and natural way to construct and analyze a new semi-supervised learning algorithm.

  • Based on the principles of local risk minimization (Vapnik, 1998) and a recent semi-supervised method (Rigollet, 2007), we focus on simplifying learning tasks using unlabeled data. In fact, it is reasonable to disregard negligible regions in order to achieve better prediction on important regions. It is worth noting that our starting point and goal differ from the philosophy of manifold learning (Belkin et al., 2006, Ye and Zhou, 2008). Our viewpoint sheds new light on the theoretical analysis of the local risk minimization method (Cucker & Zhou, 2007).

  • Compared with the semi-supervised cluster algorithm (Rigollet, 2007), our method does not need the cluster assumption. Thus, it is more suitable for general learning problems where the cluster assumption is not satisfied. Meanwhile, our framework is closely tied to the characteristics of the input space, which is consistent with previous results that utilize the geometric structure of the data (Belkin et al., 2004, Belkin et al., 2006, Chapelle et al., 2006). When the data are highly concentrated in a few small regions, our local convergence rate is far faster than that of the ERM method (Cucker and Smale, 2002, DeVore et al., 2006).

  • Notice that input-dependent estimates of the generalization error have been established (Sugiyama & Müller, 2005) based on the idea of density modification. Meanwhile, the subspace information criterion for the assumption space has been well studied (Sugiyama et al., 2004, Sugiyama and Ogawa, 2002). In contrast to these results for supervised learning, we investigate semi-supervised regression and choose the assumption space based on partitions of the input space.

  • Based on concentration inequalities for the empirical process (Bousquet, 2003, Giné and Koltchinskii, 2006), we derive relative error bounds for the local ERM method. Although the bounds are not tight, they give novel insight into the error analysis of learning algorithms.

The rest of this paper is organized as follows. The necessary background for the local ERM method is reviewed in Section 2. The error analysis for known local regions is established in Section 3. After that, the main theoretical results of the paper are presented in Section 4, where semi-supervised local ERM methods are proposed and their error estimates are established. Finally, a brief conclusion is given in Section 5.

Problem setup and preliminaries

Let the input space $X \subset \mathbb{R}^d$ be a compact domain or a manifold in Euclidean space and let $Y = [-M, M]$. In the semi-supervised model, the learner gets a labeled data set $Z_l = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ and an unlabeled data set $X_u = \{x_{n+1}, \ldots, x_{n+m}\}$. Here, the labeled examples $(x_i, y_i) \in Z = X \times Y$, $1 \le i \le n$, are independent copies of a random element $(x, y)$ having distribution $\rho$ on $Z$. The unlabeled data $x_{n+j}$, $1 \le j \le m$, are independent copies of $x$, whose distribution (the marginal distribution of $\rho$) we denote by $\rho_X$. The goal of learning

Error bounds for known high density regions

When the family $T_1, \ldots, T_J$ is known, we observe only the labeled data $Z_l$. For any input $x \in \Gamma$, we predict the output value by the ERM method on $\Gamma$. The predictor defined on $T_j$ is $\hat{f}_n^j = \arg\min_{f \in \mathcal{H}_j} \hat{\mathcal{E}}_{T_j}(f)$, where $\mathcal{H}_j$ is the assumption space on $T_j$. Then, the predictor defined on $\Gamma$ is $\hat{f}_n(x) = \sum_{j \ge 1} \hat{f}_n^j(x)\, I_{\{x \in T_j\}}$.
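
As a rough illustration of this piecewise construction, the sketch below fits one least-squares polynomial per known region $T_j$ (a hypothetical choice of the assumption space $\mathcal{H}_j$, with the regions assumed to be intervals for simplicity) and combines the local fits through the indicator rule above.

    import numpy as np

    def fit_local_erm(x, y, regions, degree=1):
        # One empirical risk minimizer per region T_j: a least-squares
        # polynomial fit on the labeled points that fall in T_j.
        coeffs = []
        for (a, b) in regions:                      # each T_j is an interval [a, b]
            mask = (x >= a) & (x <= b)
            coeffs.append(np.polyfit(x[mask], y[mask], degree)
                          if mask.sum() > degree else None)

        def f_hat(t):
            # Piecewise predictor: f_hat(t) = sum_j f_hat_j(t) * 1{t in T_j}.
            for (a, b), c in zip(regions, coeffs):
                if a <= t <= b and c is not None:
                    return np.polyval(c, t)
            return 0.0                              # outside every T_j
        return f_hat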

Now, we introduce the definition of covering numbers.

Definition 2

For a subset $F$ of a metric space and $\eta > 0$, the covering number $\mathcal{N}(F, \eta)$ is defined to be the minimal integer $\ell \in \mathbb{N}$ such that there exist $\ell$ disks with radius $\eta$ covering $F$.

Error bounds based on high density region estimation

We consider a more realistic case where the high density regions $T_1, \ldots, T_J$ are unknown and we have to estimate them using unlabeled data (Rigollet, 2007). In fact, when two high density regions are too close to each other, we may wish to identify them as a single region. This is consistent with the fact that the finite number of unlabeled observations allows us to have only a blurred vision of the high density regions. To provide a motivation for using labeled and unlabeled data to
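
In the spirit of a plug-in approach to region estimation, one simple way to estimate high density regions from the unlabeled sample is to threshold a kernel density estimate and merge nearby components. The following one-dimensional sketch (with hypothetical grid, bandwidth, and threshold parameters, not taken from the paper) illustrates this step.

    import numpy as np

    def estimate_high_density_regions(x_unlab, bandwidth=0.2, level=0.4, grid_size=400):
        # Plug-in style estimate of {x : density(x) >= level} on a 1-d grid.
        grid = np.linspace(x_unlab.min(), x_unlab.max(), grid_size)
        dens = np.exp(-0.5 * ((grid[:, None] - x_unlab[None, :]) / bandwidth) ** 2)
        dens = dens.mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))

        above = dens >= level
        regions, start = [], None
        for g, flag in zip(grid, above):
            if flag and start is None:
                start = g                            # a region opens
            elif not flag and start is not None:
                regions.append((start, g))           # a region closes
                start = None
        if start is not None:
            regions.append((start, grid[-1]))
        # Regions closer than, say, 2 * bandwidth could then be merged, matching
        # the remark that nearby regions are identified as a single one.
        return regions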

Conclusion

In this study, we have investigated the generalization performance of the semi-supervised local empirical risk minimization method. We established estimates of the generalization error in the ideal setting and in the realistic setting, respectively. Our results show that this method can achieve fast learning rates on high density regions under mild assumptions.

Acknowledgements

The authors would like to thank Prof. Dr. D.-X. Zhou for his valuable suggestions. The authors are indebted to the handling associate editor and the anonymous reviewers for their detailed and careful comments and constructive suggestions. The research is supported partially by NSFC under Grant No. 10771053, by the Fundamental Research Funds for the Central Universities (Q52204-09099), by Huazhong Agricultural University Interdisciplinary Fund (2008xkjc008) and Torch Plan Fund (2009XH003).

References (31)

  • H. Chen et al. Semi-supervised multi-category classification with imperfect model. IEEE Transactions on Neural Networks (2009)
  • H. Chen et al. Analysis of classification with a reject option. International Journal of Wavelets, Multiresolution and Information Processing (2009)
  • F. Cucker et al. On the mathematical foundations of learning. Bulletin of the American Mathematical Society (2002)
  • F. Cucker et al. Learning theory: an approximation theory viewpoint (2007)
  • A. Cuevas et al. A plug-in approach to support estimation. Annals of Statistics (1997)