Neural Networks

Volume 23, Issue 7, September 2010, Pages 812-818

Semi-supervised learning based on high density region estimation

https://doi.org/10.1016/j.neunet.2010.06.001

Abstract

In this paper, we consider local regression problems on high density regions. We propose a semi-supervised local empirical risk minimization algorithm and bound its generalization error. The theoretical analysis shows that our method can utilize unlabeled data effectively and achieve a fast learning rate.

Introduction

Semi-supervised learning, i.e., learning from both labeled and unlabeled data, has attracted increasing attention in recent years (Chapelle et al., 2006, Zhu, 2005). The key challenge of semi-supervised learning is how to improve learning performance using a few labeled examples together with a large amount of unlabeled data. There are two main approaches to semi-supervised learning. The first is to label part of the unlabeled data using a high-precision learner and then add the "automatically" labeled data to the training set. Examples include transductive inference (Joachims, 1999, Vapnik, 1998), co-training (Blum & Mitchell, 1998), and large-margin algorithms (Wang & Shen, 2007). The second approach assumes that the input data lie on a low-dimensional manifold; these algorithms mainly use unlabeled data to construct a data manifold on which suitable smooth function classes can be defined (Belkin et al., 2004, Belkin et al., 2006, Chen et al., 2009). While various ideas have been proposed based on different intuitions, only recently have theoretical studies tried to understand why these methods work (Belkin et al., 2004, Chen and Li, 2009, Johnson and Zhang, 2007, Johnson and Zhang, 2008, Rigollet, 2007, Wang and Shen, 2007). Despite this progress, many open problems remain. In particular, in error analysis, the precise relationship between supervised learning and semi-supervised learning remains unclear. To explore this relationship, we regard the unlabeled data as a tool for simplifying the learning task. In fact, an additional gain is possible if one can make a "smart" partition of the input space. The question arises: how do we partition the input space into subspaces to obtain a good solution to the problem of local function estimation? To answer this question, the local empirical risk minimization (ERM) method was proposed and its generalization error bounds were established (Vapnik, 1998).

However, to the best of our knowledge, an analysis of the choice of local regions has not yet been reported. To fill this gap, we consider using unlabeled data to determine the local regions. We assume that prediction accuracy on high density regions is far more important than on other regions; this assumption is reasonable in view of the definition of the generalization error. We then propose a semi-supervised local empirical risk minimization algorithm based on the estimation of high density regions. In essence, the algorithm uses unlabeled data to estimate the high density regions and then predicts the output values in these regions by means of the labeled data.
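
To make this two-stage idea concrete, the following Python sketch illustrates it on one-dimensional data held in NumPy arrays. It is only a minimal illustration under assumed choices (a Gaussian kernel density estimate, a constant predictor as the local assumption space, and hypothetical bandwidth and threshold parameters); the paper itself works with general assumption spaces and regions.

    import numpy as np

    def semi_supervised_local_erm(x_lab, y_lab, x_unlab, bandwidth=0.3, level=0.5):
        # Stage 1: estimate the input density from the unlabeled sample with a
        # Gaussian kernel density estimate (one-dimensional for simplicity).
        def density(t):
            return np.mean(np.exp(-0.5 * ((t - x_unlab) / bandwidth) ** 2)) \
                   / (bandwidth * np.sqrt(2.0 * np.pi))

        # Stage 2: empirical risk minimization restricted to the labeled points
        # falling in the estimated high density region {x : density(x) >= level}.
        # With squared loss and a constant assumption space, the minimizer is the mean.
        in_region = np.array([density(x) >= level for x in x_lab])
        local_value = y_lab[in_region].mean() if in_region.any() else y_lab.mean()

        def predictor(t):
            # Predict only on the estimated high density region; elsewhere the
            # local method makes no claim and we return NaN.
            return local_value if density(t) >= level else float("nan")

        return predictor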

The points below highlight several new features of the current paper:

  • Our method tries to bring together four distinct concepts that have recently received independent attention in machine learning: the local risk minimization method (Vapnik, 1998), error analysis of the ERM method (Cucker and Smale, 2002, DeVore et al., 2006), concentration inequalities (Bousquet, 2003, Giné and Koltchinskii, 2006), and density estimation (Rigollet, 2007). We show how these ideas can be brought together in a coherent and natural way to construct and analyze a new semi-supervised learning algorithm.

  • Based on the principles of local risk minimization (Vapnik, 1998) and a recent semi-supervised method (Rigollet, 2007), we focus on simplifying learning tasks using unlabeled data. In fact, it is reasonable to disregard negligible regions in order to achieve better prediction on important regions. It is worth noting that our starting point and goal differ from the philosophy of manifold learning (Belkin et al., 2006, Ye and Zhou, 2008). Our viewpoint sheds new light on the theoretical analysis of the local risk minimization method (Cucker & Zhou, 2007).

  • Compared with the semi-supervised cluster algorithm (Rigollet, 2007), our method does not need the cluster assumption. Thus, it is more suitable for general learning problems where the cluster assumption is not satisfied. Meanwhile, our framework is closely tied to the characteristics of the input space, which is consistent with previous results that utilize the geometric structure of the data (Belkin et al., 2004, Belkin et al., 2006, Chapelle et al., 2006). When the data are highly concentrated in a few small regions, our local convergence rate is far faster than that of the ERM method (Cucker and Smale, 2002, DeVore et al., 2006).

  • Notice that input-dependent estimates of the generalization error have been established (Sugiyama & Müller, 2005) based on the idea of density modification. Meanwhile, the subspace information criterion for the assumption space has been well studied (Sugiyama et al., 2004, Sugiyama and Ogawa, 2002). In contrast to these results for supervised learning, we investigate semi-supervised regression and choose the assumption space based on partitions of the input space.

  • Based on concentration inequalities for the empirical process (Bousquet, 2003, Giné and Koltchinskii, 2006), we derive relative error bounds for the local ERM method. Although the bounds are not tight, they give novel insight into the error analysis of learning algorithms.

The rest of this paper is organized as follows. The necessary background for the local ERM method is reviewed in Section 2. The error analysis for known local regions is established in Section 3. After that, the main theoretical results of the paper are presented in Section 4, where semi-supervised local ERM methods are proposed and their error estimates are established. Finally, a brief conclusion is given in Section 5.

Problem setup and preliminaries

Let the input space $X \subset \mathbb{R}^d$ be a compact domain or a manifold in Euclidean space and let $Y = [-M, M]$. In the semi-supervised model, the learner gets a labeled data set $Z_l = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ and an unlabeled data set $X_u = \{x_{n+1}, \ldots, x_{n+m}\}$. Here, the labeled examples $(x_i, y_i) \in Z = X \times Y$, $1 \le i \le n$, are independent copies of a random element $(x, y)$ having distribution $\rho$ on $Z$. The unlabeled data $x_{n+j}$, $1 \le j \le m$, are independent copies of $x$, whose distribution (the marginal distribution of $\rho$) we denote by $\rho_X$. The goal of learning

Error bounds for known high density regions

When the family $T_1, \ldots, T_J$ is known, we observe only the labeled data $Z_l$. For any input $x \in \Gamma$, we predict the output value by the ERM method on $\Gamma$. The predictor defined on $T_j$ is $\hat{f}_n^j = \arg\min_{f \in \mathcal{H}_j} \hat{\mathcal{E}}_{T_j}(f)$, where $\mathcal{H}_j$ is the assumption space on $T_j$. Then, the predictor defined on $\Gamma$ is $\hat{f}_n(x) = \sum_{j \ge 1} \hat{f}_n^j(x)\, I_{\{x \in T_j\}}$.
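
As a rough illustration of this piecewise construction, the sketch below fits one least-squares polynomial per known region $T_j$ (a hypothetical choice of the assumption space $\mathcal{H}_j$, with the regions assumed to be intervals for simplicity) and combines the local fits through the indicator rule above.

    import numpy as np

    def fit_local_erm(x, y, regions, degree=1):
        # One empirical risk minimizer per region T_j: a least-squares
        # polynomial fit on the labeled points that fall in T_j.
        coeffs = []
        for (a, b) in regions:                      # each T_j is an interval [a, b]
            mask = (x >= a) & (x <= b)
            coeffs.append(np.polyfit(x[mask], y[mask], degree)
                          if mask.sum() > degree else None)

        def f_hat(t):
            # Piecewise predictor: f_hat(t) = sum_j f_hat_j(t) * 1{t in T_j}.
            for (a, b), c in zip(regions, coeffs):
                if a <= t <= b and c is not None:
                    return np.polyval(c, t)
            return 0.0                              # outside every T_j
        return f_hat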

Now, we introduce the definition of covering numbers.

Definition 2

For a subset $F$ of a metric space and $\eta > 0$, the covering number $\mathcal{N}(F, \eta)$ is defined to be the minimal integer $\ell \in \mathbb{N}$ such that there exist $\ell$ disks with radius $\eta$ covering $F$.

Error bounds based on high density region estimation

We consider a more realistic case where the high density regions $T_1, \ldots, T_J$ are unknown and we have to estimate them using unlabeled data (Rigollet, 2007). In fact, when two high density regions are too close to each other, we may wish to identify them as a single region. This is consistent with the fact that the finite number of unlabeled observations allows us to have only a blurred vision of the high density regions. To provide a motivation for using labeled and unlabeled data to
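
In the spirit of a plug-in approach to region estimation, one simple way to estimate high density regions from the unlabeled sample is to threshold a kernel density estimate and merge nearby components. The following one-dimensional sketch (with hypothetical grid, bandwidth, and threshold parameters, not taken from the paper) illustrates this step.

    import numpy as np

    def estimate_high_density_regions(x_unlab, bandwidth=0.2, level=0.4, grid_size=400):
        # Plug-in style estimate of {x : density(x) >= level} on a 1-d grid.
        grid = np.linspace(x_unlab.min(), x_unlab.max(), grid_size)
        dens = np.exp(-0.5 * ((grid[:, None] - x_unlab[None, :]) / bandwidth) ** 2)
        dens = dens.mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))

        above = dens >= level
        regions, start = [], None
        for g, flag in zip(grid, above):
            if flag and start is None:
                start = g                            # a region opens
            elif not flag and start is not None:
                regions.append((start, g))           # a region closes
                start = None
        if start is not None:
            regions.append((start, grid[-1]))
        # Regions closer than, say, 2 * bandwidth could then be merged, matching
        # the remark that nearby regions are identified as a single one.
        return regions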

Conclusion

In this study, we have investigated the generalization performance of the semi-supervised local empirical risk minimization method. We established estimates of the generalization error in the ideal setting and in the realistic setting, respectively. Our results show that this method can achieve fast learning rates on high density regions under mild assumptions.

Acknowledgements

The authors would like to thank Prof. Dr. D.-X. Zhou for his valuable suggestions. The authors are indebted to the handling associate editor and the anonymous reviewers for their detailed and careful comments and constructive suggestions. The research is supported partially by NSFC under Grant No. 10771053, by the Fundamental Research Funds for the Central Universities (Q52204-09099), by Huazhong Agricultural University Interdisciplinary Fund (2008xkjc008) and Torch Plan Fund (2009XH003).

References (31)

  • H. Chen et al. Semi-supervised multi-category classification with imperfect model. IEEE Transactions on Neural Networks (2009)
  • H. Chen et al. Analysis of classification with a reject option. International Journal of Wavelets, Multiresolution and Information Processing (2009)
  • F. Cucker et al. On the mathematical foundations of learning. Bulletin of the American Mathematical Society (2002)
  • F. Cucker et al. Learning theory: an approximation theory viewpoint (2007)
  • A. Cuevas et al. A plug-in approach to support estimation. Annals of Statistics (1997)