
Knowledge-Based Systems

Volume 83, July 2015, Pages 58-65

Prior class dissimilarity based linear neighborhood propagation

https://doi.org/10.1016/j.knosys.2015.03.011

Abstract

The insufficiency of labeled training data for representing the distribution of the entire dataset is a major obstacle in various practical data mining applications. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, offer a possible solution to this problem. Graph-based semi-supervised learning has recently become one of the most active research areas. In this paper, a novel graph-based semi-supervised learning approach entitled Class Dissimilarity based Linear Neighborhood Propagation (CD-LNP) is proposed, which assumes that each data point can be linearly reconstructed from its neighborhood. The neighborhood graph of the input data is constructed according to a dissimilarity between data points that is specially designed to integrate the class information. Our algorithm propagates labels from the labeled points to the entire data set through these linear neighborhoods with sufficient smoothness. Experimental results demonstrate that our approach outperforms other popular graph-based semi-supervised learning methods.

Introduction

In recent years, learning from both labeled and unlabeled samples, known as semi-supervised learning (SSL), has emerged as a booming direction in machine learning research. A detailed survey of the related literature is presented in [1].

As a major family of semi-supervised learning, graph-based methods have attracted increasing research attention and have been widely applied in many areas, such as text categorization [2], image retrieval [3], and image annotation [4], [5].

Encouraging results have been reported when the samples have a clear intrinsic structure and the test data are well sampled. Nevertheless, as the following sections of this paper show, these algorithms are not so powerful when confronted with class overlap and imbalanced data distributions, which may make the choice of neighbors unreasonable and destroy label smoothness when the graph is constructed in these approaches.

In this paper, we exploit prior class information in the framework of graph-based semi-supervised learning and propose a novel method named Class Dissimilarity based Linear Neighborhood Propagation (CD-LNP). Unlike the traditional graph-based semi-supervised learning schemes mentioned above, CD-LNP utilizes the class labels of the input data to guide the learning process. As a result, the interclass dissimilarity is always larger than the intraclass dissimilarity, which is a desirable property for classification.
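The idea of a class-aware dissimilarity can be illustrated with a minimal sketch: inflate the distance between points whose known labels differ, so that interclass dissimilarity exceeds intraclass dissimilarity. The function name, the `penalty` factor, and the use of a plainly scaled Euclidean distance are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def class_dissimilarity(X, y, penalty=10.0):
    """Pairwise Euclidean distance, inflated between points whose known
    labels differ (y[i] == -1 marks an unlabeled point).

    An illustrative stand-in for a class-aware dissimilarity, not the
    paper's exact formulation.
    """
    # Pairwise Euclidean distance matrix via broadcasting.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    li, lj = y[:, None], y[None, :]
    both_labeled = (li >= 0) & (lj >= 0)
    different = both_labeled & (li != lj)
    # Distances across known class boundaries are scaled up.
    return np.where(different, penalty * D, D)
```

Pairs where at least one point is unlabeled keep their plain Euclidean distance, so the penalty only sharpens the boundary where label evidence is available.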

The rest of this paper is organized as follows. In Section 2, we briefly introduce traditional graph-based semi-supervised learning schemes and analyze their limitations. The proposed CD-LNP strategy is detailed in Section 3. In Section 4, experiments are reported. Finally, in Section 5, conclusions are drawn and several issues for future work are indicated.

Section snippets

Related works

Graph-based schemes are typical approaches to semi-supervised learning [6]; examples include FAS (Frequent Approximate Subgraph) in [7] and DLP (Dynamic Label Propagation) in [8]. In these methods, labeled and unlabeled sample points are first organized as the nodes of a graph, in which the edge directly connecting two nodes carries a weight proportional to the proximity of the two sample points. Then, labels are "propagated" along the weighted edges from labeled nodes to unlabeled ones, in order to get
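The propagation step described in this snippet — labels spreading along weighted edges from labeled to unlabeled nodes — can be written in a few lines. This is a generic iterative label-propagation sketch (in the spirit of LGC-style updates), not the specific CD-LNP algorithm; the affinity matrix `W`, the `alpha` parameter, and the clamping of labeled rows are illustrative choices.

```python
import numpy as np

def propagate_labels(W, Y, labeled_mask, alpha=0.99, iters=200):
    """Iteratively spread labels along weighted edges.

    W: (n, n) non-negative affinity matrix with nonzero row sums.
    Y: (n, c) one-hot label matrix (zero rows for unlabeled points).
    labeled_mask: boolean array marking the labeled points.
    """
    # Row-normalize so each node averages over its neighbors.
    P = W / W.sum(axis=1, keepdims=True)
    F = Y.astype(float).copy()
    for _ in range(iters):
        # Blend neighborhood average with the initial label evidence.
        F = alpha * (P @ F) + (1 - alpha) * Y
        F[labeled_mask] = Y[labeled_mask]  # clamp the labeled points
    return F.argmax(axis=1)
```

On a graph with two tight clusters and one labeled point per cluster, the unlabeled nodes inherit the label of the cluster they are strongly connected to.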

The algorithm

Graph-based semi-supervised learning starts by constructing a graph from the training data. These algorithms often resort to the KNN method when specifying the edge weights: each vertex selects its k nearest neighbor vertices in Euclidean distance, so selecting precise neighbors is of great importance. However, in real applications, there often exist data regions with overlapping classes and imbalanced distributions. These may cause an unreasonable choice of neighbors and destroy label
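The linear-neighborhood construction underlying LNP-style methods can be sketched as follows: each point is reconstructed from its k nearest neighbors by weights that sum to one, obtained from a small regularized least-squares problem on the local Gram matrix. This is an illustrative LLE/LNP-style sketch under assumed parameters (`k`, `reg`), not the authors' implementation.

```python
import numpy as np

def reconstruction_weights(X, k=2, reg=1e-3):
    """For each point, compute weights over its k nearest neighbors that
    best linearly reconstruct it, constrained to sum to one (the linear
    neighborhood assumption)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise Euclidean distances for neighbor selection.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]      # skip the point itself
        Z = X[nbrs] - X[i]                    # center neighbors on x_i
        G = Z @ Z.T + reg * np.eye(k)         # regularized local Gram matrix
        w = np.linalg.solve(G, np.ones(k))    # minimize reconstruction error
        W[i, nbrs] = w / w.sum()              # enforce sum-to-one
    return W
```

The resulting row-stochastic matrix `W` is exactly the kind of object a propagation step consumes; CD-LNP's contribution is to pick the neighbors using a class-aware dissimilarity rather than plain Euclidean distance.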

Experiments

In this section, we present a set of experiments in which CD-LNP is used for semi-supervised classification. To evaluate the performance of the proposed method, we compare CD-LNP with four other popular graph-based semi-supervised learning methods: Mincut, LGC, GRF and LNP.

Conclusion

In this paper, we presented a novel graph-based semi-supervised classification approach, called Class Dissimilarity based Linear Neighborhood Propagation (CD-LNP). It is novel in its graph structure construction and weight estimation. The approach can be cast into the second-order intrinsic Gaussian Markov random field framework, and is equivalent to solving a biharmonic equation with Dirichlet boundary conditions. Experimental results demonstrate the effectiveness of the proposed method.

We can conclude

Acknowledgement

Both authors would like to acknowledge the support of Key Laboratory of Electronic Restriction and National Natural Science Foundation (No. 61179036, No. 60872113).

References (15)

  • F. Zang et al., Label propagation through sparse neighborhood and its applications, Neurocomputing (2012)
  • X. Zhu, Semi-supervised Learning Literature Survey (Technical Report 1530), Computer Sciences, University of...
  • C. Deng et al., Manifold adaptive experimental design for text categorization, IEEE Trans. Knowl. Data Eng. (2012)
  • J. Salim et al., Hypergraph-based image retrieval for graph-based representation, Pattern Recogn. (2012)
  • Y. Yi et al., Web and personal image annotation by mining label correlation with relaxed visual graph embedding, IEEE Trans. Image Process. (2012)
  • L.Y. Wen et al., Graph-based semi-supervised learning with multi-modality propagation for large-scale image datasets, J. Vis. Commun. Image R. (2013)
  • X.J. Zhu, A.B. Goldberg, T. Khot, Some new directions in graph-based semi-supervised learning, in: IEEE International...

Cited by (17)

  • Adaptive non-negative projective semi-supervised learning for inductive classification

    2018, Neural Networks
    Citation Excerpt:

    The transductive learning methods aim to estimate the unknown labels of inside unlabeled data, but they cannot predict the unknown labels of outside unlabeled data. Several representative transductive LP learning algorithms consist of SSL using Gaussian Fields and Harmonic Functions (GFHF) (Zhu, Ghahramani, & Lafferty, 2003), Learning with Local and Global Consistency (LLGC) (Zhou, Bousquet, Lal, Weston, & Scholkopf, 2004), Linear Neighborhood Propagation (LNP) (Wang & Zhang, 2006), Special Label Propagation (SLP) (Nie, Xiang, & Liu, 2010), Projective Label Propagation (ProjLP) (Zhang, Jiang, & Li, 2015), Class Dissimilarity based LNP (CD-LNP) (Zhang, Wang, & Li, 2015), Robust Linear Neighborhood Propagation (R-LNP) (Jia, Zhang, & Jiang, 2016), and Sparse Neighborhood Propagation (SparseNP) (Zhang et al., 2015), etc. It is worth noting that several researchers have also incorporated the idea of semi-supervised label propagation learning into the Non-Negative Matrix Factorization (NMF) (Lee, 2001) and the Projective NMF (PNMF) frameworks (Yang & Oja, 2010), termed Semi-Supervised NMF (SSNMF) (Lee, Yoo, & Choi, 2010) and Semi-Supervised PNMF (Semi-PNMF) (Zhang, Guan, Jia, Qiu, & Luo, 2015).

  • Discriminative clustering on manifold for adaptive transductive classification

    2017, Neural Networks
    Citation Excerpt:

    That is, we mainly evaluate our algorithm by quantitative evaluation of image classification and visual observation of image segmentation. Note that the classification performance of our model is mainly compared with several related label propagation models, including SLP (Nie, Xiang et al., 2010), LNP (Wang & Zhang, 2008), LLGC (Zhou et al., 2004), LapLDA (Tang et al., 2006), GFHF (Zhu et al., 2003) and CD-LNP (Zhang et al., 2015). For fair comparison, all experiments are repeated 20 times and the averaged results are illustrated for each method to avoid the bias.

  • Projective label propagation by label embedding: A deep label prediction framework for representation and classification

    2017, Knowledge-Based Systems
    Citation Excerpt:

    Next, we will briefly review the several popular transductive LP criteria and out-of-sample extensions, which are related to our formulations. Several researchers have also proposed two-stage approaches based on LP, i.e., using an independent follow-up step and employing the outputted soft labels to construct the soft scatter matrices for semi-supervised discriminant analysis based image classification and retrieval, e.g., [4,8,28,29]. Thus, a dimension reduction based projection is delivered for embedding new data.

  • Robust triple-matrix-recovery-based auto-weighted label propagation for classification

    2020, IEEE Transactions on Neural Networks and Learning Systems