Pattern Recognition Letters

Volume 128, 1 December 2019, Pages 16-22

Semi-supervised learning with connectivity-driven convolutional neural networks

https://doi.org/10.1016/j.patrec.2019.08.012

Highlights

  • A new application for the semi-supervised Optimum-Path Forest classifier.

  • New insights into how semi-supervised learning can improve deep networks.

  • Promising and accurate results compared with state-of-the-art semi-supervised methods.

  • Additional contributions to the semi-supervised learning literature.

  • An extensive experimental evaluation is conducted.

Abstract

The annotation of large datasets is a challenge that grows as the number of labeled samples available to train the classifier shrinks relative to the amount of unlabeled data. In this context, semi-supervised learning methods aim at discovering and propagating labels to unlabeled samples, such that their correct labeling can improve the classification performance. In this work, we propose a semi-supervised methodology that explores the optimum connectivity among unlabeled samples through the Optimum-Path Forest (OPF) classifier to improve the learning process of Convolutional Neural Networks (CNNs). Our proposal uses the OPF to classify an unlabeled training set, which is then used to pre-train a CNN; the CNN is subsequently fine-tuned using the limited labeled data only. The proposed approach is experimentally validated on traditional datasets and provides competitive results in comparison to state-of-the-art semi-supervised learning methods.

Introduction

Predicting and clustering samples are crucial tasks in several application domains. Such processes can be performed in different ways and adopted effectively when the dataset is entirely labeled. Unfortunately, as the number of samples increases, the classifier may lack stability due to the limited amount of labeled samples, since most of them are labeled manually, which is a time-consuming and error-prone task. Therefore, a question arises naturally: can one improve the performance of a classifier using both labeled and unlabeled samples? That is the primary concern that drives techniques based on the semi-supervised learning framework.

Such approaches make use of a labeled set that is extended with unlabeled samples, usually available in much larger quantities. They either perform classification using supervised methods for label propagation [4], [8], [25], [36] or take into account the spatial distribution of the entire training set, i.e., both labeled and unlabeled samples, for learning purposes [1], [19]. These methods include self-training, generative probabilistic models, co-training, graph-based models, and semi-supervised Support Vector Machines, among others.

Deep learning, which has attracted considerable attention in the past years [6], corresponds to a relatively broad class of machine learning techniques that employ complex neural architectures to perform classification. Such approaches encode non-linear information through several layers that are hierarchical in nature, thus addressing problems at different levels of abstraction. However, as argued by several researchers [7], [16], labeled or unlabeled data working independently may not be sufficient to provide good performance. Therefore, the use of unlabeled data can be beneficial to improve classification performance.

Several works have recently proposed methods that use unlabeled samples to improve learning in deep neural networks. In a nutshell, the network is pre-trained using unlabeled samples for later adjustment using labeled data [16], [33]. Lee [23] presented a semi-supervised learning method for deep neural networks that learns from both unlabeled and labeled samples simultaneously, where the former are pre-labeled using softmax outputs. Another approach for pseudo-labeling was discussed by Wu and Prasad [42], where semi-supervised learning was used for the classification of hyperspectral images, with pseudo-labeled samples employed to train a deep recurrent convolutional network. Their experimental results demonstrated that the approach outperforms recent supervised and semi-supervised learning methods for hyperspectral image classification.
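As an illustration of the softmax-based pseudo-labeling strategy underlying Lee's method, consider the following minimal sketch (ours, not Lee's code; the confidence threshold is our addition, whereas Lee's original formulation instead anneals the weight of the pseudo-label loss term during training):

```python
import torch

@torch.no_grad()
def softmax_pseudo_labels(model, unlabeled_batch, threshold=0.95):
    """Assign each unlabeled sample the arg-max class of the network's
    softmax output, keeping only confident predictions."""
    probs = torch.softmax(model(unlabeled_batch), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold
    return unlabeled_batch[keep], labels[keep]
```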

Wu et al. [41] proposed a weakly semi-supervised deep learning approach for the annotation of multi-label images using CNNs, where the idea was to use images that were weakly labeled or even unlabeled to train a deep neural network. A weighted pairwise ranking loss was employed to cope with the weakly labeled images, while a triplet similarity loss was applied to harness unlabeled images. Gao et al. [14] also presented a semi-supervised algorithm using Convolutional Neural Networks (CNNs), but in the context of active learning, which is used to find the most representative unlabeled samples, together with a new regularization term in the loss function. Weston et al. [40] concentrated on the idea of combining an embedding-based regularizer with a supervised learner to perform semi-supervised learning, as done by techniques such as LapSVM [5].

However, we have observed that only recently have works attempted to combine the semi-supervised framework with deep learning techniques. The approach proposed in this paper tries to fill this gap by considering semi-supervised learning techniques to enhance the performance of Convolutional Neural Networks. Specifically, we show that our recent approach based on the Optimum-Path Forest (OPF) classifier [27], [30], [31] can outperform several other works in the literature, since it considers the optimum connectivity between labeled and unlabeled samples.

In a nutshell, the proposed approach works as follows: first, all available training samples (labeled and unlabeled) are pseudo-labeled using the OPF, so that the entire training set can then be used to pre-train a Convolutional Neural Network. The semi-supervised learning approach connects unlabeled and labeled samples as nodes of a graph and partitions it into an optimum-path forest rooted at the labeled nodes. The adjacency relation is defined by the set of arcs of a Minimum-Spanning Tree (MST) of the complete graph whose nodes are the labeled and unlabeled samples. We then simplify the choice of the forest roots to be all labeled samples, and the classifier is created from a single execution of the OPF algorithm on the topology of the MST. Labeled nodes thus compete with one another, and the pseudo-label assigned to each unlabeled sample comes from its most closely connected labeled node (see the sketch below). Finally, the network is fine-tuned using only the training samples whose true labels are known, i.e., those labeled from the beginning. Notice that we fine-tune only the fully-connected layers, since we assume the labeled samples are limited in quantity; the pre-trained layers and the weights learned from the entire training set are kept unchanged.
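For concreteness, here is a minimal sketch of the pseudo-labeling step just described, assuming Euclidean distances as arc weights and distinct feature vectors (so that all MST arcs have positive weight); the function and variable names are ours, not the authors' implementation:

```python
import heapq
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_pseudo_labels(X, y, labeled_mask):
    """Propagate labels from labeled samples to unlabeled ones along the MST
    of the complete graph, minimizing the maximum arc weight on each path."""
    n = X.shape[0]
    W = squareform(pdist(X))                  # complete graph: Euclidean arc weights
    mst = minimum_spanning_tree(W).toarray()
    mst = mst + mst.T                         # symmetrize the (directed) MST output

    cost = np.full(n, np.inf)                 # optimum path cost per node
    labels = y.copy()
    heap = []
    for r in np.where(labeled_mask)[0]:       # every labeled sample is a root
        cost[r] = 0.0
        heapq.heappush(heap, (0.0, int(r)))

    done = np.zeros(n, dtype=bool)
    while heap:                               # Dijkstra-like propagation on the MST
        c, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in np.nonzero(mst[u])[0]:
            new_cost = max(c, mst[u, v])      # path cost = maximum arc weight on the path
            if new_cost < cost[v]:
                cost[v] = new_cost
                labels[v] = labels[u]         # inherit the conquering root's label
                heapq.heappush(heap, (new_cost, int(v)))
    return labels
```

The pseudo-labels returned by mst_pseudo_labels would then be used to pre-train the CNN on the entire training set, before fine-tuning on the labeled samples only.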

Therefore, the main contribution of this paper is a semi-supervised learning approach that can improve the effectiveness of Convolutional Neural Networks and considers the two main paradigms of semi-supervised learning jointly, i.e., the proposed method takes into account the spatial distribution of both labeled and unlabeled samples and also propagates pseudo-labels to the unlabeled samples. We show that the proposed approach can outperform some state-of-the-art semi-supervised learning algorithms.
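The final fine-tuning stage can be sketched as follows in PyTorch (our illustration under assumed names: SmallCNN, its features/classifier split, and the checkpoint file are all hypothetical; the paper's actual architecture differs):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative architecture with the usual features/classifier split."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(        # convolutional layers, pre-trained on pseudo-labels
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(      # fully-connected head, fine-tuned on true labels
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
# model.load_state_dict(torch.load("cnn1_pretrained.pt"))  # hypothetical CNN(1) checkpoint

# Freeze the convolutional layers: keep the representation learned from the
# pseudo-labeled training set, and update only the fully-connected layers.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```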

The remainder of this paper is organized as follows. Section 2 discusses the Optimum-Path Forest methodology for semi-supervised learning, and Section 3 presents the proposed approach to improve the performance of Convolutional Neural Networks. The experimental results are presented in Sections 4 and 5, and Section 6 states conclusions and future work.

Section snippets

Optimum-path forest

For a given training set with labeled and unlabeled sample subsets, one can devise unsupervised classifiers from the latter [35], supervised classifiers from the former [27], [30], [31], and semi-supervised classifiers from both [1], [2], [3]. In all approaches, one or multiple executions of the Optimum-Path Forest algorithm [12] can be performed for different choices of weighted graphs G=(N,A,w) (Fig. 1a) and connectivity functions f, where N and A stand for the sets of nodes and arcs, respectively, and w assigns a weight to each arc.
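Although this snippet is cut short, the connectivity function consistent with the "most closely connected" propagation described in Section 1 is the standard f_max from the OPF literature, reproduced here for reference (R denotes the set of roots, i.e., the labeled samples):

```latex
f_{\max}(\langle t \rangle) =
\begin{cases}
  0       & \text{if } t \in \mathcal{R},\\
  +\infty & \text{otherwise,}
\end{cases}
\qquad
f_{\max}(\pi_s \cdot \langle s,t \rangle) = \max\{f_{\max}(\pi_s),\, w(s,t)\},
```

where π_s is a path ending at s and π_s · ⟨s,t⟩ denotes its extension by the arc (s,t); each unlabeled node receives the label of the root of the tree that offers it the optimum (minimum f_max) path.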

The proposed method

Deep neural networks have a considerable number of parameters to be learned, which requires a large number of labeled samples. Application examples range from speech processing, automatic vowel classification, biometrics, well-drilling monitoring, and medical image segmentation to object tracking [10], [15], [18], [26], [28], [29], [32]. Most such applications face a limited amount of labeled samples, reflecting how expensive and time-consuming the labeling task is.

Experiments

In this section, we present the datasets and the methodology, and discuss the experimental results.

Results

First, the classification results on the three datasets using only labeled samples (i.e., the labeled subset Z1^l) for CNN training (named SUPMNIST, SUPCIFAR10, and SUPCOR) are presented. Notice that the very same architecture was used in all experiments, i.e., both supervised and semi-supervised learning. The primary goal is to compare the performance of the proposed work by applying the semi-supervised methodology for pseudo-label propagation. Therefore, we expect to obtain a significant improvement compared to

Conclusion

In this work, we showed how one can improve supervised learning for deep architectures using a semi-supervised methodology. Our method makes use of the unlabeled data with pseudo-labels propagated by the OPFSEMImst semi-supervised method. Thus, we trained a CNN(1) network with all labeled and unlabeled samples (the latter with pseudo-labels) and then fine-tuned a new CNN(2) network using only labeled samples, initialized with the weights learned from CNN(1).

Experimental results showed that the

Declaration of Competing Interest

None.

Acknowledgments

The authors are grateful to Corumbá Concessões S.A. (ANEEL PD-2262-1602/2016), CNPq grants 427968/2018-6 and 307066/2017-7, as well as FAPESP grants 2013/07375-0, 2014/12236-1, 2015/25739-4, and 2016/19403-6. This material is based upon work supported in part by funds provided by the Intel AI Academy program under Fundunesp Grant No. 2597.2017.

References (43)

  • Y. Bengio et al., Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. (2013)

  • Y. Bengio et al., Greedy layer-wise training of deep networks, Proceedings of the 19th International Conference on Neural Information Processing Systems (2006)

  • A. Blum et al., Combining labeled and unlabeled data with co-training, Proceedings of the Eleventh Annual Conference on Computational Learning Theory (1998)

  • C.C. Chang et al., LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. (2011)

  • G. Chiachia et al., Infrared face recognition by optimum-path forest, Proceedings of the 16th International Conference on Systems, Signals and Image Processing (IWSSIP) (2009)

  • K. Driessens et al., Using weighted nearest neighbor to benefit from unlabeled data

  • A.X. Falcão et al., The image foresting transform: theory, algorithms, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2004)

  • M. Friedman, A comparison of alternative tests of significance for the problem of m rankings, Ann. Math. Stat. (1940)

  • G.E. Hinton et al., A fast learning algorithm for deep belief nets, Neural Comput. (2006)

  • G. Huang et al., Semi-supervised and unsupervised extreme learning machines, IEEE Trans. Cybern. (2014)

  • T. Joachims, Transductive inference for text classification using support vector machines, Proceedings of the Sixteenth International Conference on Machine Learning (1999)