
Pattern Recognition Letters

Volume 65, 1 November 2015, Pages 137-144

Split and merge algorithm for deep learning and its application for additional classes

https://doi.org/10.1016/j.patrec.2015.07.024

Highlights

  • We present a novel GA-based feature extractor for optimal network initialization.

  • Our method selects the more dominant feature extractors in the merge phase using the GA.

  • Results show improved recognition performance in comparison with DBNs.

  • We also suggest a new approach for retraining additional classes as an application.

  • Our retraining approach can add output classes at a lower error rate than DBNs.

Abstract

In this paper, we propose a novel split training and merge algorithm for deep learning. The proposed algorithm improves recognition accuracy and suggests a new approach for retraining. The algorithm is motivated by the genetic algorithm (GA) and is composed of two procedures. The first procedure initializes two individual networks using deep belief networks (DBNs), and the second merges the two networks using the GA. The biases and weights of a network trained with DBNs are represented as a matrix between each pair of layers, and each row of this matrix is used as a chromosome in the merge procedure. To evaluate the performance, we conduct two sets of experiments. The first set measures the recognition accuracy of the proposed algorithm, and the second set evaluates the new retraining approach. The results show that the proposed algorithm has a lower average error rate (6.84 ± 4.57%) than the DBNs, and that it can add classes at a lower average error rate (9.06 ± 6.17% and 10.17 ± 4.51%) without pre-training with restricted Boltzmann machines (RBMs) on the existing classes' data.

Introduction

Back propagation (BP) is a typical learning method for deep neural networks (DNNs) and is used in conjunction with an optimization method such as gradient descent [17]. However, it suffers from problems such as local minima, slow convergence as the number of hidden layers increases, and overfitting [9]. To overcome these problems, Hinton proposed deep belief networks (DBNs) with a pre-training method that uses restricted Boltzmann machines (RBMs) [7].

DBNs consist of a number of RBM layers trained in a greedy layer-wise manner [15]. An RBM is composed of one visible layer and one hidden layer, with the restriction that, unlike in Boltzmann machines (BMs), neurons within the same layer are not connected to each other [4], [18]. Because of this constraint, the training time of RBMs is reduced compared with BMs.

RBMs are trained with an unsupervised learning method to pre-train DBNs. The joint probability of the visible and hidden layers is defined through an energy function [11], and this probability distribution is used to obtain the weights and biases between the two layers via the steepest descent method. Obtaining the expected model values from an arbitrary random state of the visible units requires prolonged alternating Gibbs sampling [6]. Hinton instead set the states of the visible units to a training vector and performed alternating Gibbs sampling only once, yielding a much faster learning procedure. This one-step contrastive divergence algorithm (CD-1) [8] produced satisfactory training results and is used for RBM training.
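
For reference, the energy function, joint distribution, and CD-1 update mentioned above take the following standard form (supplied here for clarity, with notation as in Hinton [8]; these equations are not reproduced from this paper):

    E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j,
    \qquad
    p(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},
    \qquad
    \Delta w_{ij} = \varepsilon\left(\langle v_i h_j\rangle_{\mathrm{data}} - \langle v_i h_j\rangle_{\mathrm{recon}}\right),

where a_i and b_j are the visible and hidden biases, Z is the partition function, and ε is the learning rate.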

DBNs solved the problems of conventional DNNs by applying RBM pre-training to network initialization, which increased recognition performance. Hence, several recent approaches to deep learning start from DBNs [12], [20].

Previous learning algorithms often required substantial effort for hand-designing features. RBMs, in contrast, can learn feature representations automatically through unsupervised learning. After RBM pre-training for network initialization, dominant features are selected during the learning procedure. This process is important to network performance for several reasons [3]:

  • The features used as each layer's input determine the search space explored in the learning phase. Irrelevant and redundant features enlarge the search space, increase the learning time, and reduce recognition performance.

  • If the selected features do not contain all the information needed to determine the patterns that distinguish the classes, recognition performance can be unsatisfactory.

  • Improperly selected features, especially irrelevant and redundant ones, may make the learning process ineffective.

Therefore, to select more dominant features than the RBMs and improve the performance of DBNs, we propose a split and merge algorithm for deep learning based on a genetic algorithm (GA). A study combining neural networks with a GA to improve classification accuracy has been proposed before [1]. However, that method differs from our proposed algorithm: our algorithm uses the GA as a feature extractor to optimize the network's initial weights, whereas the earlier method uses the GA as a post-processor for image segmentation after image processing with neural networks.

The main idea of the proposed algorithm is that an extractor of more dominant features can be selected in the merge phase using the GA after training the split networks. The proposed algorithm comprises two procedures: split and merge. In the split procedure, the entire training data set is divided into two training sets, each of which is used to train a DBN. In the merge procedure, the two trained networks are merged using the GA, as sketched below.
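
As a rough outline of the two procedures (a minimal sketch only; train_dbn and ga_merge are hypothetical helper names standing in for the DBN pre-training and GA merge steps, not the authors' code):

    import numpy as np

    def split_and_merge(X, y, train_dbn, ga_merge, seed=0):
        """Sketch of the split-train-merge procedure described above."""
        rng = np.random.default_rng(seed)
        # Split phase: divide the training set into two disjoint halves.
        idx = rng.permutation(len(X))
        half = len(X) // 2
        a, b = idx[:half], idx[half:]
        # Train one DBN per half; each result is assumed to hold the
        # network's per-layer weight matrices and bias vectors.
        net_a = train_dbn(X[a], y[a])
        net_b = train_dbn(X[b], y[b])
        # Merge phase: combine the two trained networks with the GA,
        # evaluating candidate offspring on the full training set.
        return ga_merge(net_a, net_b, X, y)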

In the merge phase, the proposed algorithm reuses the existing trained information of the networks, i.e., their weights and biases. Thus, when training data or classes are added to the network, the original DBNs algorithm must pre-train the entire data set composed of the original and additional data, whereas the proposed algorithm creates a new network by pre-training only the additional data and merging the result with the existing network using the GA. Consequently, the proposed algorithm suggests a new approach for retraining.
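
Under the same hypothetical helpers as in the sketch above, retraining with an additional class would then reduce to pre-training on the new data only and merging, roughly:

    # Retraining sketch (train_dbn / ga_merge as in the previous sketch;
    # X_new / y_new, net_existing, X_all / y_all are placeholder names):
    net_new = train_dbn(X_new, y_new)   # pre-train on the additional data only
    net_retrained = ga_merge(net_existing, net_new, X_all, y_all)
    # No RBM pre-training is repeated on the existing classes' data.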

Recently, there have been several studies on deep neural networks. However, most proposed methods are dedicated to enhancing the recognition rate for a single complete training set [16]. The goal of the current study is to evaluate a novel split training and merge algorithm designed to improve DBN recognition performance and to suggest a new approach for retraining with additional data or classes.

Section snippets

GA process for the proposed algorithm

The proposed algorithm is based on the GA, a typical method that applies the principles of evolution to solve various problems. The GA generates solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [5], [10]. These techniques are used to select dominant features in the proposed algorithm as follows:

  • Weight matrices and bias vectors between each pair of layers of the network are used as chromosomes,
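
To illustrate the chromosome encoding, below is a minimal sketch of row-wise crossover and mutation between two parent layers, assuming each weight matrix is stored as (hidden units × visible units) so that one row plus its bias entry forms one chromosome; the selection strategy shown is an assumption, since the excerpt above is truncated:

    import numpy as np

    def crossover_layer(Wa, ba, Wb, bb, keep_from_a):
        """Build a child layer by taking each hidden unit's incoming
        weights (one matrix row) and its bias from parent A or B."""
        Wc, bc = Wa.copy(), ba.copy()
        for j in range(Wa.shape[0]):
            if not keep_from_a[j]:   # take this chromosome from parent B
                Wc[j], bc[j] = Wb[j], bb[j]
        return Wc, bc

    def mutate_layer(W, rate=0.01, scale=0.1, seed=0):
        """Element-wise Gaussian mutation applied with small probability."""
        rng = np.random.default_rng(seed)
        mask = rng.random(W.shape) < rate
        return W + mask * rng.normal(0.0, scale, W.shape)

    # Example: merge two 4-unit hidden layers over 6 visible units.
    rng = np.random.default_rng(1)
    Wa, Wb = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
    ba, bb = rng.normal(size=4), rng.normal(size=4)
    Wc, bc = crossover_layer(Wa, ba, Wb, bb, keep_from_a=[True, False, True, False])
    Wc = mutate_layer(Wc)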

Proposed algorithm

The proposed algorithm comprises two procedures. First, two networks are initialized with their respective DBNs. Second, these two networks are merged using the GA. The DBN input data are divided into two data sets: the training set (Straining), used for training the networks, and the test set (Stest), which evaluates the performance of the networks in the test phase. It is important not to use Stest to train the networks. In the proposed algorithm, Straining is divided into two subsets (S

Experiments and results

In this section, we describe two different experimental procedures and present three experimental results for each network structure. First, we compare the recognition accuracy of the proposed algorithm with the original DBNs (experiment 1); then we study a new approach for retraining using the proposed algorithm when training data for new classes are added to the network (experiments 2 and 3). Experiments 2 and 3 are distinguished by the type of their additional class. The

Conclusion

This study proposes a novel split and merge algorithm for deep learning. Networks trained with different training sets have different feature extractors, as shown in Fig. 9, and the proposed algorithm finds a more suitable combination of feature extractors for the entire training set while merging these different networks. To evaluate the proposed algorithm, we performed two types of experiments and found that the proposed algorithm had a lower error rate compared with the DBNs and could

Acknowledgment

This work was supported by the MSIP of Korea, under the C-ITRC (IITP-2015-H8601-15-1003) supervised by the IITP and Basic Science Research Program through the NRF of Korea funded by the Ministry of Education (2010–0020163).

References (20)

  • C. De Stefano et al.

    A GA-based feature selection approach with an application to handwritten character recognition

    Pattern Recogn. Lett.

    (2014)
  • C.-X. Zhang et al.

    Learning ensemble classifiers via restricted Boltzmann machines

    Pattern Recogn. Lett.

    (2014)
  • M. Awad et al.

    Multicomponent image segmentation using a genetic algorithm and artificial neural network

    IEEE Geosci. Remote Sens. Lett.

    (2007)
  • O.E. David et al.

    Genetic algorithms for evolving deep neural networks

    Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion

    (2014)
  • Y. Freund et al.

    Unsupervised learning of distributions on binary vectors using two layer networks

    Proceedings of the Advances in Neural Information Processing Systems

    (1992)
  • D.E. Goldberg et al.

    Genetic algorithms and machine learning

    Mach. Learn.

    (1988)
  • G. Hinton

    A practical guide to training restricted Boltzmann machines

    Momentum

    (2010)
  • G. Hinton et al.

    A fast learning algorithm for deep belief nets

    Neural Comput.

    (2006)
  • G.E. Hinton

    Training products of experts by minimizing contrastive divergence

    Neural Comput.

    (2002)
  • G.E. Hinton et al.

    Reducing the dimensionality of data with neural networks

    Science

    (2006)


This paper has been recommended for acceptance by Dr. Y. Liu.
