Split and merge algorithm for deep learning and its application for additional classes☆
Introduction
Back propagation (BP) is a typical learning method for deep neural networks (DNNs) and is used in conjunction with an optimization method such as gradient descent [17]. However, it suffers from problems such as local minima, slow convergence as the number of hidden layers increases, and overfitting [9]. To overcome these problems, Hinton proposed deep belief networks (DBNs) with a pre-training method that uses restricted Boltzmann machines (RBMs) [7].
DBNs consist of a number of RBM layers trained in a greedy layer-wise manner [15]. An RBM is composed of one visible layer and one hidden layer, with the restriction that neurons within the same layer are not connected to each other, unlike in Boltzmann machines (BMs) [4], [18]. Because of this constraint, RBMs train faster than BMs.
RBMs are trained with an unsupervised learning method to pre-train DBNs. A joint probability between the visible layer and the hidden layer is defined using an energy function [11]. This probability distribution is used to obtain the weights and biases between the visible and hidden layers via steepest descent. Computing the expected model values exactly requires alternating Gibbs sampling starting from a random state of the visible units, which takes a long time. Hinton instead initialized the visible units with a training vector and performed alternating Gibbs sampling only once, yielding a much faster learning procedure. This one-step contrastive divergence algorithm (CD-1) [8] produced satisfactory training results and is used for RBM training.
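The CD-1 update described above can be sketched as follows. This is a minimal illustration for a binary RBM with sigmoid units; the function and variable names (`cd1_update`, `W`, `b_vis`, `b_hid`) are our own and not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM: Gibbs sampling starts at the
    training vector v0 instead of a random visible state."""
    # Positive phase: hidden probabilities given the data
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One reconstruction step (the "1" in CD-1)
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid
```

In a full pre-training loop this update would be applied per mini-batch, layer by layer, following the greedy layer-wise scheme.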
DBNs addressed the problems of conventional DNNs by applying RBM pre-training for network initialization, which increased recognition performance. Hence, several recent approaches to deep learning start with DBNs [12], [20].
Previous learning algorithms often required substantial effort for hand-designing features. RBMs, however, can automatically learn feature representations with an unsupervised learning method. After RBM pre-training for network initialization, dominant features are selected during the learning procedure. This process is important to network performance for several reasons [3]:
- The features used for each layer's input determine the search space explored in the learning phase. Irrelevant and redundant features enlarge the search space, increase the learning time, and reduce recognition performance.
- If the selected features do not contain all the information needed to distinguish the classes, recognition performance can be unsatisfactory.
- Improperly selected irrelevant and redundant features may render the learning process ineffective.
Therefore, to select more dominant features than the RBMs alone and to improve the performance of DBNs, we propose a split and merge algorithm for deep learning based on a genetic algorithm (GA). A study combining neural networks with a GA to improve classification accuracy has been proposed before [1]. However, that method differs from ours: our algorithm uses the GA as a feature extractor to optimize the initial weights of the neural network, whereas that method uses the GA as a post-processor for image segmentation after processing the image with neural networks.
The main idea of the proposed algorithm is that a more dominant feature extractor can be selected in the merge phase using the GA after the split networks are trained. The algorithm comprises two procedures: split and merge. In the split phase, the entire training data set is divided into two training data sets, each of which is trained by a DBN. In the merge phase, the two trained networks are merged using the GA.
In the merge phase, the proposed algorithm reuses the existing trained network information, i.e., the weights and biases. Thus, when training data or classes are added to the network, the original DBN algorithm must pre-train the entire data set composed of the original and additional data, whereas the proposed algorithm creates a new network by pre-training only the additional data and reusing the existing network information. After pre-training the additional data, the result is merged with the existing network using the GA. Consequently, the proposed algorithm suggests a new approach for retraining.
Recently, there have been several studies on deep neural networks (DNNs). However, most proposed methods are dedicated to enhancing the recognition rate for a single complete training set [16]. The goal of the current study is to evaluate a novel split training and merge algorithm designed to improve DBN recognition performance and to suggest a new approach for retraining with additional data or classes.
Section snippets
GA process for the proposed algorithm
The proposed algorithm is based on the GA, a typical method that uses the principles of evolution to solve various problems. The GA generates solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [5], [10]. These techniques are used to select dominant features in the proposed algorithm as follows:
- Weight matrices and bias vectors between each layer of the network are used as chromosomes,
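Treating layer parameters as chromosomes can be illustrated with standard GA operators. The sketch below is our own assumption of how crossover and mutation might act on two trained networks (each a list of `(W, b)` pairs); the paper's exact operators are described only in the full text:

```python
import numpy as np

def crossover(net_a, net_b, rng):
    """Uniform crossover over layer parameters: each gene (weight or
    bias entry) is inherited from one of the two parent networks."""
    child = []
    for (Wa, ba), (Wb, bb) in zip(net_a, net_b):
        mask_W = rng.random(Wa.shape) < 0.5
        mask_b = rng.random(ba.shape) < 0.5
        child.append((np.where(mask_W, Wa, Wb), np.where(mask_b, ba, bb)))
    return child

def mutate(net, rate=0.01, scale=0.1, rng=np.random.default_rng(0)):
    """Gaussian mutation applied to a small random fraction of genes."""
    out = []
    for W, b in net:
        mW = (rng.random(W.shape) < rate) * rng.normal(0.0, scale, W.shape)
        mb = (rng.random(b.shape) < rate) * rng.normal(0.0, scale, b.shape)
        out.append((W + mW, b + mb))
    return out
```

A merge phase would then evaluate candidate children on the training set (the fitness function) and keep the best-performing combination of feature extractors.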
Proposed algorithm
The proposed algorithm comprises two procedures. First, two networks are initialized with their respective DBNs. Second, these two networks are merged using the GA. The DBN input data are divided into two datasets: the training set (Straining), used for training the networks, and the test set (Stest), which evaluates the performance of the networks in the test phase. It is important not to use Stest to train the networks. In the proposed algorithm, Straining is divided into two subsets (S
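The data partitioning described above can be sketched as follows. This is a minimal illustration under our own assumptions (equal-sized halves, random shuffling); the function name `split_training_set` is hypothetical:

```python
import numpy as np

def split_training_set(X, y, rng=np.random.default_rng(0)):
    """Shuffle Straining and divide it into two subsets for the split
    phase; Stest is held out separately and never used for training."""
    idx = rng.permutation(len(X))
    half = len(X) // 2
    first = (X[idx[:half]], y[idx[:half]])
    second = (X[idx[half:]], y[idx[half:]])
    return first, second
```

Each subset is then used to pre-train its own DBN before the merge phase combines the two networks.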
Experiments and results
In this section, we describe two different experimental procedures and present three experimental results for each network structure. First, we compared the recognition accuracy of the proposed algorithm with that of the original DBNs (experiment 1); then, we studied a new approach for retraining with the proposed algorithm when training data for new classes were added to the network (experiments 2 and 3). Experiments 2 and 3 were distinguished by the type of their additional class. The
Conclusion
This study proposes a novel split and merge algorithm for deep learning. Networks trained with different training sets have different feature extractors, as shown in Fig. 9, and the proposed algorithm finds a more suitable combination of feature extractors for the entire training set while merging these different networks. To evaluate the proposed algorithm, we performed two types of experiments, and we found that the proposed algorithm had a lower error rate compared with the DBNs and could
Acknowledgment
This work was supported by the MSIP of Korea, under the C-ITRC (IITP-2015-H8601-15-1003) supervised by the IITP and Basic Science Research Program through the NRF of Korea funded by the Ministry of Education (2010–0020163).
References (20)
- et al., A GA-based feature selection approach with an application to handwritten character recognition, Pattern Recogn. Lett. (2014)
- et al., Learning ensemble classifiers via restricted Boltzmann machines, Pattern Recogn. Lett. (2014)
- et al., Multicomponent image segmentation using a genetic algorithm and artificial neural network, IEEE Geosci. Remote Sens. Lett. (2007)
- et al., Genetic algorithms for evolving deep neural networks, Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (2014)
- et al., Unsupervised learning of distributions on binary vectors using two layer networks, Proceedings of the Advances in Neural Information Processing Systems (1992)
- et al., Genetic algorithms and machine learning, Mach. Learn. (1988)
- A practical guide to training restricted Boltzmann machines, Momentum (2010)
- et al., A fast learning algorithm for deep belief nets, Neural Comput. (2006)
- Training products of experts by minimizing contrastive divergence, Neural Comput. (2002)
- et al., Reducing the dimensionality of data with neural networks, Science (2006)
☆ This paper has been recommended for acceptance by Dr. Y. Liu.