Neurocomputing

Volume 266, 29 November 2017, Pages 506-526

A new efficient training strategy for deep neural networks by hybridization of artificial bee colony and limited-memory BFGS optimization algorithms

https://doi.org/10.1016/j.neucom.2017.05.061

Abstract

Working with deep learning techniques requires a profound understanding of the mechanisms underlying the optimization of the internal parameters of complex structures. A major limiting factor is that only a few optimization methods, such as gradient descent and limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS), are available to find good local minima of the problem space for complex structures such as deep neural networks (DNNs). Therefore, in this paper, we present a new training approach named the hybrid artificial bee colony based training strategy (HABCbTS) to tune the parameters of a DNN structure that consists of one or more autoencoder layers cascaded to a softmax classification layer. In this strategy, the derivative-free optimization algorithm ABC is combined with the derivative-based algorithm L-BFGS to construct the HABC, which is used in the HABCbTS. Detailed simulation results supported by statistical analysis show that the proposed training strategy yields better classification performance than the DNN classifier trained with the L-BFGS, the ABC and the modified ABC. The obtained classification results are also compared with those of state-of-the-art classifiers, including MLP, SVM, KNN, DT and NB, on 15 data sets with different dimensions and sizes.

Introduction

Deep neural networks (DNNs) were first proposed in the early 1980s [1] and again in 2002 [2]. However, due to the difficulties of training deep architectures, DNNs did not receive the attention they deserve until 2006 and 2007 [3], [4], [5]. Nowadays, DNNs are employed in a wide variety of application areas, such as speech processing [6], pattern recognition [7], classification [8], medical applications [9], [10], [11], etc. [12]. The main reason for this popularity is the attractive properties of DNNs, such as good classification performance on complex data sets, applicability to large-scale data (including big data), well-established design criteria for complex systems, and so on [13], [14]. Most importantly, DNNs are capable of producing the most suitable features for classifying a data set. In practice, however, training a DNN structure has proven to be a challenging task [15].

In the literature, there are many techniques for optimizing a function [16], [17] or training different types of network-based models [18]; however, only a few of these techniques are suitable for finding the local minima of a DNN [19]. Derivative-based optimization algorithms, including gradient descent (GD), stochastic gradient descent (SGD), limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) and conjugate gradient (CG), are generally used to find the optimum parameter values of DNNs and their variants [19]. Although the GD is usually preferred for its simplicity and speed, it produces good results only if the initialization of the parameters is close to an optimum solution [3]. The SGD has better optimization performance than the GD because it needs fewer training samples to optimize the DNN; however, it has several tuning parameters, such as the learning rate and convergence criteria. The L-BFGS and CG, in turn, utilize approximate second-order derivative information to deal with nonlinearity and ill-conditioning. Therefore, the L-BFGS and CG perform better than the SGD and GD, with the L-BFGS being slightly better than the CG for training the DNN [19].
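
To make the role of these derivative-based optimizers concrete, the minimal sketch below (an illustration with assumed toy data, network sizes and SciPy's general-purpose L-BFGS-B routine, not code from the paper) flattens the weights of a one-hidden-layer network into a single vector and minimizes its cross-entropy loss with L-BFGS; the same pattern extends to the much larger parameter vector of a DNN.

```python
# Minimal sketch: applying L-BFGS to a flattened parameter vector of a toy
# one-hidden-layer network. All sizes, data and the loss are assumptions
# chosen for illustration only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                   # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)       # toy binary targets

n_in, n_hid = 8, 5
n_params = n_in * n_hid + n_hid + n_hid + 1     # W1, b1, w2, b2

def unpack(theta):
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    w2 = theta[i:i + n_hid]; i += n_hid
    b2 = theta[i]
    return W1, b1, w2, b2

def loss(theta):
    W1, b1, w2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)                     # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))     # output probability
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

theta0 = rng.normal(scale=0.1, size=n_params)    # random initial weights
res = minimize(loss, theta0, method="L-BFGS-B",
               options={"maxiter": 200})         # gradient approximated numerically
print("final cross-entropy:", res.fun)
```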

The above-mentioned optimization algorithms all have drawbacks to some degree, such as getting trapped at local minima, slow convergence, complex mathematical backgrounds, and instability. Furthermore, the performance of these algorithms relies on one or more tuning parameters. These parameters are chosen heuristically and supplied externally by the user, since there is no analytical method to determine their optimal values.

In addition to the derivative-based optimization algorithms discussed above, many derivative-free optimization algorithms, such as the genetic algorithm and the artificial bee colony (ABC) algorithm, have been presented to solve optimization problems such as training neural networks (NNs) [20], [21], [22], [23]. Although derivative-free optimization algorithms are highly competitive with, and sometimes superior to, derivative-based algorithms for optimizing complex problems and models, especially NNs, their performance is usually not satisfactory for finding the optimum weight values of DNN structures because of the high dimensionality involved. Compared to other derivative-free optimization algorithms, the ABC is an efficient choice for training DNNs because it has fewer tuning parameters, converges faster, and performs better than many other derivative-free optimization algorithms [24].
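
For reference, the sketch below outlines the standard ABC algorithm with its employed, onlooker and scout bee phases; it is an assumed textbook formulation for a generic cost function, not the authors' implementation, and its parameter values are illustrative only.

```python
# Minimal sketch of the basic ABC algorithm (assumed standard formulation)
# minimizing an arbitrary cost function f over a D-dimensional vector.
import numpy as np

def abc_minimize(f, dim, n_food=20, limit=50, max_cycles=200,
                 lower=-1.0, upper=1.0, seed=0):
    rng = np.random.default_rng(seed)
    foods = rng.uniform(lower, upper, size=(n_food, dim))   # food sources
    costs = np.array([f(x) for x in foods])
    trials = np.zeros(n_food, dtype=int)

    def try_neighbor(i):
        # Perturb one dimension of source i toward a random partner source k.
        k = rng.integers(n_food - 1)
        k = k if k < i else k + 1
        j = rng.integers(dim)
        phi = rng.uniform(-1.0, 1.0)
        cand = foods[i].copy()
        cand[j] = np.clip(cand[j] + phi * (cand[j] - foods[k, j]), lower, upper)
        c = f(cand)
        if c < costs[i]:                       # greedy selection
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_food):                # employed bee phase
            try_neighbor(i)
        fitness = 1.0 / (1.0 + costs)          # onlooker bee phase
        probs = fitness / fitness.sum()
        for i in rng.choice(n_food, size=n_food, p=probs):
            try_neighbor(i)
        worn = np.argmax(trials)               # scout bee phase
        if trials[worn] > limit:
            foods[worn] = rng.uniform(lower, upper, size=dim)
            costs[worn] = f(foods[worn])
            trials[worn] = 0

    best = np.argmin(costs)
    return foods[best], costs[best]

# Toy usage: minimize the sphere function in 10 dimensions.
best_x, best_cost = abc_minimize(lambda x: float(np.sum(x ** 2)), dim=10)
print(best_cost)
```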

In this paper, in order to overcome the above-mentioned problems, the combination of the derivative-based optimization algorithm L-BFGS and the derivative-free optimization algorithm ABC is proposed to optimize DNN structures consisting of one or more autoencoders with a softmax classification layer. The proposed hybrid optimization strategy for training the DNN is named the Hybrid Artificial Bee Colony based Training Strategy (HABCbTS). The HABCbTS combines the ability of the L-BFGS to explore local minima of the search space with the capability of the ABC to discover new candidate solutions far away from the current point at which the algorithm is trapped. The HABCbTS is tested on benchmark data sets. Simulation results supported by statistical analysis show that the DNN trained with the HABCbTS outperforms both the DNN trained with the ABC and the DNN trained with the L-BFGS. To the best of our knowledge, this is the first study to optimize DNN-based structures with a hybrid optimization method.

The rest of the paper is organized as follows: In Section 2, we briefly review deep learning techniques and formulate the autoencoder, the stacked autoencoder and the softmax classification unit. In Section 3, we describe the proposed strategy for training the DNN. In Section 4, experimental results are presented and some aspects of the proposed strategy are discussed, including its time complexity and a statistical analysis. Section 5 concludes the paper.

Section snippets

Deep learning

Deep learning techniques have the ability to extract high-level features from the input of a data set [12]. Features obtained from a deep neural network can be utilized to boost the performance of classification algorithms. One of the most commonly used deep learning tools is the deep neural network classifier, which is constructed by combining a stacked autoencoder network with a softmax classifier layer [25], [26].
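
As a minimal illustration of this construction (assumed toy dimensions and sigmoid activations, not the paper's exact formulation), the sketch below computes the forward pass and cost of a single autoencoder layer and of a softmax classification layer operating on the autoencoder's hidden features; stacking repeats the autoencoder step, feeding each layer's hidden representation to the next.

```python
# Minimal numpy sketch of one autoencoder layer plus a softmax classifier
# layer of the kind that can be stacked to form a DNN classifier. All sizes
# and activations are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(X, W1, b1, W2, b2):
    """Encode inputs to hidden features and reconstruct them."""
    h = sigmoid(X @ W1 + b1)            # hidden (feature) representation
    X_hat = sigmoid(h @ W2 + b2)        # reconstruction of the input
    recon_cost = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    return h, recon_cost

def softmax_forward(H, W, b, y_onehot):
    """Classify hidden features with a softmax layer."""
    scores = H @ W + b
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    P = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    ce_cost = -np.mean(np.sum(y_onehot * np.log(P + 1e-12), axis=1))
    return P, ce_cost

# Toy usage: 4-dimensional inputs, 3 hidden units, 2 classes.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 4))
y = rng.integers(2, size=50)
Y = np.eye(2)[y]

W1, b1 = rng.normal(scale=0.1, size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.1, size=(3, 4)), np.zeros(4)
Ws, bs = rng.normal(scale=0.1, size=(3, 2)), np.zeros(2)

H, recon = autoencoder_forward(X, W1, b1, W2, b2)
P, ce = softmax_forward(H, Ws, bs, Y)
print(recon, ce)
```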

Proposed hybrid artificial bee colony based training strategy

In this section, we propose a hybrid artificial bee colony based training strategy (HABCbTS) for training the DNN. The HABCbTS combines the capability of the L-BFGS to search the high-dimensional space with the ability of the ABC to discover new candidate solutions. The remainder of this section presents a detailed explanation of the algorithms, including the L-BFGS and the HABC, used in the HABCbTS.
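
To convey the general idea, the sketch below alternates an ABC exploration phase with an L-BFGS refinement phase over the same cost function, reusing the abc_minimize function from the ABC sketch given earlier; it is only an assumed reading of the hybrid combination, and the exact alternation and update rules of the HABC are detailed in the full paper.

```python
# Illustrative sketch of the hybrid idea only (an assumption about the
# general ABC + L-BFGS combination, not the authors' exact HABC procedure).
# It reuses the abc_minimize function defined in the ABC sketch above.
import numpy as np
from scipy.optimize import minimize

def habc_sketch(f, dim, outer_cycles=5, abc_cycles=40, seed=0):
    rng = np.random.default_rng(seed)
    best_x = rng.uniform(-1.0, 1.0, size=dim)    # incumbent weight vector
    best_cost = f(best_x)
    for _ in range(outer_cycles):
        # Derivative-free phase: ABC exploration proposes a new candidate.
        x, c = abc_minimize(f, dim, max_cycles=abc_cycles,
                            seed=int(rng.integers(1 << 30)))
        if c < best_cost:
            best_x, best_cost = x, c
        # Derivative-based phase: L-BFGS refines the incumbent locally.
        res = minimize(f, best_x, method="L-BFGS-B", options={"maxiter": 100})
        if res.fun < best_cost:
            best_x, best_cost = res.x, res.fun
    return best_x, best_cost

# Toy usage: minimize a 20-dimensional sphere function.
w_best, c_best = habc_sketch(lambda x: float(np.sum(x ** 2)), dim=20)
print(c_best)
```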

Experimental results and discussion

In this section, to demonstrate the viability of our approach, the DNN is trained with four different strategies: the L-BFGSbTS, ABCbTS, MABCbTS and HABCbTS. The classification performance of the DNN trained with the HABCbTS is evaluated in several simulations and compared with the classification performance of the DNN trained with the L-BFGSbTS, ABCbTS and MABCbTS, and with that of state-of-the-art classifiers, including the multi-layer perceptron (MLP), support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT) and naive Bayes (NB) classifiers.

Conclusion

In this study, a new training strategy based on a hybrid optimization method, called the HABCbTS, is proposed to tune the internal parameters of the DNN. The HABCbTS is based on the HABC, which combines the evolutionary optimization algorithm ABC with the derivative-based method L-BFGS. In order to show the superiority of the HABCbTS, different strategies, including the L-BFGSbTS, ABCbTS and MABCbTS, are also presented to optimize the internal parameters of the DNN. The performance of the HABCbTS is evaluated on 15 benchmark data sets with different dimensions and sizes.


References (50)

  • G.E. Hinton, Connectionist learning procedures, Artif. Intell. (1989)
  • P.E. Utgoff et al., Many-layered learning, Neural Comput. (2002)
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • Y. Bengio et al., Greedy layer-wise training of deep networks, Proceedings of the 19th Advances in Neural Information Processing Systems (2007)
  • Y. Bengio et al., Scaling learning algorithms towards AI, Large-Scale Kernel Mach. (2007)
  • Z. Huang et al., A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition, Neurocomputing (2016)
  • G. Wu et al., Regional deep learning model for visual tracking, Neurocomputing (2016)
  • S. Yu et al., Convolutional neural networks for hyperspectral image classification, Neurocomputing (2017)
  • H. Badem et al., Classification and diagnosis of the Parkinson disease by stacked autoencoder, Proceedings of the 9th National Conference on Electrical and Electronics Engineering (ELECO) (2016)
  • H. Badem et al., Classification of human activity by using a stacked autoencoder, Proceedings of the Medical Technologies National Conference (TIPTEKNO) (2016)
  • N. Zeng et al., Deep belief networks for quantitative analysis of a gold immunochromatographic strip, Cognitive Comput. (2016)
  • Y. LeCun et al., Deep learning, Nature (2015)
  • Y. Guo et al., Deep learning for visual understanding: a review, Neurocomputing (2016)
  • W. Liu et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
  • P. Vincent et al., Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th International Conference on Machine Learning (2008)
  • N. Zeng et al., A hybrid EKF and switching PSO algorithm for joint state and parameter estimation of lateral flow immunoassay models, IEEE/ACM Trans. Comput. Biol. Bioinform. (2012)
  • N. Zeng et al., A novel switching delayed PSO algorithm for estimating unknown parameters of lateral flow immunoassay, Cognitive Comput. (2016)
  • N. Zeng et al., A switching delayed PSO optimized extreme learning machine for short-term load forecasting, Neurocomputing (2017)
  • J. Ngiam et al., On optimization methods for deep learning, Proceedings of the 28th International Conference on Machine Learning (ICML-11) (2011)
  • A. Basturk et al., Performance analysis of the coarse-grained parallel model of the artificial bee colony algorithm, Inform. Sci. (2013)
  • C. Ozkan et al., The artificial bee colony algorithm in training artificial neural network for oil spill detection, Neural Netw. World (2011)
  • D. Karaboga et al., Neural networks training by artificial bee colony algorithm on pattern classification, Neural Netw. World (2009)
  • F.H.F. Leung et al., Tuning of the structure and parameters of a neural network using an improved genetic algorithm, IEEE Trans. Neural Netw. (2003)
  • D. Karaboga et al., A comparative study of artificial bee colony algorithm, Appl. Math. Comput. (2009)
  • M. Ranzato et al., Efficient learning of sparse representations with an energy-based model, Advances in Neural Information Processing Systems (2006)

Hasan BADEM received his B.Sc. degree in 2009 from the Department of Computer and Control Technology Education at Marmara University and his M.Sc. degree in 2012 in Computer Education from Sutcu Imam University. He is currently a Ph.D. candidate at the Department of Computer Engineering, Erciyes University, Kayseri, Turkey. His research interests include machine learning in biomedical applications and parallel and distributed computation.

Alper BASTURK received his B.S. degree in Electronics Engineering from Erciyes University, Kayseri, Turkey, in July 1998. He then joined the Department of Electronics Engineering of Erciyes University as a research assistant. He received his M.S. and Ph.D. degrees, both in Electronics Engineering, from Erciyes University in August 2001 and November 2006, respectively. In 2006, he joined the Computer Hardware Division, Department of Computer Engineering, Erciyes University, where he is currently an Associate Professor. Between 2010 and 2011, he was a visiting scholar at the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA. He has guest-edited several special issues for various journals and has published more than 50 articles in leading journals and conferences. His research areas are digital signal and image processing, machine learning, deep learning, neural networks, fuzzy and neuro-fuzzy systems, intelligent optimization and applications of these techniques.

Abdullah CALISKAN received his B.Sc. degree in 2011 from the Department of Electrical and Electronics Engineering at Gaziantep University. He is currently a Ph.D. candidate at the Department of Biomedical Engineering, Erciyes University, Kayseri, Turkey. His research interests include machine learning in biomedical applications.

Mehmet Emin YUKSEL received the B.S. degree in electronics and communications engineering from Istanbul Technical University, Istanbul, Turkey, in 1990, and the M.S. and Ph.D. degrees in electronics engineering from Erciyes University, Kayseri, Turkey, in 1993 and 1996, respectively. In 2012, he joined the Department of Biomedical Engineering, Erciyes University, where he is currently a Professor. Between March and December 1995, he was an academic visitor at the Signal Processing Section, Department of Electrical Engineering, Imperial College, London, U.K. He has guest-edited several special issues for various journals. His current research interests include signal processing, image processing, neural networks, fuzzy systems, deep learning, and applications of these techniques.
