Deep neural networks with Elastic Rectified Linear Units for object recognition
Introduction
Deep neural networks (DNNs) [1], [5], [7], [11], [32], [33], [34], [36], [38], [42], [43] have brought tremendous performance gains to a variety of computer vision tasks, including image classification [12], [18], [27], [29], [39], object detection [4], [6], [9], [17], [28], [30], [47], [49], and image retrieval [44], [48]. This success is mainly attributed to advances in three aspects: more powerful network structures [24], better training strategies, and effective regularization techniques against overfitting. First, neural networks are becoming more and more powerful owing to increased depth and width, along with sophisticated layer design techniques. Second, training strategies such as stochastic gradient descent (SGD) [20], non-saturating nonlinear activation functions, and batch normalization (BN) [14] have proven to be effective ways of training deep networks. Last, regularization techniques such as Dropout [18] can effectively combat overfitting.
Among these advances, non-saturating nonlinear activation functions (e.g., the Rectified Linear Unit (ReLU) [26]) are an important step toward feasible deep neural networks. Deep neural networks with ReLU train much faster than their equivalents with saturating nonlinearities. Fast learning has a great influence on the performance of large models trained on large datasets. ReLU simply keeps the positive part and prunes the negative part to zero. This nonlinear function has an advantage over saturating nonlinearities in that the derivative of the positive part is a constant value. Therefore, ReLU does not suffer from vanishing gradients. Recently, more effort has concentrated on the study of non-saturating nonlinear activation functions. These methods fall into two categories. On the one hand, some approaches pay special attention to the negative part, using fixed, learnable, or randomized coefficients to control its slope. On the other hand, other methods adopt more complex piecewise linear functions, formulated with several learnable parameters, that deal with the whole input.
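As a concrete illustration (not code from the paper), the following NumPy sketch contrasts ReLU's constant derivative on the positive side with the gradient of a saturating sigmoid, which shrinks toward zero for large inputs:

```python
import numpy as np

def relu(x):
    # Keep the positive part, prune the negative part to zero.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is a constant 1 for positive inputs and 0 otherwise,
    # so gradients do not vanish on the active (positive) side.
    return (x > 0).astype(float)

def sigmoid_grad(x):
    # Sigmoid derivative s(x) * (1 - s(x)) saturates for large |x|.
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-2.0, 0.5, 10.0])
print(relu(x))          # positive parts kept, negatives pruned to zero
print(relu_grad(x))     # constant 1 wherever the unit is active
print(sigmoid_grad(x))  # nearly zero at x = 10: the gradient has vanished
```

Even at a moderately large input such as x = 10, the sigmoid gradient is on the order of 1e-5, whereas the ReLU gradient remains exactly 1.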
In this paper, we propose a novel Elastic Rectified Linear Unit (EReLU) that allows the positive part of the input to fluctuate during the training stage. The motivation for this is clear: similar samples are likely to generate similar responses at the same place. Therefore, letting the response fluctuate within a moderate range strengthens the robustness of the network. In addition, we propose the Elastic Parametric Rectified Linear Unit (EPReLU) by fusing EReLU and the parametric ReLU (PReLU) [13]. That is, the positive part and the negative part are processed independently using two different activation methods. Further improvement is observed by adopting this compound activation strategy.
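To make the idea concrete, here is a minimal sketch of an elastic rectifier consistent with the description above: during training the positive response is scaled by a random factor drawn from a uniform distribution around 1, and at test time the unit reduces to a plain ReLU. The fluctuation width `alpha` and the per-call sampling are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def erelu(x, alpha=0.1, training=True):
    """Elastic ReLU sketch: during training the positive part is scaled by
    a random factor k ~ U(1 - alpha, 1 + alpha), so responses fluctuate
    within a moderate range; at test time k = 1 (plain ReLU).
    `alpha` and the sampling granularity are assumed, for illustration."""
    pos = np.maximum(0.0, x)
    if training:
        k = rng.uniform(1.0 - alpha, 1.0 + alpha)
        return k * pos
    return pos

x = np.array([-2.0, 3.0])
y = erelu(x, training=True)   # positive response scaled into [2.7, 3.3]
z = erelu(x, training=False)  # plain ReLU at test time: [0.0, 3.0]
```

The negative part is still pruned to zero here; only the positive response is perturbed, which is what distinguishes EReLU from randomized variants that perturb the negative slope.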
The contributions and merits of this paper are summarized as follows. (1) We propose a novel Elastic Rectified Linear Unit (EReLU) that mainly deals with the positive part of the input. EReLU makes the positive part fluctuate within a moderate range in each epoch during the training stage, which strengthens the robustness of the network model. (2) We also propose a compound activation strategy called the Elastic Parametric Rectified Linear Unit (EPReLU) by fusing EReLU and the parametric ReLU (PReLU). EReLU focuses on the positive part, whereas PReLU deals with the negative part. EPReLU brings a further accuracy gain. (3) We present a new training strategy for deep neural networks with EPReLU. During training, EReLU and PReLU are updated alternately instead of at the same time. This alternate updating method plays an important role in training networks with EPReLU.
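The compound unit and the alternate updating idea can be sketched as follows. This is an interpretation under stated assumptions: `a` is the PReLU negative slope, `alpha` the elastic width, and `mode` stands in for which half is active in a given training epoch (in an "erelu" epoch the elastic factor is sampled while `a` is held fixed; in a "prelu" epoch the factor is fixed at 1 and `a` would be updated by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def eprelu(x, a=0.25, alpha=0.1, mode="erelu"):
    """EPReLU sketch: the positive and negative parts are processed
    independently. EReLU handles the positive part (random scaling
    factor k), PReLU handles the negative part (learnable slope a).
    `a`, `alpha`, and the alternation scheme are illustrative assumptions."""
    if mode == "erelu":
        # EReLU phase: sample the elastic factor, keep slope a frozen.
        k = rng.uniform(1.0 - alpha, 1.0 + alpha)
    else:
        # PReLU phase: fix k = 1; slope a is the quantity being trained.
        k = 1.0
    return np.where(x > 0.0, k * x, a * x)

x = np.array([-2.0, 3.0])
y = eprelu(x, mode="prelu")   # k = 1: negative part scaled by a = 0.25
```

In a real training loop the two phases would simply alternate by epoch parity, e.g. `mode = "erelu" if epoch % 2 == 0 else "prelu"`.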
This paper is organized as follows. Section 2 reviews the related work. Section 3 presents the proposed method. The experimental results are given in Section 4. Finally, Section 5 concludes this paper.
Related work
Rectified Linear Unit (ReLU) [26], along with several other key components, has contributed greatly to the recent success of deep neural networks. ReLU has taken the place of saturating activation functions such as sigmoid and tanh, which were used for a long time. This is mainly attributed to its non-saturating property, which solves the so-called vanishing gradient problem while accelerating convergence. Since the advent of rectifier networks, much effort has been paid to
Proposed method
In this section, we first introduce two activation units: ReLU and PReLU, which helps illustrate our method. Next, we describe our Elastic Rectified Linear Unit (EReLU). Then, we present Elastic Parametric Rectified Linear Unit (EPReLU) by fusing EReLU and PReLU.
Experiments
We evaluate the proposed EReLU and EPReLU on four standard benchmark datasets: CIFAR10 [19], CIFAR100 [19], SVHN [10], and ImageNet 2012 [18]. We compare EReLU and EPReLU with several other activation functions, including APL (Adaptive Piecewise Linear unit) [2], SReLU (S-shaped Rectified Linear Unit) [16], PReLU (Parametric ReLU) [13], ELU (Exponential Linear Unit) [3], and PELU (Parametric Exponential Linear Unit) [40]. We also compare our methods with other networks that have achieved the
Conclusion
In this paper, we have presented a novel activation unit called the Elastic Rectified Linear Unit (EReLU). During the training stage, EReLU allows the positive part of the input to fluctuate within a moderate range constrained by a uniform distribution. This fluctuation makes the network robust to variation within the same category. Furthermore, we have proposed the Elastic Parametric Rectified Linear Unit (EPReLU) by fusing EReLU and PReLU. EPReLU is able to further improve the performance via a
References (49)
- et al., Hierarchical autoassociative polynomial network (HAP net) for pattern recognition, Neurocomputing (2017)
- et al., Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
- et al., Maxout networks, CoRR (2013)
- et al., Deeply supervised nets, CoRR (2014)
- et al., Generalized extreme learning machine autoencoder and a new deep neural network, Neurocomputing (2017)
- et al., Training very deep networks, Proceedings of the Advances in Neural Information Processing Systems (2015)
- et al., MatConvNet: convolutional neural networks for MATLAB, Proceedings of the ACM Conference on Multimedia Conference (2015)
- et al., Traffic sign detection and recognition using fully convolutional network guided proposals, Neurocomputing (2016)
- et al., Learning activation functions to improve deep neural networks, CoRR (2014)
- et al., Fast and accurate deep network learning by exponential linear units (ELUs), CoRR (2015)
- Pedestrian detection inspired by appearance constancy and shape symmetry, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Fast learning in deep neural networks, Neurocomputing
- Deep learning for visual understanding: a review, Neurocomputing
- Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision
- Multi-digit number recognition from street view imagery using deep convolutional neural networks, CoRR
- Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process.
- Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision
- Batch normalization: accelerating deep network training by reducing internal covariate shift, CoRR
- Cascaded subpatch networks for effective CNNs, CoRR
- Deep learning with S-shaped rectified linear activation units, CoRR
- Speed up deep neural network based pedestrian detection by sharing features across multi-scale models, Neurocomputing
- ImageNet classification with deep convolutional neural networks, Proceedings of the Advances in Neural Information Processing Systems
- Learning multiple layers of features from tiny images
Cited by (91)
- A comparative study on bayes classifier for detecting photovoltaic module visual faults using deep learning features, Sustainable Energy Technologies and Assessments (2024)
- FirePred: A hybrid multi-temporal convolutional neural network model for wildfire spread prediction, Ecological Informatics (2023)
- A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification, Microprocessors and Microsystems (2023)
- Activation functions in deep learning: A comprehensive survey and benchmark, Neurocomputing (2022)
Xiaoheng Jiang received the B.S. degree and M.S. degree in electronic engineering from Tianjin University, China, in 2010 and 2013, respectively. He is currently a Ph.D. candidate at Tianjin University, supervised by Professor Yanwei Pang. His research interests include deep learning, object detection, and image analysis.
Yanwei Pang received the Ph.D. degree in electronic engineering from the University of Science and Technology of China in 2004. Currently, he is a professor at Tianjin University, China. His research interests include object detection, image processing, and deep learning, in which he has published more than 100 scientific papers, including 30 IEEE Transactions papers. He has served as an associate editor and guest editor of Neurocomputing, the International Journal of Image & Graphics, and the International Journal of Computer Mathematics.
Xuelong Li is a full professor with the Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, Shaanxi, PR China.
Jing Pan received her B.S. degree in Mechanical Engineering from the North China Institute of Technology (now North University of China), Taiyuan, China, in 2002, and her M.S. degree in Precision Instrument and Mechanism from the University of Science and Technology of China, Hefei, China, in 2007. She is currently a Lecturer with the School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin, China. Meanwhile, she is pursuing her Ph.D. degree at Tianjin University, China. Her research interests include computer vision and pattern recognition.
Yinghong Xie received her B.S. degree from Shenyang Jianzhu University, Shenyang, China, in 1999, and her M.S. and Ph.D. degrees from Northeastern University. She was a postdoctoral researcher at Tianjin University. She is currently an associate professor at Shenyang University. Her research interests include image processing, object tracking, and machine learning.