Biased ReLU neural networks☆
Introduction
Deep neural networks (NNs) have been successfully used in diverse domains, such as object detection and object classification [1], [2]. A key factor that contributes to the success of modern deep learning models is the use of nonsaturated activation functions (e.g., the rectified linear unit) in place of their saturated counterparts (e.g., sigmoid and tanh), which alleviates the “exploding/vanishing gradient” problem [3].
However, the rectified linear unit (ReLU) neuron is not a perfect activation function. For example, the dying ReLU problem, in which ReLU neurons become inactive and only output 0 for any input, is a major challenge when training deep ReLU networks [4]. Because of this problem, ReLU neurons fail to capture the input information, and network performance is adversely affected. Therefore, many researchers have proposed variants of the ReLU neuron to improve network performance [5], [6], [7].
In fact, the output of the ReLU neuron is a piecewise linear (PWL) function, which has been successfully applied to regression, classification and function approximation [8], [9], dynamic system modelling [10], [11], and time-series segmentation [12]. In this paper, we describe a new activation function called biased ReLU (BReLU), which is also a PWL function. Compared with the ReLU neuron, the BReLU neuron divides each dimension of the input variable into different subintervals. BReLU can be expressed as max(x − b_i, 0), where b_i ∈ B and B represents the candidate set for the bias. Then, the BReLU neuron is introduced into the proposed BReLU neural network (BRNN) as the activation function. The BRNN divides the input space into a greater number of linear regions by using BReLU neurons. To set the bias in the BReLU neuron, the mean and the variance of the input variables are needed, which are obtained through batch normalization [25]. The bias parameters are not fixed; they can track the distribution of the input variables and change correspondingly. In the simulations conducted in this study, for the examples listed in Section 6, the performance of the BRNN is superior to that of the ReLU network.
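The BReLU idea above can be sketched in a few lines of NumPy. The helper `biases_from_batch` is an illustrative assumption of this sketch (the paper obtains the mean and variance through batch normalization; the exact placement rule is not shown here):

```python
import numpy as np

def brelu(x, biases):
    """Biased ReLU: one max(x - b_i, 0) ramp per candidate bias.

    For an input x and a candidate bias set B = {b_1, ..., b_q},
    the q BReLU neurons output max(x - b_i, 0), which together
    partition the input axis into subintervals at the biases.
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(x[..., None] - np.asarray(biases, dtype=float), 0.0)

def biases_from_batch(x, q=4):
    """Hypothetical bias placement: spread q biases around the batch
    mean, scaled by the batch standard deviation (an assumption of
    this sketch, not the paper's exact rule)."""
    mu, sigma = np.mean(x), np.std(x)
    return mu + sigma * np.linspace(-1.0, 1.0, q)

x = np.array([-1.0, 0.0, 2.0])
print(brelu(x, biases=[0.0, 1.0]))  # one column per bias
```

With biases {0, 1}, the input 2.0 activates both ramps (outputs 2 and 1), while negative inputs activate neither; the standard ReLU is recovered as the special case B = {0}.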
The number of linear regions can be considered an indicator of model flexibility [13]. We use the framework developed in recent years (e.g., [13], [14], [15]) to analyze the BRNN and derive an upper bound for the maximum number of linear regions.
The remainder of this paper is organized as follows. In Section 2, we review the related work in the literature. Section 3 describes the proposed BReLU neuron. The BRNN and the linear regions in the BRNN are described in Sections 4 and 5, respectively. Our experimental results are given in Section 6. Finally, the conclusions of this study are presented in Section 7.
Section snippets
Related work
ReLU has been widely used as an activation function in deep networks, the output of which is max(w⊤x + b, 0), in which w and b are the weight and bias parameters, respectively. This is exactly the hinge function proposed by [8], the linear combination of which yields the model of hinging hyperplanes (HH). In order to solve the problem that the HH model cannot represent all the PWL functions of high dimensions, the model of generalized hinging hyperplanes (GHH) is proposed in [16], the basis of
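The relationship between the hinge function, the HH model, and the GHH basis can be made concrete with a small sketch (the sign pattern and parameter shapes here are illustrative assumptions, not the papers' exact formulations):

```python
import numpy as np

def hinge(x, w, b):
    """Breiman's hinge function: max(w^T x + b, 0), i.e. the ReLU output."""
    return np.maximum(x @ w + b, 0.0)

def hh_model(x, hinges, signs):
    """Hinging hyperplanes (HH) model: a signed linear combination of hinges."""
    return sum(s * hinge(x, w, b) for s, (w, b) in zip(signs, hinges))

def ghh_basis(x, affine_params):
    """One generalized hinging hyperplanes (GHH) basis function:
    the max over several affine functions, rather than over
    {one affine function, 0} as in the plain hinge."""
    return np.max(np.stack([x @ w + b for w, b in affine_params]), axis=0)

x = np.array([[1.0, 2.0], [0.5, -1.0]])
print(hinge(x, w=np.array([1.0, -1.0]), b=0.5))
```

Taking the max over k affine functions instead of over {affine, 0} is what lets the GHH basis represent high-dimensional PWL functions that single hinges cannot.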
BReLU function
The ReLU has been widely used as an activation function in NNs. One advantage of the ReLU is its nonsaturated nonlinearity. In terms of training time with gradient descent, the ReLU with its nonsaturated nonlinearity is considerably faster than activation functions with saturated nonlinearity such as the sigmoid function [1]. Moreover, the derivative of the ReLU neuron can be implemented simply by thresholding a matrix of activations at zero, which cannot be done for the sigmoid
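The thresholding argument above is easy to verify numerically; this minimal sketch contrasts the ReLU gradient with the saturating sigmoid gradient:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    """Derivative of ReLU, implemented by thresholding the
    pre-activations at zero -- no exponentials required."""
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu_grad(z))                          # [0. 0. 1.]

# For large |z| the sigmoid gradient s(z)(1 - s(z)) saturates toward 0,
# while the ReLU gradient stays at 1 on the positive side:
print(relu_grad(np.array([10.0])))           # stays 1
print(sigmoid(10.0) * (1 - sigmoid(10.0)))   # near 0
```

The cheap, non-vanishing gradient on the active side is what makes gradient-descent training with ReLU considerably faster than with saturated activations.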
Structure
In this paper, we assume that the NN has n input variables, denoted by x_1, …, x_n, and one output variable, denoted by y. Fig. 3 shows the structure of a BRNN. Suppose that for each input variable there are q BReLU neurons. Assume L is the number of hidden layers and each hidden layer l has n_l neurons, the activation functions of which are BReLU. Let W_l be the matrix in which each row corresponds to the weights of q neurons in layer l.
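A minimal forward pass for this structure can be sketched as follows. This is an illustration under simplifying assumptions (one shared bias set for all layers, random weights), not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def brelu_layer(x, biases):
    """Apply q BReLU neurons to each input dimension.
    x: (batch, n); biases: (q,). Output: (batch, n * q)."""
    out = np.maximum(x[:, :, None] - biases[None, None, :], 0.0)
    return out.reshape(x.shape[0], -1)

def brnn_forward(x, biases, weights):
    """Sketch of a BRNN forward pass: each hidden layer applies an
    affine map followed by the per-dimension BReLU expansion; the
    final weight matrix produces the scalar output y."""
    h = x
    for W in weights[:-1]:
        h = brelu_layer(h @ W, biases)
    return h @ weights[-1]

n, q, n1 = 3, 4, 5                       # inputs, biases per dim, hidden width
biases = np.linspace(-1.0, 1.0, q)
weights = [rng.standard_normal((n, n1)),
           rng.standard_normal((n1 * q, 1))]  # BReLU expands n1 -> n1 * q
y = brnn_forward(rng.standard_normal((8, n)), biases, weights)
print(y.shape)  # (8, 1)
```

Note that each layer's output width is multiplied by q relative to a plain ReLU layer, which is the mechanism behind the larger number of linear regions discussed in Section 5.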
Linear regions and parameters in BRNN
The complexity or expressiveness of NNs belonging to the family of PWL functions can be analyzed by considering how the network can partition the input space into an exponential number of linear regions [13], [14], [15]. The bounds on the number of linear regions for ReLU networks have been extensively studied. In this section, we also provide the upper bound for the maximum number of linear regions for BRNN.
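For a tiny network, the region count can be probed empirically by enumerating distinct on/off activation patterns over a dense input grid; each distinct pattern corresponds to at least one linear region, so the sampled count is a lower bound on the true number (the network here is an arbitrary one-layer ReLU example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 2-input ReLU layer with 5 hidden units.
W1 = rng.standard_normal((2, 5))
b1 = rng.standard_normal(5)

# Dense grid over the input square [-3, 3]^2.
grid = np.stack(np.meshgrid(np.linspace(-3, 3, 200),
                            np.linspace(-3, 3, 200)), axis=-1).reshape(-1, 2)

# Each row is the binary on/off pattern of the 5 ReLU units at one point;
# distinct rows witness distinct linear regions of the network.
acts = (grid @ W1 + b1) > 0
n_patterns = len(np.unique(acts, axis=0))
print(n_patterns)  # at most 1 + 5 + C(5,2) = 16 regions for 5 lines in 2D
```

For one hidden layer this matches the classical hyperplane-arrangement bound; the bounds in [13], [14], [15] and the BRNN bound in this section generalize such counting to deep networks.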
Experiments
In this section, three UCI (UC Irvine Machine Learning Repository) datasets and two image classification benchmark problems are used to fully explore the performance of the proposed method. For the regression problems on the UCI datasets, we select the root-mean-square error (RMSE) or the mean square error (MSE) as the performance metric, depending on the corresponding reference. For the image classification problems based on the MNIST and CIFAR-10 datasets, the classification accuracy of BRNN and
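For reference, the two regression metrics used above are computed as follows (a generic sketch, not tied to any particular dataset from the paper):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: mean of squared residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the MSE,
    reported in the same units as the target variable."""
    return float(np.sqrt(mse(y_true, y_pred)))

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 4/3
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```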
Conclusion
In this paper, we describe an activation function called BReLU and propose a new neural network called BRNN. We have investigated the upper bound for the maximum number of linear regions in the BRNN. Compared with a ReLU network having the same number of parameters, the BRNN improves network flexibility. Additionally, it achieves good results on regression problems and image classification problems. The experiments conducted herein verify the effectiveness of the proposed method.
CRediT authorship contribution statement
XingLong Liang: Data curation, Software, Validation, Writing - original draft, Visualization. Jun Xu: Conceptualization, Methodology, Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
XingLong Liang received the B.S. degree in Automation from YanShan University, Qinhuangdao, China, in 2018 and the M.S. degree in Control science and Engineering from Harbin Institute of Technology, Shenzhen, China, in 2020. His current research interests include piecewise linear model and nonlinear system identification.
References (29)
- et al., Implementation of min–max MPC using hinging hyperplanes. Application to a heat exchanger, Control Engineering Practice (2004)
- et al., Model predictive control based on adaptive hinging hyperplanes model, Journal of Process Control (2012)
- et al., Adaptive hinging hyperplanes and its applications in dynamic system identification, Automatica (2009)
- A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances...
- et al., Rich feature hierarchies for accurate object detection and semantic segmentation
- X. Jin, C. Xu, J. Feng, Y. Wei, J. Xiong, S. Yan, Deep learning with S-shaped rectified linear activation units, in:...
- L. Trottier, P. Giguère, B. Chaib-draa, Parametric exponential linear unit for deep convolutional neural networks, in: 16th IEEE...
- A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proc. ICML, 2013,...
- et al., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification
- D.-A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units...
- Hinging hyperplanes for regression, classification, and function approximation, IEEE Transactions on Information Theory
- Hinging hyperplanes for time-series segmentation, IEEE Transactions on Neural Networks and Learning Systems
Jun Xu received her B.S. degree in Control Science and Engineering from Harbin Institute of Technology, Harbin, China, in 2005 and PhD degree in Control science and Engineering from Tsinghua University, China, in 2010. Currently, she is an associate professor in School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China. Her research interests include piecewise linear functions and their applications in machine learning, nonlinear system identification and control.
☆ This work is jointly supported by the National Natural Science Foundation of China (U1813224) and the Science and Technology Innovation Committee of Shenzhen Municipality (JCYJ20170811155131785).