Neurocomputing

Volume 423, 29 January 2021, Pages 71-79

Biased ReLU neural networks

https://doi.org/10.1016/j.neucom.2020.09.050

Abstract

Neural networks (NN) with rectified linear units (ReLU) have been widely implemented since 2012. In this paper, we describe an activation function called the biased ReLU neuron (BReLU), which is similar to the ReLU. Based on this activation function, we propose the BReLU NN (BRNN). The structure of the BRNN is similar to that of the ReLU network; the difference between the two is that the BReLU introduces several biases for each input variable. This allows the BRNN to divide the input space into a greater number of linear regions and improves network flexibility. The BRNN parameters to be estimated are the weight matrices and the bias parameters of the BReLU neurons. The weights are obtained using the backpropagation method. Moreover, we propose a method to compute the bias parameters of the BReLU neurons: batch normalization is applied to the BRNN to obtain the variance and mean of the input variables, and the bias parameters are estimated from these two statistics. In addition, we investigate the flexibility of the BRNN. Specifically, we study the number of linear regions and provide an upper bound on the maximum number of linear regions. The results indicate that, for the same input dimension, the BRNN divides the input space into a greater number of linear regions than the ReLU network, which explains to a certain extent why the BRNN has superior approximation ability. Experiments are carried out using five datasets, and the results verify the effectiveness of the proposed method.

Introduction

Deep neural networks (NN) have been successfully used in diverse domains, such as object detection and object classification [1], [2]. A key factor contributing to the success of modern deep learning models is the use of nonsaturated activation functions (e.g., the rectified linear unit) in place of their saturated counterparts (e.g., sigmoid and tanh), which solves the "exploding/vanishing gradient problem" [3].

However, the rectified linear unit (ReLU) neuron is not a perfect activation function. For example, the dying ReLU problem, in which ReLU neurons become inactive and only output 0 for any input, is a major challenge when training deep ReLU networks [4]. Because of this problem, ReLU neurons fail to capture the input information, and network performance is adversely affected. Therefore, many researchers have proposed variants of the ReLU neuron to improve network performance [5], [6], [7].
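
As an illustration of this phenomenon (a minimal NumPy sketch on assumed toy data, not code from the paper), a ReLU neuron whose pre-activation stays below zero for every training input outputs zero and receives zero gradient, so gradient descent cannot revive it:

```python
import numpy as np

# Minimal sketch of the dying ReLU problem (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 5))       # hypothetical training inputs
w = rng.normal(size=5)
b = -50.0                            # a large negative bias pushes the neuron into the dead regime

pre = x @ w + b                      # pre-activations: all far below zero here
out = np.maximum(0.0, pre)           # ReLU output is identically zero
grad_w = (pre > 0).astype(float)[:, None] * x   # gradient w.r.t. w is zero wherever pre <= 0

print(out.max())                     # 0.0 -> the neuron is "dead"
print(np.abs(grad_w).max())          # 0.0 -> no gradient signal, so it cannot recover
```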

In fact, the output of the ReLU neuron is a piecewise linear (PWL) function, which has been successfully applied to regression, classification and function approximation [8], [9], dynamic system modelling [10], [11], and time-series segmentation [12]. In this paper, we describe a new activation function called biased ReLU (BReLU), which is also a PWL function. Compared with the ReLU neuron, the BReLU neuron divides each dimension of the input variable into different subintervals. BReLU can be expressed as $\max(0, x - b_k)$, where $b_k \in S_B$ and $S_B$ represents the candidate set for the bias. The BReLU neuron is then introduced into the proposed BReLU neural network (BRNN) as the activation function. The BRNN divides the input space into a greater number of linear regions by using BReLU neurons. To set the biases of the BReLU neurons, the mean and variance of the input variables are needed, which are obtained through batch normalization [25]. The bias parameters are not fixed; they track the distribution of the input variables and change correspondingly. In the simulations conducted in this study, for the examples listed in Section 6, the performance of the BRNN is superior to that of the ReLU network.
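
To make the definition concrete, the following is a minimal NumPy sketch of a bank of BReLU activations $\max(0, x - b_k)$ for a single scalar input; the particular candidate set $S_B$ built from the batch mean and standard deviation is only an assumed illustration of the batch-normalization-based idea described above, not the authors' implementation.

```python
import numpy as np

def brelu(x, biases):
    """Apply max(0, x - b_k) for every bias b_k in the candidate set; one output per bias."""
    x = np.asarray(x, dtype=float)
    biases = np.asarray(biases, dtype=float)
    return np.maximum(0.0, x[..., None] - biases)   # shape (..., len(biases))

# Hypothetical choice of S_B from batch statistics (mean plus multiples of the std),
# mimicking the batch-normalization-based bias selection sketched above.
samples = np.random.default_rng(1).normal(loc=2.0, scale=0.5, size=10_000)
mu, sigma = samples.mean(), samples.std()
S_B = mu + sigma * np.array([-1.0, 0.0, 1.0])       # three biases for this input variable

print(brelu(2.3, S_B))   # three piecewise linear features of the single input 2.3
```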

The number of linear regions can be considered an indicator of model flexibility [13]. We use the framework developed in recent years (e.g., [13], [14], [15]) to analyze the BRNN and provide an upper bound on the maximum number of linear regions.

The remainder of this paper is organized as follows. In Section 2, we review the related work in the literature. Section 3 describes the proposed BReLU neuron. The BRNN and the linear regions in the BRNN are described in Sections 4 and 5, respectively. Our experimental results are given in Section 6. Finally, the conclusions of this study are presented in Section 7.

Section snippets

Related work

ReLU has been widely used as an activation function in deep networks, the output of which is $\max\{0, w^T x + b\}$, where $w$ and $b$ are the weight and bias parameters, respectively. This is exactly the hinge function proposed in [8], the linear combination of which yields the model of hinging hyperplanes (HH). To solve the problem that the HH model cannot represent all PWL functions of high dimensions, the model of generalized hinging hyperplanes (GHH) is proposed in [16], the basis of
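
For reference, a hinging-hyperplanes model in the sense of [8] is a linear combination of hinge functions $\max\{0, w_i^T x + b_i\}$; the following toy evaluation is only an illustrative sketch with made-up parameters:

```python
import numpy as np

def hinge(x, w, b):
    """Single hinge function max{0, w^T x + b}."""
    return np.maximum(0.0, x @ w + b)

def hh_model(x, weights, biases, coeffs):
    """Hinging-hyperplanes (HH) model: a linear combination of hinge functions."""
    return sum(c * hinge(x, w, b) for c, w, b in zip(coeffs, weights, biases))

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 2))                   # four 2-D sample points
weights = [rng.normal(size=2) for _ in range(3)]
biases = [0.5, -0.2, 1.0]
coeffs = [1.0, -2.0, 0.5]
print(hh_model(x, weights, biases, coeffs))   # PWL model output at the four points
```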

BReLU function

The ReLU $\max\{0, x\}$ has been widely used as an activation function in NNs. One advantage of the ReLU is its nonsaturated nonlinearity. In terms of training time with gradient descent, the ReLU with its nonsaturated nonlinearity is considerably faster than activation functions with saturated nonlinearity, such as the sigmoid function [1]. Moreover, the derivative of the ReLU neuron can be implemented simply by thresholding a matrix of activations at zero, which cannot be done for the sigmoid
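
The thresholding remark can be seen in a short sketch (illustrative only): the ReLU derivative with respect to its input is 1 where the pre-activation is positive and 0 elsewhere, whereas the sigmoid derivative requires evaluating the saturating function itself.

```python
import numpy as np

pre = np.array([[-1.5, 0.3], [2.0, -0.7]])   # pre-activations of a small layer
act = np.maximum(0.0, pre)                   # ReLU forward pass
relu_grad = (pre > 0).astype(float)          # backward pass: just threshold at zero

sig = 1.0 / (1.0 + np.exp(-pre))             # sigmoid forward pass
sig_grad = sig * (1.0 - sig)                 # backward pass needs the saturating function itself

print(relu_grad)
print(sig_grad)
```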

Structure

In this paper, we assume that the NN has $n_0$ input variables denoted by $x = [x_1, x_2, \ldots, x_{n_0}]$ and one output variable denoted by $y$. Fig. 3 shows the structure of a BRNN. Suppose that for each input variable, there are $q$ BReLU neurons. Assume $L$ is the number of hidden layers and each hidden layer $l \in \{1, 2, \ldots, L\}$ has $n_l$ neurons, the activation functions of which are $h^l = \{h_1^l, h_2^l, \ldots, h_{n_l}^l\}$. Let $W^l$ be the $(n_l / q) \times n_{l-1}$ matrix, where each row corresponds to the weights of $q$ neurons in layer $l$. Let $b^l = [b_1^l, b_2^l, \ldots, b_q^l] = S_B^l$
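
Under one reading of this structure (a sketch, not the authors' implementation), each of the $n_l/q$ rows of $W^l$ produces a pre-activation that is fed through $q$ BReLU neurons sharing the bias set $S_B^l$, giving $n_l$ activations in total:

```python
import numpy as np

def brnn_layer(h_prev, W, biases):
    """One hidden layer of a BRNN, as read from the description above (illustrative).

    W has shape (n_l // q, n_{l-1}); each row yields one pre-activation,
    which is passed through q BReLU neurons with the shared bias set `biases`.
    """
    z = W @ h_prev                                        # shape (n_l // q,)
    out = np.maximum(0.0, z[:, None] - biases[None, :])   # shape (n_l // q, q)
    return out.reshape(-1)                                # n_l activations

rng = np.random.default_rng(3)
n_prev, n_l, q = 4, 6, 3                                  # hypothetical sizes, n_l divisible by q
W = rng.normal(size=(n_l // q, n_prev))
S_B = np.array([-0.5, 0.0, 0.5])                          # assumed bias set S_B^l for illustration
h = brnn_layer(rng.normal(size=n_prev), W, S_B)
print(h.shape)                                            # (6,)
```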

Linear regions and parameters in BRNN

The complexity or expressiveness of NNs belonging to the family of PWL functions can be analyzed by considering how the network can partition the input space into an exponential number of linear regions [13], [14], [15]. The bounds on the number of linear regions for ReLU networks have been extensively studied. In this section, we also provide the upper bound for the maximum number of linear regions for BRNN.
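
As a back-of-the-envelope illustration (not the paper's general bound for deep networks): if the first layer's BReLU neurons act directly on each input coordinate, $q$ distinct biases split each axis into $q+1$ intervals, so $n_0$ coordinates already yield $(q+1)^{n_0}$ axis-aligned linear regions, versus $2^{n_0}$ for a single ReLU breakpoint per coordinate.

```python
# Back-of-the-envelope counting for the axis-aligned case described above
# (an illustration only; the bound for deep BRNNs is more general).
def grid_regions(n0: int, q: int) -> int:
    """Regions when each of the n0 inputs is split into q + 1 intervals by q biases."""
    return (q + 1) ** n0

for q in (1, 2, 4):
    print(f"n0 = 3, q = {q}: {grid_regions(3, q)} regions")   # 8, 27, 125 (vs. 2**3 = 8 for ReLU)
```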

Experiments

In this section, three UCI (UC Irvine Machine Learning Repository) datasets and two image classification benchmark problems are used to fully explore the performance of the proposed method. For the regression problems on the UCI datasets, we select the root-mean-square error (RMSE) or the mean square error (MSE) as the performance metric, depending on the corresponding reference. For the image classification problems based on the MNIST and CIFAR-10 datasets, the classification accuracy of the BRNN and
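
For completeness, the reported metrics follow their standard definitions; a minimal sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(mse(y_true, y_pred)))

def accuracy(labels, predictions):
    """Classification accuracy: fraction of correctly predicted labels."""
    labels, predictions = np.asarray(labels), np.asarray(predictions)
    return float(np.mean(labels == predictions))

print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))   # ~0.141
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))     # 0.75
```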

Conclusion

In this paper, we described an activation function called BReLU and proposed a new neural network called BRNN. We investigated the upper bound for the maximum number of linear regions in the BRNN. Compared with a ReLU network having the same number of parameters, the BRNN improves network flexibility. Additionally, it achieves good results on regression and image classification problems. The experiments conducted herein verify the effectiveness of the proposed method.

CRediT authorship contribution statement

XingLong Liang: Data curation, Software, Validation, Writing - original draft, Visualization. Jun Xu: Conceptualization, Methodology, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (29)

  • L. Breiman, Hinging hyperplanes for regression, classification, and function approximation, IEEE Transactions on Information Theory (1993).
  • X. Huang, J. Xu, S. Wang, Operation optimization for centrifugal chiller plants using continuous piecewise linear...
  • X. Huang et al., Hinging hyperplanes for time-series segmentation, IEEE Transactions on Neural Networks and Learning Systems (2013).
  • R. Pascanu, G. Montufar, Y. Bengio, On the number of response regions of deep feed forward networks with piece-wise...

XingLong Liang received the B.S. degree in Automation from YanShan University, Qinhuangdao, China, in 2018 and the M.S. degree in Control Science and Engineering from Harbin Institute of Technology, Shenzhen, China, in 2020. His current research interests include piecewise linear models and nonlinear system identification.

Jun Xu received her B.S. degree in Control Science and Engineering from Harbin Institute of Technology, Harbin, China, in 2005 and her Ph.D. degree in Control Science and Engineering from Tsinghua University, China, in 2010. Currently, she is an associate professor in the School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China. Her research interests include piecewise linear functions and their applications in machine learning, nonlinear system identification and control.

This work is jointly supported by the National Natural Science Foundation of China (U1813224) and the Science and Technology Innovation Committee of Shenzhen Municipality (JCYJ20170811155131785).
