Application of constrained learning in making deep networks more transparent, regularized, and biologically plausible

https://doi.org/10.1016/j.engappai.2019.06.022

Abstract

Constrained learning has numerous applications and advantages, especially in settings where hardware implementation imposes constraints or where biological considerations motivate them. Making neural networks more comprehensible, achieving faster convergence, and learning general properties are further advantages of constrained learning. In this article we use constrained learning to move toward greater biological plausibility. We demonstrate that the proposed model not only retains the potential advantages of previously proposed models, but can also serve as a technique for regularizing neural network weights and for faster convergence. Finally, using an ensemble of different networks, the result for the MNIST dataset with data augmentation is 99.81%, and the results for the CIFAR-10 and SVHN datasets without data augmentation are 93.4% and 98.19%, respectively.

Introduction

Deep neural networks, and the stochastic gradient descent learning algorithm in particular, are among the remarkable achievements of machine learning. These networks can be considered biologically inspired tools and demonstrate the importance of bio-computing techniques. Deep neural networks learn using the stochastic gradient descent algorithm, which has been extensively developed under biological inspiration.

Constrained learning means imposing constraints on the free parameters of a model through the learning algorithm (that is, the optimization algorithm; Perantonis and Karras, 1995, Zhang et al., 2018) during its learning phase. The imposed constraint cannot change the structure of the model; it only influences the way in which the free parameters of the model are determined.

Using constrained learning and SGD, this article analyzes some aspects of deep neural networks that make the network more transparent and more biologically plausible. The goal is not to have the neural network itself encode the constraints; instead, we apply a procedure in which the learning algorithm determines the free parameters under a specific constraint.

Note that in this article imposing constraints on neural networks has advantages such as faster convergence and regularization. Regularization is a process (such as including additional information (Miyato et al., 2018) or including randomness, as in the DropBlock technique (Ghiasi et al., 2018)) in which the gap between the network error on the test dataset (test error) and the network error on the training dataset (training error) is reduced, while the training error remains lower than the test error and close to zero.

There are numerous reasons for using constrained learning, especially in neural networks. One is training neural networks under constraints imposed by hardware implementation (Yi et al., 2008, Courbariaux et al., 2016, Plagianakos and Vrahatis, 1999, McDonnell, 1992). Decreasing power consumption (Courbariaux et al., 2016), increasing the speed of computation and memory access (Courbariaux et al., 2016), and simplifying hardware implementation (Yi et al., 2008, Plagianakos and Vrahatis, 1999) are among these reasons.

Another important motivation is dealing with learning invariance in patterns (Le et al., 2010). Computing two-dimensional polynomials (Perantonis et al., 1998), increasing generality and achieving faster convergence (Han et al., 2008), demonstrating the robustness of deep neural networks to binarizing, perturbing, and quantizing their weights (Merolla et al., 2016), and preventing interference while maintaining prior knowledge (Di Muro and Ferrari, 2008) are other motivations for using constrained learning in neural networks.

There are two other motivations for constrained learning in neural networks that this article pursues. The first is making neural networks more comprehensible and transparent (Ayinde and Zurada, 2017, Chorowski and Zurada, 2015). In other words, features extracted through constrained learning in deep neural networks become more comprehensible than those learned without constraints. The second motivation is related to biological topics in neuro-computing. The latter motivation, aiming at unsupervised learning and at modeling perception and cognition, has been pursued by Testolin et al. (2017). There have also been studies on weight bounding or saturation limits aimed at controlling synaptic changes and preventing instability in STDP-based algorithms (Humble et al., 2011).

This article follows the inspiration behind biological neurons and synapses, keeping their limits in mind, so as to move toward neural networks with greater biological plausibility. Additionally, these limits make artificial neural networks more transparent and comprehensible. These limits may also lead to a new way of increasing the convergence speed of artificial neural networks in some circumstances.

In biological neural networks all synapses are bounded in terms of synaptic plasticity. In other words, their action potentials cannot reinforce the post-synaptic potential infinitely (Dan and Poo, 2004). This feature of biological synapses can be considered a constraint on the synaptic weights of artificial neural networks. The other constraint is related to the cell membrane: the membrane potential cannot increase infinitely either. In biological neurons this value, depending on the type of neuron, ranges from −80 mV to +40 mV. On the other hand, several studies in the field of computation in biological neural circuits demonstrate the coherence of probabilistic inference in these circuits (Gerstner and Kistler, 2002). These probabilistic computations in biological neural circuits can also be considered a kind of constraint on computational values.

With these properties of biological neural networks in mind, this article tries to impose such constraints on artificial neural networks through methods that constrain the learning of the free parameters.

Numerous methods have been proposed for applying constraints to the learning parameters through the learning algorithm. Two families of algorithms are often used to impose constraints on the free parameters of artificial neural networks during the training phase:

  1. random-search-based methods

  2. gradient-descent-based methods

In McDonnell (1992), an evolutionary algorithm was used to train a model with constraints imposed on the network by hardware. In Yi et al. (2008) and Plagianakos and Vrahatis (1999), evolutionary algorithms were also used to train neural networks whose weight values are integers, and a genetic algorithm was used in Kok et al. (1996) as another example of an evolutionary approach. In Wang et al. (2015), random projection functions were used to impose a constraint expressed as the intersection of many convex sets.

Gradient-based constrained learning can be divided into two major approaches: first, those in which the desired constraint is added to the cost (error) function in the training phase, and second, those which use a projection function to impose constraints on the free parameters (i.e. the weights).
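As an illustration of the first (penalty-based) approach, here is a minimal NumPy sketch, entirely our own and not taken from any of the cited works: a single linear neuron is trained with SGD while a quadratic penalty added to the squared error discourages the weights from leaving the interval [−1, 1]. The penalty form, hyper-parameters, and toy target are illustrative assumptions.

```python
import numpy as np

# Sketch of "constraint added to the error function": a soft quadratic
# penalty keeps the weights of a single linear neuron near [-1, 1].
rng = np.random.default_rng(0)
w = rng.normal(size=3)          # free parameters (weights)
lr, lam = 0.05, 1.0             # learning rate and penalty strength

def penalty_grad(w):
    # Gradient of sum(max(|w| - 1, 0)^2): pushes out-of-range weights back.
    excess = np.maximum(np.abs(w) - 1.0, 0.0)
    return 2.0 * excess * np.sign(w)

for step in range(2000):
    x = rng.normal(size=3)
    y = 3.0 * x[0]                           # toy target needing |w[0]| > 1
    y_hat = w @ x
    grad_loss = 2.0 * (y_hat - y) * x        # gradient of the squared error
    w -= lr * (grad_loss + lam * penalty_grad(w))

# Without the penalty w[0] would approach 3; the soft constraint pulls it
# back toward the allowed range.
print(w)
```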

An example of adding a constraint to the cost function was proposed in Di Muro and Ferrari (2008), which used the Levenberg–Marquardt algorithm to train the synaptic weights. In Ayinde and Zurada (2017), a constraint was added to the error function so that the weights of the neural network take non-negative values, which leads to sparse convolution filters. In Han et al. (2008), the error function is changed so that the constraint is imposed by adding the first derivative of the activation function, and the second derivative therefore appears in the gradient descent algorithm. Perantonis et al. (1998) tried to estimate a kind of polynomial by imposing a constraint on the error function.

Another way of imposing constraints on weights is to project the weights of the artificial neural network. In this method, if the projection function is differentiable, its derivative is used after error propagation when updating the weights by gradient descent. One kind of constrained learning that uses gradient descent with a non-differentiable projection function was proposed in Chorowski and Zurada (2015): there, the weights are projected by the ReLU function and no derivative of the projection is computed after error propagation.

Other studies have used differentiable projection functions or, when the function was not differentiable, a differentiable approximation of it after error propagation, such as the straight-through function, which has been used to impose the constraint of binary (+1, −1) weights during training (Courbariaux et al., 2016). Functions that are differentiable only in a specific range have also been used, such as the semi-linear constraint function imposed on every single weight of the network separately (Bouzerdoum and Pattison, 1993). A constrained version of gradient descent called CSGD was proposed in Mu et al. (2013), in which the weight-update formula is refined by considering the projection function in charge of imposing an equality constraint; this version of gradient descent estimates the convex set of the projection of the weights over the training iterations. Stochastic multiplicative gradient descent has been proposed in order to impose an L1 constraint (Tewari, 2008); the weights in this algorithm are projected so as to impose the constraint, and a new update equation is derived for gradient descent.

The main focus of this article is on constrained learning methods in which the SGD algorithm is used without any changes and, given a differentiable projection function, the update equation for the projection function's parameters is obtained by the chain rule. An example of this approach is proposed in Merolla et al. (2016), in which weights are bounded to a specific range. The projection function in this approach works element-wise. The list of constraints imposed on the free parameters by the projection function is given in Section 3.
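To make the chain-rule step explicit, the update can be summarized as follows (our own notation: $E$ is the error, $\eta$ the learning rate, and $K$ an element-wise projection applied to the free parameters $\theta$):

```latex
% SGD on the free parameters \theta of an element-wise projection w_i = K(\theta_i)
\begin{aligned}
  w_i &= K(\theta_i), \\
  \frac{\partial E}{\partial \theta_i}
      &= \frac{\partial E}{\partial w_i}\,\frac{\partial w_i}{\partial \theta_i}
       = \frac{\partial E}{\partial w_i}\,K'(\theta_i), \\
  \theta_i &\leftarrow \theta_i - \eta\,\frac{\partial E}{\partial w_i}\,K'(\theta_i),
  \qquad w_i \leftarrow K(\theta_i).
\end{aligned}
```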

The following issues are analyzed on the MNIST, CIFAR-10, and SVHN datasets:

  • Using constrained learning in order to make artificial neural networks more biologically plausible.

  • Using constrained learning as a regularization technique.

  • Using an ensemble of neural networks which are trained under constrained learning so as to decrease the error of classification.

  • Constraining the weights in a deep neural network might result in an improvement in its learning time.


Constrained neural network

The free parameters of an artificial neural network, called the synaptic weights $W=(w_0, w_1, \ldots, w_n)$, either strengthen or weaken the pre-synaptic neurons' outputs $L=(l_0, l_1, \ldots, l_n)$ before sending them to the post-synaptic neuron $k$. These scaling factors ($W$) are not bounded (i.e. $-\infty < W < +\infty$), and consequently the pre-synaptic neurons' outputs can be strengthened or weakened without limit (i.e. $-\infty < W \cdot L < +\infty$). It means an artificial neuron calculates a weighted sum of pre-synaptic values as its membrane
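As a concrete illustration (a minimal sketch of the idea, not the paper's implementation), the following NumPy snippet contrasts an unconstrained weighted sum with one whose effective weights come from projecting free parameters θ through a bounding function; tanh, used later in the experiments, serves as the example bound.

```python
import numpy as np

rng = np.random.default_rng(1)
l = rng.normal(size=5)            # pre-synaptic outputs L = (l_0, ..., l_n)

# Unconstrained neuron: the weights can take any real value.
w_free = rng.normal(scale=10.0, size=5)
unbounded_sum = w_free @ l        # can grow without limit as |w| grows

# Constrained neuron: the free parameters theta are stored and trained,
# while the effective weights are w_i = K(theta_i) = tanh(theta_i) in (-1, 1).
theta = rng.normal(scale=10.0, size=5)
w_bounded = np.tanh(theta)
bounded_sum = w_bounded @ l       # |bounded_sum| <= sum(|l_i|) for any theta

print(unbounded_sum, bounded_sum)
```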

Experiments

The following constraints are used in this article for training the neural network (a code sketch of the element-wise cases follows the list):

  • Element-wise constraints:

    1. Binary weights (+1 and −1) (Courbariaux et al., 2016, Merolla et al., 2016), implemented using a straight-through function.

    2. Negative weight constraints (a positive version is also implemented in Ayinde and Zurada, 2017 and Chorowski and Zurada, 2015), implemented using $w_i = K(\theta_i) = -\left(\theta_i^2 + 2|\theta_i|\right)^{1/2}$.

    3. Bounded weights, implemented using $w_i = K(\theta_i) = \tanh(\theta_i)$.

  • Non-element-wise
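As referenced above, here is a minimal NumPy sketch of the element-wise projections and the factors dw/dθ that the chain-rule update multiplies into the back-propagated error. The code is our own illustration rather than the paper's Caffe implementation, and the formula for the negative constraint is a reconstruction that should be treated as an assumption.

```python
import numpy as np

# Element-wise projections w = K(theta) and their derivatives dw/dtheta.
# The straight-through estimator passes the gradient through unchanged.

def binary_st(theta):
    """Binary (+1/-1) weights via sign; straight-through gradient of 1."""
    w = np.where(theta >= 0, 1.0, -1.0)
    return w, np.ones_like(theta)            # straight-through approximation

def negative(theta):
    """Non-positive weights; formula reconstructed from the paper (assumption):
    w = -(theta^2 + 2|theta|)^(1/2)."""
    w = -np.sqrt(theta ** 2 + 2.0 * np.abs(theta))
    dw_dtheta = -(theta + np.sign(theta)) / np.maximum(-w, 1e-8)
    return w, dw_dtheta

def bounded(theta):
    """Weights bounded to (-1, 1) via tanh."""
    w = np.tanh(theta)
    return w, 1.0 - w ** 2                   # derivative of tanh

theta = np.array([-2.0, -0.3, 0.0, 0.7, 3.0])
for K in (binary_st, negative, bounded):
    w, g = K(theta)
    print(K.__name__, w, g)
```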

Narrowing deep space

The constraint on the synaptic weights has to be a differentiable function whose output may take different forms, such as +1 or −1 (a network with binary weights), 0 or 1 (a sparse network), ternary values, or even signed byte values (a compressed and fast neural network), or networks with low-resolution synapses, which are useful for neuromorphic computing (Pfeil et al., 2012). Changing the constraint function provides a way of testing other constraints on the synaptic weights. Furthermore, the

Implementation

We used Caffe for deep learning (Jia et al., 2014). To impose constraints on the weights, we used Python code that injects the network weights through Caffe's Python API. In other words, our algorithm reads the free parameters of the network (here the free parameters are the θ values, not the weights W), applies the constraint function to each weight, writes the resulting weights back into the network through the API, and then proceeds to the next training cycle (i.e. the next sample for the SGD algorithm).
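A minimal sketch of such an injection loop with pycaffe is shown below. The network prototxt name, the choice of tanh as the constraint, the learning rate, and applying the constraint to every parameter blob are illustrative assumptions rather than the paper's exact setup.

```python
import caffe
import numpy as np

# Sketch: keep the free parameters theta outside the network, write
# w = K(theta) into the net before each forward/backward pass, and update
# theta with the chain rule using the gradients Caffe leaves in each
# blob's diff. 'train.prototxt' is assumed to contain a data layer.

K = np.tanh                                   # bounded-weight constraint
dK = lambda t: 1.0 - np.tanh(t) ** 2          # its derivative
lr = 0.01

net = caffe.Net('train.prototxt', caffe.TRAIN)

# Initialize theta from the net's randomly initialized parameters.
theta = {name: [np.array(b.data, copy=True) for b in blobs]
         for name, blobs in net.params.items()}

for it in range(1000):
    # Inject the constrained weights w = K(theta) into the network.
    for name, blobs in net.params.items():
        for b, t in zip(blobs, theta[name]):
            b.data[...] = K(t)
            b.diff[...] = 0.0                 # Caffe accumulates param diffs

    net.forward()
    net.backward()

    # Chain rule: dE/dtheta = dE/dw * K'(theta); plain SGD on theta.
    for name, blobs in net.params.items():
        for b, t in zip(blobs, theta[name]):
            t -= lr * b.diff * dK(t)
```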

Training parameters of Caffe are:

Conclusion

This paper uses a meta-heuristic method that imposes constraints on the weights of artificial neural networks while training with the stochastic gradient descent learning algorithm. Imposing constraints has several advantages, described in the following list:

  1. Being more biologically plausible is an important outcome of the method. The weights of the network take values in the same range as in a biological neural network, which cannot strengthen the pre-synaptic potential without limit, with

References

  • Dan, Y., et al., Spike timing-dependent plasticity of neural circuits, Neuron (2004)
  • Han, F., et al., Modified constrained learning algorithms incorporating additional functional constraints into neural networks, Inform. Sci. (2008)
  • Perantonis, S.J., et al., An efficient constrained learning algorithm with momentum acceleration, Neural Netw. (1995)
  • Ardakani, A., et al., Sparsely-connected neural networks: Towards efficient VLSI implementation of deep neural networks
  • Ayinde, B.O., et al., Deep learning of constrained autoencoders for enhanced understanding of data, IEEE Trans. Neural Netw. Learn. Syst. (2017)
  • Bengio, Y., Léonard, N., Courville, A.C., 2013. Estimating or Propagating Gradients Through Stochastic Neurons for...
  • Bouzerdoum, A., et al., Neural network for quadratic optimization with bound constraints, IEEE Trans. Neural Netw. (1993)
  • Chorowski, J., et al., Learning understandable neural networks with nonnegative weight constraints, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • Ciresan, D., Meier, U., Schmidhuber, J., 2012. Multi-column Deep Neural Networks for Image Classification. arXiv....
  • Courbariaux, M., Bengio, Y., David, J.-P., 2015. BinaryConnect: Training Deep Neural Networks with binary weights...
  • Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y., 2016. Binarized Neural Networks: Training Deep...
  • Di Muro, G., et al., A constrained-optimization approach to training neural networks for smooth function approximation and system identification
  • Elizondo, D., et al., A survey of partially connected neural networks, Int. J. Neural Syst. (1997)
  • Gerstner, W., et al., Spiking neuron models: Single neurons, populations, plasticity, Book (2002)
  • Ghiasi, G., et al., DropBlock: A regularization method for convolutional networks
  • Huang, G., et al., Densely connected convolutional networks
  • Humble, J., et al., STDP pattern onset learning depends on background activity
  • Jia, Y., et al., Caffe: Convolutional architecture for fast feature embedding
  • Kok, J.N., et al., Constraining of weights using regularities, ESANN (1996)
  • Krizhevsky, A., Learning Multiple Layers of Features from Tiny Images (2009)
  • Le, Q., et al., Tiled convolutional neural networks, NIPS (2010)
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.06.022.