Neural Networks
Volume 120, December 2019, Pages 9-31

2019 Special Issue
Noise-boosted bidirectional backpropagation and adversarial learning

https://doi.org/10.1016/j.neunet.2019.09.016

Abstract

Bidirectional backpropagation trains a neural network with backpropagation in both the backward and forward directions using the same synaptic weights. Special injected noise can then improve the algorithm’s training time and accuracy because backpropagation has a likelihood structure. Training in each direction is a form of generalized expectation–maximization because backpropagation itself is a form of generalized expectation–maximization. This requires backpropagation invariance in each direction: The gradient log-likelihood in each direction must give back the original update equations of the backpropagation algorithm. The special noise makes the current training signal more probable as bidirectional backpropagation climbs the nearest hill of joint probability or log-likelihood. The noise for injection differs for classification and regression even in the same network because of the constraint of backpropagation invariance. The backward pass in a bidirectionally trained classifier estimates the centroid of the input pattern class. So the feedback signal that arrives back at the input layer of a classifier tends to estimate the local pattern-class centroid. Simulations show that noise speeded convergence and improved the accuracy of bidirectional backpropagation on both the MNIST test set of hand-written digits and the CIFAR-10 test set of images. The noise boost further applies to regular and Wasserstein bidirectionally trained adversarial networks. Bidirectionality also greatly reduced the problem of mode collapse in regular adversarial networks.

Section snippets

Introduction: From adaptive resonance to noise-boosted bidirectional backpropagation and adversarial learning

What is a feedback signal that arrives at the input layer of a neural network?

Grossberg answered this question with his adaptive resonance theory or ART: The feedback signal is an expectation (Grossberg, 1976, 1982, 1988). The neural network expects to see this feedback signal or pattern given the current input signal that stimulated the network and given the pattern associations that it has learned. So any synaptic learning should depend on the match or mismatch between

Backpropagation and noise injection

The BP algorithm trains a neural network to approximate some mapping from the input space X to the output space Y (Bishop, 2006; Hinton et al., 1986; Jordan and Mitchell, 2015; LeCun et al., 2015; Werbos, 1974). The mapping is a simple function for an ideal classifier. The BP algorithm is itself a special case of generalized expectation–maximization (GEM) for maximum-likelihood estimation with latent or hidden parameters (Audhkhasi et al., 2016).
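
To make the likelihood view concrete, the following is a minimal sketch (in PyTorch, which the paper does not prescribe) of one BP step on a small classifier: cross-entropy is the negative output log-likelihood $-\log p(y|x,\Theta)$, so each gradient step climbs the log-likelihood surface. The layer sizes, learning rate, and random batch are illustrative placeholders.

import torch
import torch.nn as nn

# Minimal sketch: one backpropagation step read as maximum-likelihood ascent.
# The softmax classifier defines p(y | x, Theta), and cross-entropy is the
# negative log-likelihood of the target class. All sizes are illustrative.

torch.manual_seed(0)

net = nn.Sequential(            # simple multilayer perceptron classifier
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),         # class logits; softmax is folded into the loss
)
loss_fn = nn.CrossEntropyLoss()                   # -log p(y | x, Theta)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(32, 784)                          # stand-in batch of flattened images
y = torch.randint(0, 10, (32,))                   # stand-in class labels

loss = loss_fn(net(x), y)                         # forward pass: negative log-likelihood
optimizer.zero_grad()
loss.backward()                                   # backward pass: likelihood gradient
optimizer.step()                                  # one step up the log-likelihood surface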

The reduction of GEM to BP follows from the key

Bidirectional backpropagation (B-BP)

The B-BP algorithm trains a multilayered neural network with backpropagation in both directions over the same web of synaptic connections (Adigun and Kosko, 2016, 2019). Algorithm 1 shows the steps of the B-BP algorithm in the more general case where NEM noise is injected into the input and output layers of a classifier network. Similar steps can inject noise into hidden layers and into all layers of bidirectional regression networks.
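
As a rough illustration (not the paper's Algorithm 1 itself), the sketch below runs one noiseless B-BP update on a two-layer classifier: the forward pass classifies x, the backward pass regenerates x from the one-hot target through the transposes of the same two weight matrices, and one gradient step descends the sum of the two directional losses. The sigmoid hidden units, squared-error backward loss, and layer sizes are assumptions for illustration only.

import torch
import torch.nn.functional as F

# Minimal sketch of one (noiseless) B-BP update: the forward and backward
# passes run over the SAME two weight matrices, with the backward pass using
# their transposes. Forward pass: classify x. Backward pass: regenerate x from
# the one-hot target y. One gradient step descends the sum of the two
# directional losses, i.e. climbs the joint log-likelihood. The sigmoid hidden
# units, squared-error backward loss, and sizes are illustrative assumptions.

torch.manual_seed(0)
W1 = torch.nn.Parameter(0.01 * torch.randn(256, 784))   # shared layer-1 weights
b1 = torch.nn.Parameter(torch.zeros(256))                # forward bias, layer 1
W2 = torch.nn.Parameter(0.01 * torch.randn(10, 256))     # shared layer-2 weights
b2 = torch.nn.Parameter(torch.zeros(10))                 # forward bias, layer 2
c1 = torch.nn.Parameter(torch.zeros(784))                # backward bias at the input layer
c2 = torch.nn.Parameter(torch.zeros(256))                # backward bias at the hidden layer
opt = torch.optim.SGD([W1, b1, W2, b2, c1, c2], lr=0.1)

x = torch.rand(32, 784)                                  # stand-in input batch
y = torch.randint(0, 10, (32,))                          # stand-in class labels

# Forward direction x -> h -> logits: classification likelihood p(y | x, Theta).
h_f = torch.sigmoid(F.linear(x, W1, b1))
loss_forward = F.cross_entropy(F.linear(h_f, W2, b2), y)

# Backward direction y -> h -> x over the transposed weights: p(x | y, Theta).
h_b = torch.sigmoid(F.linear(F.one_hot(y, 10).float(), W2.t(), c2))
loss_backward = F.mse_loss(F.linear(h_b, W1.t(), c1), x)

opt.zero_grad()
(loss_forward + loss_backward).backward()                # one bidirectional BP step
opt.step()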

B-BP is also a form of maximum likelihood

NEM noise-boosted B-BP learning

This section shows how NEM noise boosts the B-BP algorithm. B-BP seeks to jointly maximize the log-likelihood of the network’s forward pass and its backward pass. So B-BP is also a form of maximum-likelihood estimation.

The above BP-as-GEM theorem states that the gradient of the network log-likelihood equals the gradient of the EM surrogate likelihood $Q(\Theta|\Theta^{(n)})$: $\nabla_\Theta \log p = \nabla_\Theta Q$. This gradient equality gives in the bidirectional case $\nabla_\Theta \log p(\mathbf{y}|\mathbf{x},\Theta) + \nabla_\Theta \log p(\mathbf{x}|\mathbf{y},\Theta) = \nabla_\Theta Q_f(\Theta|\Theta^{(i)}) + \nabla_\Theta Q_b(\Theta|\Theta^{(i)})$ if $Q_f(\Theta|\Theta^{(i)})$ and $Q$
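
The NEM noise injected at the output layer must make the current training signal more probable. The sketch below assumes that, for a softmax classifier, this screening test reduces to $\mathbf{n}^T \log \mathbf{a} \ge 0$ for the output activation vector $\mathbf{a}$, as in the earlier NEM noise-boosted BP literature; the noise scale, shapes, and the helper name nem_output_noise are illustrative assumptions rather than the paper's exact procedure.

import torch
import torch.nn.functional as F

# Minimal sketch of NEM screening for output-layer noise in a softmax
# classifier: keep a candidate noise sample n only if it makes the current
# output signal more probable, which here reduces to n . log(a) >= 0 for the
# softmax activation a (an assumed form carried over from earlier NEM-BP work).
# The noise scale and tensor shapes are illustrative.

def nem_output_noise(logits: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Zero-mean noise for the output target, zeroed where the NEM test fails."""
    a = F.softmax(logits, dim=-1)                     # current output activations
    n = scale * torch.randn_like(a)                   # candidate blind noise
    log_ratio = (n * torch.log(a + 1e-12)).sum(-1)    # n . log(a) per sample
    keep = (log_ratio >= 0).float().unsqueeze(-1)     # NEM positivity screen
    return n * keep                                   # reject samples that fail

# Usage: perturb the one-hot target with the screened noise before the loss.
logits = torch.randn(32, 10)
y = torch.randint(0, 10, (32,))
t_noisy = F.one_hot(y, 10).float() + nem_output_noise(logits)
loss = -(t_noisy * F.log_softmax(logits, dim=-1)).sum(-1).mean()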

Generative adversarial networks and bidirectional training

An adversarial network consists of two or more neural networks that try to trick each other. They use feedback among the neural networks and sometimes within neural networks.

The standard generative adversarial network (GAN) consists of two competing neural networks. One network generates patterns to trick or fool the other network that tries to tell whether a generated pattern is real or fake. The generator network G acts as a type of art forger while the discriminator network D acts as a type
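
For context, the sketch below shows one ordinary GAN update: a discriminator step that pushes D toward scoring real patterns as 1 and generated patterns as 0, then a generator step that pushes D(G(z)) toward 1. It is the standard unidirectional two-player step, not the paper's bidirectionally trained variant; the architectures, latent dimension, and learning rates are illustrative.

import torch
import torch.nn as nn

# Minimal sketch of one standard GAN update: the generator G maps noise z to a
# fake pattern, and the discriminator D scores patterns as real or fake. This
# is the ordinary two-player step only, not the bidirectionally trained
# variant studied in the paper; sizes and hyperparameters are illustrative.

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

x_real = torch.rand(32, 784) * 2 - 1        # stand-in batch of real images in [-1, 1]
z = torch.randn(32, 64)                     # latent noise

# Discriminator step: push D(real) toward 1 and D(fake) toward 0.
x_fake = G(z).detach()                      # detach so only D updates here
loss_d = bce(D(x_real), torch.ones(32, 1)) + bce(D(x_fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: push D(G(z)) toward 1, i.e. make fakes look real.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()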

Bidirectional simulation results

We simulated the performance of NEM-noise-boosted B-BP for different bidirectional network structures on the MNIST and CIFAR-10 image data sets. We also simulated multilayer BAMs to test the extent to which they converged or stabilized to BAM fixed points. Stability tended to fall off as the number of hidden layers increased. A general finding was that bipolar coding tended to outperform binary coding (Tables 2–6). This appears to reflect the

Conclusions

NEM noise injection improved the performance of bidirectional backpropagation on classifiers both in terms of increased accuracy and shorter training time. It often required fewer hidden neurons or layers to achieve the same performance as noiseless B-BP. The NEM noise benefits were even more pronounced for convolutional BAM classifiers. NEM noise injects just that noise that makes the current signal more probable. It almost always outperformed simply injecting blind noise. B-BP also

References (55)

  • Adigun, O., et al. Using noise to speed up video classification with recurrent backpropagation.
  • Adigun, O., et al. Training generative adversarial networks with bidirectional backpropagation.
  • Adigun, O., et al. Bidirectional backpropagation. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2019).
  • Arjovsky, M., & Bottou, L. (2017). Towards principled methods for training generative adversarial networks. In...
  • Arjovsky, M., et al. Wasserstein GAN (2017).
  • Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning (2009).
  • Bishop, C. M. Pattern recognition and machine learning (2006).
  • Cohen, M. A., et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics (1983).
  • Dempster, A. P., et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. Statistical Methodology (1977).
  • Dumoulin, V., et al. A guide to convolution arithmetic for deep learning (2016).
  • Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics (1980).
  • Goodfellow, I., et al. Generative adversarial nets.
  • Graves, A., et al. Speech recognition with deep recurrent neural networks.
  • Grossberg, S. Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions. Biological Cybernetics (1976).
  • Grossberg, S. How does a brain build a cognitive code?
  • Gubner, J. A. Probability and random processes for electrical and computer engineers (2006).
  • Gulrajani, I., et al. Improved training of Wasserstein GANs (2017).