Neural Networks
Volume 120, December 2019, Pages 9-31

2019 Special Issue
Noise-boosted bidirectional backpropagation and adversarial learning

https://doi.org/10.1016/j.neunet.2019.09.016

Abstract

Bidirectional backpropagation trains a neural network with backpropagation in both the backward and forward directions using the same synaptic weights. Special injected noise can then improve the algorithm’s training time and accuracy because backpropagation has a likelihood structure. Training in each direction is a form of generalized expectation–maximization because backpropagation itself is a form of generalized expectation–maximization. This requires backpropagation invariance in each direction: The gradient log-likelihood in each direction must give back the original update equations of the backpropagation algorithm. The special noise makes the current training signal more probable as bidirectional backpropagation climbs the nearest hill of joint probability or log-likelihood. The noise for injection differs for classification and regression even in the same network because of the constraint of backpropagation invariance. The backward pass in a bidirectionally trained classifier estimates the centroid of the input pattern class. So the feedback signal that arrives back at the input layer of a classifier tends to estimate the local pattern-class centroid. Simulations show that noise speeded convergence and improved the accuracy of bidirectional backpropagation on both the MNIST test set of hand-written digits and the CIFAR-10 test set of images. The noise boost further applies to regular and Wasserstein bidirectionally trained adversarial networks. Bidirectionality also greatly reduced the problem of mode collapse in regular adversarial networks.

Section snippets

Introduction: From adaptive resonance to noise-boosted bidirectional backpropagation and adversarial learning

What is a feedback signal that arrives at the input layer of a neural network?

Grossberg answered this question with his adaptive resonance theory or ART: The feedback signal is an expectation (Grossberg, 1976, 1982, 1988). The neural network expects to see this feedback signal or pattern given the current input signal that stimulated the network and given the pattern associations that it has learned. So any synaptic learning should depend on the match or mismatch between

Backpropagation and noise injection

The BP algorithm trains a neural network to approximate some mapping from the input space X to the output space Y (Bishop, 2006; Hinton et al., 1986; Jordan and Mitchell, 2015; LeCun et al., 2015; Werbos, 1974). The mapping is a simple function for an ideal classifier. The BP algorithm is itself a special case of generalized expectation–maximization (GEM) for maximum-likelihood estimation with latent or hidden parameters (Audhkhasi et al., 2016).
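
To make the likelihood view concrete, the following is a minimal sketch (in PyTorch, which the paper does not prescribe) of one BP step on a small classifier: cross-entropy is the negative output log-likelihood $-\log p(y|x,\Theta)$, so each gradient step climbs the log-likelihood surface. The layer sizes, learning rate, and random batch are illustrative placeholders.

import torch
import torch.nn as nn

# Minimal sketch: one backpropagation step read as maximum-likelihood ascent.
# The softmax classifier defines p(y | x, Theta), and cross-entropy is the
# negative log-likelihood of the target class. All sizes are illustrative.

torch.manual_seed(0)

net = nn.Sequential(            # simple multilayer perceptron classifier
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),         # class logits; softmax is folded into the loss
)
loss_fn = nn.CrossEntropyLoss()                   # -log p(y | x, Theta)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(32, 784)                          # stand-in batch of flattened images
y = torch.randint(0, 10, (32,))                   # stand-in class labels

loss = loss_fn(net(x), y)                         # forward pass: negative log-likelihood
optimizer.zero_grad()
loss.backward()                                   # backward pass: likelihood gradient
optimizer.step()                                  # one step up the log-likelihood surface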

The reduction of GEM to BP follows from the key

Bidirectional backpropagation (B-BP)

The B-BP algorithm trains a multilayered neural network with backpropagation in both directions over the same web of synaptic connections (Adigun and Kosko, 2016, 2019). Algorithm 1 shows the steps of the B-BP algorithm in the more general case where NEM noise is injected into the input and output layers of a classifier network. Similar steps can inject noise into hidden layers and into all layers of bidirectional regression networks.
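
As a rough illustration (not the paper's Algorithm 1 itself), the sketch below runs one noiseless B-BP update on a two-layer classifier: the forward pass classifies x, the backward pass regenerates x from the one-hot target through the transposes of the same two weight matrices, and one gradient step descends the sum of the two directional losses. The sigmoid hidden units, squared-error backward loss, and layer sizes are assumptions for illustration only.

import torch
import torch.nn.functional as F

# Minimal sketch of one (noiseless) B-BP update: the forward and backward
# passes run over the SAME two weight matrices, with the backward pass using
# their transposes. Forward pass: classify x. Backward pass: regenerate x from
# the one-hot target y. One gradient step descends the sum of the two
# directional losses, i.e. climbs the joint log-likelihood. The sigmoid hidden
# units, squared-error backward loss, and sizes are illustrative assumptions.

torch.manual_seed(0)
W1 = torch.nn.Parameter(0.01 * torch.randn(256, 784))   # shared layer-1 weights
b1 = torch.nn.Parameter(torch.zeros(256))                # forward bias, layer 1
W2 = torch.nn.Parameter(0.01 * torch.randn(10, 256))     # shared layer-2 weights
b2 = torch.nn.Parameter(torch.zeros(10))                 # forward bias, layer 2
c1 = torch.nn.Parameter(torch.zeros(784))                # backward bias at the input layer
c2 = torch.nn.Parameter(torch.zeros(256))                # backward bias at the hidden layer
opt = torch.optim.SGD([W1, b1, W2, b2, c1, c2], lr=0.1)

x = torch.rand(32, 784)                                  # stand-in input batch
y = torch.randint(0, 10, (32,))                          # stand-in class labels

# Forward direction x -> h -> logits: classification likelihood p(y | x, Theta).
h_f = torch.sigmoid(F.linear(x, W1, b1))
loss_forward = F.cross_entropy(F.linear(h_f, W2, b2), y)

# Backward direction y -> h -> x over the transposed weights: p(x | y, Theta).
h_b = torch.sigmoid(F.linear(F.one_hot(y, 10).float(), W2.t(), c2))
loss_backward = F.mse_loss(F.linear(h_b, W1.t(), c1), x)

opt.zero_grad()
(loss_forward + loss_backward).backward()                # one bidirectional BP step
opt.step()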

B-BP is also a form of maximum likelihood

NEM noise-boosted B-BP learning

This section shows how NEM noise boosts the B-BP algorithm. B-BP seeks to jointly maximize the log-likelihood of the network’s forward pass and its backward pass. So B-BP is also a form of maximum-likelihood estimation.

The above BP-as-GEM theorem states that the gradient of the network log-likelihood equals the gradient of the EM surrogate likelihood $Q(\Theta|\Theta^{(n)})$: $\nabla_\Theta \log p = \nabla_\Theta Q$. This gradient equality gives in the bidirectional case $\nabla_\Theta \log p(\mathbf{y}|\mathbf{x},\Theta) + \nabla_\Theta \log p(\mathbf{x}|\mathbf{y},\Theta) = \nabla_\Theta Q_f(\Theta|\Theta^{(i)}) + \nabla_\Theta Q_b(\Theta|\Theta^{(i)})$ if $Q_f(\Theta|\Theta^{(i)})$ and $Q$
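
The NEM noise injected at the output layer must make the current training signal more probable. The sketch below assumes that, for a softmax classifier, this screening test reduces to $\mathbf{n}^T \log \mathbf{a} \ge 0$ for the output activation vector $\mathbf{a}$, as in the earlier NEM noise-boosted BP literature; the noise scale, shapes, and the helper name nem_output_noise are illustrative assumptions rather than the paper's exact procedure.

import torch
import torch.nn.functional as F

# Minimal sketch of NEM screening for output-layer noise in a softmax
# classifier: keep a candidate noise sample n only if it makes the current
# output signal more probable, which here reduces to n . log(a) >= 0 for the
# softmax activation a (an assumed form carried over from earlier NEM-BP work).
# The noise scale and tensor shapes are illustrative.

def nem_output_noise(logits: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Zero-mean noise for the output target, zeroed where the NEM test fails."""
    a = F.softmax(logits, dim=-1)                     # current output activations
    n = scale * torch.randn_like(a)                   # candidate blind noise
    log_ratio = (n * torch.log(a + 1e-12)).sum(-1)    # n . log(a) per sample
    keep = (log_ratio >= 0).float().unsqueeze(-1)     # NEM positivity screen
    return n * keep                                   # reject samples that fail

# Usage: perturb the one-hot target with the screened noise before the loss.
logits = torch.randn(32, 10)
y = torch.randint(0, 10, (32,))
t_noisy = F.one_hot(y, 10).float() + nem_output_noise(logits)
loss = -(t_noisy * F.log_softmax(logits, dim=-1)).sum(-1).mean()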

Generative adversarial networks and bidirectional training

An adversarial network consists of two or more neural networks that try to trick each other. They use feedback among the neural networks and sometimes within neural networks.

The standard generative adversarial network (GAN) consists of two competing neural networks. One network generates patterns to trick or fool the other network that tries to tell whether a generated pattern is real or fake. The generator network G acts as a type of art forger while the discriminator network D acts as a type
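
For context, the sketch below shows one ordinary GAN update: a discriminator step that pushes D toward scoring real patterns as 1 and generated patterns as 0, then a generator step that pushes D(G(z)) toward 1. It is the standard unidirectional two-player step, not the paper's bidirectionally trained variant; the architectures, latent dimension, and learning rates are illustrative.

import torch
import torch.nn as nn

# Minimal sketch of one standard GAN update: the generator G maps noise z to a
# fake pattern, and the discriminator D scores patterns as real or fake. This
# is the ordinary two-player step only, not the bidirectionally trained
# variant studied in the paper; sizes and hyperparameters are illustrative.

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

x_real = torch.rand(32, 784) * 2 - 1        # stand-in batch of real images in [-1, 1]
z = torch.randn(32, 64)                     # latent noise

# Discriminator step: push D(real) toward 1 and D(fake) toward 0.
x_fake = G(z).detach()                      # detach so only D updates here
loss_d = bce(D(x_real), torch.ones(32, 1)) + bce(D(x_fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: push D(G(z)) toward 1, i.e. make fakes look real.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()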

Bidirectional simulation results

We simulated the performance of NEM-noise-boosted B-BP for different bidirectional network structures on the MNIST and CIFAR-10 image data sets. We also simulated multilayer BAMs to test the extent to which they converged or stabilized to BAM fixed points. Stability tended to fall off as the number of hidden layers increased. A general finding was that bipolar coding tended to outperform binary coding (Tables 2–6). This appears to reflect the

Conclusions

NEM noise injection improved the performance of bidirectional backpropagation on classifiers both in terms of increased accuracy and shorter training time. It often required fewer hidden neurons or layers to achieve the same performance as noiseless B-BP. The NEM noise benefits were even more pronounced for convolutional BAM classifiers. NEM noise injects just that noise that makes the current signal more probable. It almost always outperformed simply injecting blind noise. B-BP also

References (55)

  • Adigun, O., et al. Using noise to speed up video classification with recurrent backpropagation.
  • Adigun, O., et al. Training generative adversarial networks with bidirectional backpropagation.
  • Adigun, O., et al. Bidirectional backpropagation. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2019).
  • Arjovsky, M., & Bottou, L. (2017). Towards principled methods for training generative adversarial networks. In...
  • Arjovsky, M., et al. Wasserstein GAN (2017).
  • Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning (2009).
  • Bishop, C. M. Pattern recognition and machine learning (2006).
  • Cohen, M. A., et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics (1983).
  • Dempster, A. P., et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. Statistical Methodology (1977).
  • Dumoulin, V., et al. A guide to convolution arithmetic for deep learning (2016).
  • Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics (1980).
  • Goodfellow, I., et al. Generative adversarial nets.
  • Graves, A., et al. Speech recognition with deep recurrent neural networks.
  • Grossberg, S. Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions. Biological Cybernetics (1976).
  • Grossberg, S. How does a brain build a cognitive code?
  • Gubner, J. A. Probability and random processes for electrical and computer engineers (2006).
  • Gulrajani, I., et al. Improved training of Wasserstein GANs (2017).