Abstract
Deep multi-layered neural networks contain many levels of nonlinearity, which allows them to represent highly varying nonlinear functions compactly. In this paper, we propose a new deep architecture with enhanced discriminative ability, which we refer to as the mlpconv-wise supervised pre-training network in network (MPNIN). MPNIN facilitates the process of information abstraction within the receptive fields. The proposed architecture builds on the recently developed network in network (NIN) structure, which slides a universal approximator, such as a multilayer perceptron with rectifier units, across an image to extract features. However, a randomly initialized NIN can yield poor solutions under gradient-based optimization. We remedy this defect with mlpconv-wise supervised pre-training, which helps overcome the difficulties of training deep networks by better initializing the weights in all layers. Moreover, batch normalization is applied to reduce internal covariate shift by pre-conditioning the model. Empirical investigations are conducted on the Mixed National Institute of Standards and Technology (MNIST), Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100), Street View House Numbers (SVHN), US Postal (USPS), Columbia University Image Library (COIL20 and COIL100) and Olivetti Research Ltd (ORL) datasets, and the results verify the effectiveness of the proposed MPNIN architecture.
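To make the building blocks named in the abstract concrete, the sketch below shows an mlpconv block (an ordinary convolution followed by 1x1 convolutions, i.e. a micro multilayer perceptron slid across the feature map) with batch normalization after each convolution, and a minimal stage that could serve as the first step of layer-wise supervised pre-training. This is an illustrative PyTorch sketch under assumed layer sizes, not the authors' implementation; names such as MlpConvBlock and all channel counts are hypothetical.

```python
# Minimal sketch, assuming PyTorch and hypothetical layer sizes; it is not the
# paper's reference implementation.
import torch
import torch.nn as nn


class MlpConvBlock(nn.Module):
    def __init__(self, in_channels, mid_channels, out_channels,
                 kernel_size, padding):
        super().__init__()
        self.block = nn.Sequential(
            # Ordinary convolution: the patch fed to the micro-MLP.
            nn.Conv2d(in_channels, mid_channels, kernel_size, padding=padding),
            nn.BatchNorm2d(mid_channels),   # reduces internal covariate shift
            nn.ReLU(inplace=True),
            # 1x1 convolutions act as a multilayer perceptron applied at every
            # spatial position (the NIN "mlpconv" idea).
            nn.Conv2d(mid_channels, mid_channels, kernel_size=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# Mlpconv-wise supervised pre-training, sketched: train one block together
# with a small classifier head, then reuse the learned block weights as the
# initialization when the next block is appended on top.
if __name__ == "__main__":
    block1 = MlpConvBlock(3, 96, 96, kernel_size=5, padding=2)
    head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(96, 10))
    stage1 = nn.Sequential(block1, head)      # pre-train this stage first
    x = torch.randn(8, 3, 32, 32)             # e.g. a CIFAR-10-sized batch
    print(stage1(x).shape)                    # torch.Size([8, 10])
```

The same pattern would repeat for later stages: each new mlpconv block is stacked on the previously pre-trained ones before fine-tuning the whole network with the final classifier.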
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61473150.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were carried out in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
For this type of study, formal consent was not required.
All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.
Informed Consent
Informed consent was obtained from all individual participants involved in the study.
Additional informed consent was obtained from all individual participants for whom identifying information is included in this article.
About this article
Cite this article
Han, X., Dai, Q. Batch-normalized Mlpconv-wise supervised pre-training network in network. Appl Intell 48, 142–155 (2018). https://doi.org/10.1007/s10489-017-0968-2