Abstract
Large neural networks trained on small datasets are increasingly prone to overfitting. Traditional machine learning methods can reduce overfitting by employing bagging or boosting to train several diverse models. For large neural networks, however, this is prohibitively expensive. To address this issue, we propose a method that leverages the benefits of ensembles without explicitly training several expensive neural network models. In contrast to Dropout, we encourage diversity among our sub-networks explicitly, by maximizing the diversity of the individual networks with a dedicated loss function: DivLoss. We demonstrate the effectiveness of DivLoss on the challenging CIFAR datasets.
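The abstract only names DivLoss; its exact formulation is not reproduced here. The following is a minimal sketch of a diversity-encouraging ensemble loss in that spirit, written in PyTorch: each sub-network head is trained on the task with cross-entropy, while a negative-correlation-style penalty rewards heads whose predictions deviate from the ensemble mean. The function name, `lambda_div`, and the specific penalty are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: a diversity-encouraging ensemble loss in the spirit
# of the abstract, NOT the paper's exact DivLoss. Assumes K sub-network heads
# that each output logits for the same input batch.
import torch
import torch.nn.functional as F

def diversity_ensemble_loss(head_logits, targets, lambda_div=0.1):
    """head_logits: list of K tensors, each of shape (batch, num_classes).
    targets: (batch,) tensor of class indices.
    lambda_div: weight of the (assumed) diversity penalty."""
    probs = [F.softmax(logits, dim=1) for logits in head_logits]
    ensemble = torch.stack(probs, dim=0).mean(dim=0)  # model-averaged prediction

    # Task term: every head is individually trained to classify correctly.
    task_loss = sum(F.cross_entropy(logits, targets)
                    for logits in head_logits) / len(head_logits)

    # Diversity term (negative-correlation style): subtracting each head's
    # squared deviation from the ensemble mean rewards disagreement.
    div_penalty = -sum(((p - ensemble) ** 2).sum(dim=1).mean()
                       for p in probs) / len(probs)

    return task_loss + lambda_div * div_penalty
```

Under these assumptions, `lambda_div` trades individual-head accuracy against disagreement among the heads; at test time the model-averaged prediction (the `ensemble` tensor above) would be used.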
Acknowledgement
This work was supported by the Austrian Research Promotion Agency (FFG) project Vision+.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Opitz, M., Possegger, H., Bischof, H. (2017). Efficient Model Averaging for Deep Neural Networks. In: Lai, S.H., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. Lecture Notes in Computer Science, vol. 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_13
DOI: https://doi.org/10.1007/978-3-319-54184-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6