Skip to main content
Log in

Bi-linearly weighted fractional max pooling

An extension to conventional max pooling for deep convolutional neural network

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

In this paper, we propose to extend the flexibility of the commonly used 2 × 2 non-overlapping max pooling for Convolutional Neural Network. We name it as Bi-linearly Weighted Fractional Max-Pooling. This proposed method enables max pooling operation below stride size 2, and is computed based on four bi-linearly weighted neighboring input activations. Currently, in a 2 × 2 non-overlapping max pooling operation, as spatial size is halved in both x and y directions, three-quarter of activations in the feature maps are discarded. As such reduction is too abrupt, amount of said pooling operation within a Convolutional Neural Network is very limited: further increasing the number of pooling operation results in too little activation left for subsequent operations. Using our proposed pooling method, spatial size reduction can be more gradual and can be adjusted flexibly. We applied a few combinations of our proposed pooling method into 50-layered ResNet and 19-layered VGGNet with reduced number of filters, and experimented on FGVC-Aircraft, Oxford-IIIT Pet, STL-10 and CIFAR-100 datasets. Even with reduced memory usage, our proposed methods showed reasonable improvement in classification accuracy with 50-layered ResNet. Additionally, with flexibility of our proposed pooling method, we change the reduction rate dynamically every training iteration, and our evaluation results indicated potential regularization effect.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Coates A, Lee H, Ng AY (2011) An analysis of single-layer networks in unsupervised feature learning. In: AISTATS 2011, vol 1001

  2. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: AISTATS, vol 9, pp 249–256

  3. Graham B (2014) Fractional max-pooling. arXiv:1412.6071

  4. Han D, Kim J, Kim J (2016) Deep pyramidal residual networks. arXiv:1610.02915

  5. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: IEEE international conference on computer vision (ICCV) 2015

    Google Scholar 

  6. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  7. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd international conference on machine learning, ICML 2015, lille, France, 6-11 July 2015, pp 448–456

    Google Scholar 

  8. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678

  9. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images

  10. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

    Google Scholar 

  11. Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. arXiv:1306.5151

  12. Parkhi OM, Vedaldi A, Zisserman A, Jawahar C (2012) Cats and dogs. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3498–3505

  13. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252. doi:10.1007/s11263-015-0816-y

    Article  MathSciNet  Google Scholar 

  14. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: Integrated recognition, localization and detection using convolutional networks. In: International conference on learning representations (ICLR 2014). CBLS, p 16

  15. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations

    Google Scholar 

  16. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958

    MathSciNet  MATH  Google Scholar 

  17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

    Google Scholar 

  18. Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. In: Deep learning workshop, ICML 2015

    Google Scholar 

  19. Zeiler M, Fergus R (2013) Stochastic pooling for regularization of deep convolutional neural networks. In: Proceedings of the international conference on learning representation (ICLR)

    Google Scholar 

  20. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer International Publishing, pp 818–833

Download references

Acknowledgements

We would like to thank MEXT KAKENHI, Grant-in-Aid for Challenging Exploratory Research, Grant Number 15K12027 for partial support of our work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Siang Thye Hang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hang, S.T., Aono, M. Bi-linearly weighted fractional max pooling. Multimed Tools Appl 76, 22095–22117 (2017). https://doi.org/10.1007/s11042-017-4840-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-017-4840-5

Keywords

Navigation