On the use of variable stride in convolutional neural networks

Published in: Multimedia Tools and Applications

Abstract

This paper explores the idea of varying the stride value in convolutional neural networks depending on the position of the pixel within the image: a smaller stride is used when processing the center of the image, while a larger one is used for pixels close to the edges. We show several examples of image classification tasks where the proposed approach outperforms a fixed-stride baseline of the same computational cost, several counterexamples where it does not, and explain why this is so. The proposed method has been successfully tested on several contemporary datasets and can be easily implemented and extended to other image classification tasks.
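The core idea above can be sketched as follows: a convolution is evaluated densely (small stride) over a central band of the output grid and coarsely (large stride) near the edges. Note that the region split, the 50% center fraction, and all function names here are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def variable_stride_positions(size, center_frac=0.5, s_center=1, s_edge=2):
    """Output indices along one axis: dense stride in the central band,
    coarser stride near the edges (assumed 50/50 split for illustration)."""
    lo = int(size * (1 - center_frac) / 2)
    hi = int(size * (1 + center_frac) / 2)
    idx = set(range(0, lo, s_edge))           # left/top edge, coarse
    idx |= set(range(lo, hi, s_center))       # center, dense
    idx |= set(range(hi, size, s_edge))       # right/bottom edge, coarse
    return sorted(idx)

def variable_stride_conv2d(img, kernel, **opts):
    """Valid 2-D convolution evaluated only at variable-stride positions."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    rows = variable_stride_positions(oh, **opts)
    cols = variable_stride_positions(ow, **opts)
    out = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out
```

For a 10×10 input and a 3×3 kernel, the stride-1 output grid has 8×8 positions; the variable-stride scheme above keeps only 6×6 of them, concentrating samples in the center while still covering the borders.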



Notes

  1. In our experiments, the same basic idea was adapted and implemented with different image sizes, depending on the dataset used.

  2. The actual values will, of course, change depending on the size of the input image for each dataset/experiment in Section 4.

  3. This is the layer in which we modify the stride parameter.

  4. The number of nodes in this layer varies by experiment: experiments 1 and 2 involve 8-class classifiers, whereas experiments 3 through 5 involve 10-class classifiers.
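Notes 2 and 3 above hinge on how a convolutional layer's output size depends on the input size and the stride. A minimal sketch of that relationship, assuming a 3×3 kernel and no padding (neither is stated in this excerpt):

```python
def conv_output_size(n, kernel=3, stride=1, padding=0):
    """Standard output-size formula for a conv layer along one axis:
    floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# Output width shrinks roughly in proportion to the stride, which is why
# the "actual values" in note 2 depend on each dataset's image size.
for n in (28, 32, 64):
    print(n, conv_output_size(n, stride=1), conv_output_size(n, stride=2))
```

This is why a stride-modified layer must be re-dimensioned per dataset: doubling the stride roughly halves each spatial dimension of that layer's output, and the downstream fully connected layer must match.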


Author information

Corresponding author: Oge Marques.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zaniolo, L., Marques, O. On the use of variable stride in convolutional neural networks. Multimed Tools Appl 79, 13581–13598 (2020). https://doi.org/10.1007/s11042-019-08385-4

