
S-pad: self-learning padding mechanism

Original Paper · Machine Vision and Applications

Abstract

Padding is used to maintain the size of feature maps and to reduce information bias against boundaries. However, the extra information that padding schemes add at the boundary implicitly affects feature extraction. Zero-padding introduces weakly correlated values that weaken boundary information, although it does provide location-encoding information. Other padding methods produce weight asymmetries because padded values are applied unevenly during subsampling. Since no method is universally superior, whether a padding method suits a given task must currently be determined by hand. In this paper, we propose a self-learning padding mechanism, S-Pad, which extends boundary values by learning the boundary information of the image through a set of \(1\times 1\) filters supplied to the network. After training, boundary rules tailored to the specific task are obtained. S-Pad thereby effectively mitigates both the weakening of boundary information and weight asymmetries. We evaluate the effectiveness of S-Pad on object detection and classification, and our validation shows that S-Pad yields an overall performance improvement on both tasks.
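The abstract suggests an implementation in the spirit of a learnable padding layer: rather than filling the border with constants, the values used to extend the feature map are predicted from the boundary pixels themselves by \(1\times 1\) convolutions. The sketch below is an illustrative approximation under that reading, not the authors' exact S-Pad formulation; the module name, the per-side filter banks, and all parameter choices are our own assumptions.

```python
import torch
import torch.nn as nn

class LearnablePadding(nn.Module):
    """Hypothetical learnable padding layer: each padded strip is
    predicted from the adjacent boundary pixels with a 1x1 convolution,
    rather than filled with zeros."""

    def __init__(self, channels: int, pad: int = 1):
        super().__init__()
        self.pad = pad
        # One 1x1 filter bank per side of the feature map.
        self.side = nn.ModuleDict({
            s: nn.Conv2d(channels, channels, kernel_size=1)
            for s in ("top", "bottom", "left", "right")
        })

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.pad
        # Predict each padding strip from the outermost p rows/columns.
        top = self.side["top"](x[:, :, :p, :])
        bottom = self.side["bottom"](x[:, :, -p:, :])
        x = torch.cat([top, x, bottom], dim=2)        # extend height
        # Left/right are computed after the vertical extension,
        # so the four corners also receive learned values.
        left = self.side["left"](x[:, :, :, :p])
        right = self.side["right"](x[:, :, :, -p:])
        return torch.cat([left, x, right], dim=3)     # extend width

# Usage: swap `padding=1` in a conv block for the learnable module.
block = nn.Sequential(LearnablePadding(channels=64, pad=1),
                      nn.Conv2d(64, 64, kernel_size=3, padding=0))
out = block(torch.randn(1, 64, 32, 32))
assert out.shape == (1, 64, 32, 32)  # spatial size is preserved
```

Because the padded values come from trained filters rather than a fixed rule, the border extension adapts to the task during training, which is the behavior the paper attributes to S-Pad.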





Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61972097 and U21A20472, in part by the National Key Research and Development Plan of China under Grant 2021YFB3600503, in part by the Natural Science Foundation of Fujian Province under Grants 2021J01612 and 2020J01494, in part by the Major Science and Technology Project of Fujian Province under Grant 2021HZ022007, in part by the Industry-Academy Cooperation Project of Fujian Province under Grant 2018H6010, in part by the Fujian Collaborative Innovation Center for Big Data Application in Governments, and in part by the Fujian Engineering Research Center of Big Data Analysis and Processing. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers.

Author information


Corresponding author

Correspondence to Xinru Lin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ke, X., Lin, X. & Guo, W. S-pad: self-learning padding mechanism. Machine Vision and Applications 34, 102 (2023). https://doi.org/10.1007/s00138-023-01456-5

