Abstract
Padding is used to maintain the size of the feature map and to reduce the loss of information at image boundaries. However, the extra values that a padding scheme adds at the boundary implicitly affect feature extraction. Zero-padding introduces weakly correlated values that weaken boundary features, although it does provide position-encoded information; other padding schemes produce weight asymmetries because they are applied unevenly during subsampling. With no universally superior method, one must manually determine whether a given padding scheme suits a given task. In this paper, we propose S-Pad, a self-learning padding mechanism that extrapolates boundary values by learning the boundary information of the image, realized as a set of \(1\times 1\) filters added to the network. After training, boundary rules tailored to the specific task are obtained. S-Pad thereby mitigates both the weakening of boundary information and the weight asymmetries. We evaluate S-Pad on object detection and classification, and our validation shows that it yields an overall performance improvement on both tasks.
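The abstract only names the mechanism, so the following is a minimal illustrative sketch of what a "learned padding via \(1\times 1\) filters" layer could look like, not the paper's actual implementation. All names (`s_pad_sketch`, the weight shapes, and the choice to transform the nearest interior pixel) are assumptions made for illustration:

```python
import numpy as np

def s_pad_sketch(x, w, b):
    """Hypothetical sketch of a learned padding step (not the paper's code).

    x : (C, H, W) feature map
    w : (C, C) weights of a 1x1 filter, assumed learned during training
    b : (C,) bias of that filter

    Instead of filling the one-pixel border with zeros, each border value
    is produced by applying the 1x1 filter to the nearest interior
    boundary pixel, so the padding is a learned function of the image's
    own boundary content.
    """
    C, H, W = x.shape
    out = np.zeros((C, H + 2, W + 2), dtype=x.dtype)
    out[:, 1:-1, 1:-1] = x                       # interior is unchanged
    # Replicate the nearest boundary pixel into the border...
    edge = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    # ...then transform every position with the learned 1x1 filter
    # (a 1x1 convolution is just a per-pixel channel mixing).
    transformed = np.einsum("oc,chw->ohw", w, edge) + b[:, None, None]
    # Keep the transformed values only on the border ring.
    border_mask = np.ones((H + 2, W + 2), dtype=bool)
    border_mask[1:-1, 1:-1] = False
    out[:, border_mask] = transformed[:, border_mask]
    return out
```

With `w` set to the identity and `b` to zero, this degenerates to replicate padding; training `w` and `b` lets the network choose its own boundary rule, which is the behavior the abstract attributes to S-Pad.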
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61972097 and U21A20472, in part by the National Key Research and Development Plan of China under Grant 2021YFB3600503, in part by the Natural Science Foundation of Fujian Province under Grant 2021J01612 and 2020J01494, in part by the Major Science and Technology Project of Fujian Province under Grant 2021HZ022007, in part by the Industry-Academy Cooperation Project of Fujian Province under Grant 2018H6010, in part by the Fujian Collaborative Innovation Center for Big Data Application in Governments, and in part by the Fujian Engineering Research Center of Big Data Analysis and Processing. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ke, X., Lin, X. & Guo, W. S-pad: self-learning padding mechanism. Machine Vision and Applications 34, 102 (2023). https://doi.org/10.1007/s00138-023-01456-5