Abstract
Padding is used to maintain the size of the feature map and to reduce the loss of information at image boundaries. However, the extra values that a padding scheme adds at the boundary implicitly affect feature extraction. Zero-padding introduces weakly correlated values that weaken boundary features, although it does provide position-encoded information; other padding schemes produce weight asymmetries because they are applied unevenly during subsampling. With no universally superior method, one must manually determine whether a given padding scheme suits a given task. In this paper, we propose S-Pad, a self-learning padding mechanism that extrapolates boundary values by learning the boundary information of the image, realized as a set of \(1\times 1\) filters added to the network. After training, boundary rules tailored to the specific task are obtained. S-Pad thereby mitigates both the weakening of boundary information and the weight asymmetries. We evaluate S-Pad on object detection and classification, and our validation shows that it yields an overall performance improvement on both tasks.
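The abstract only names the mechanism, so the following is a minimal illustrative sketch of what a "learned padding via \(1\times 1\) filters" layer could look like, not the paper's actual implementation. All names (`s_pad_sketch`, the weight shapes, and the choice to transform the nearest interior pixel) are assumptions made for illustration:

```python
import numpy as np

def s_pad_sketch(x, w, b):
    """Hypothetical sketch of a learned padding step (not the paper's code).

    x : (C, H, W) feature map
    w : (C, C) weights of a 1x1 filter, assumed learned during training
    b : (C,) bias of that filter

    Instead of filling the one-pixel border with zeros, each border value
    is produced by applying the 1x1 filter to the nearest interior
    boundary pixel, so the padding is a learned function of the image's
    own boundary content.
    """
    C, H, W = x.shape
    out = np.zeros((C, H + 2, W + 2), dtype=x.dtype)
    out[:, 1:-1, 1:-1] = x                       # interior is unchanged
    # Replicate the nearest boundary pixel into the border...
    edge = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    # ...then transform every position with the learned 1x1 filter
    # (a 1x1 convolution is just a per-pixel channel mixing).
    transformed = np.einsum("oc,chw->ohw", w, edge) + b[:, None, None]
    # Keep the transformed values only on the border ring.
    border_mask = np.ones((H + 2, W + 2), dtype=bool)
    border_mask[1:-1, 1:-1] = False
    out[:, border_mask] = transformed[:, border_mask]
    return out
```

With `w` set to the identity and `b` to zero, this degenerates to replicate padding; training `w` and `b` lets the network choose its own boundary rule, which is the behavior the abstract attributes to S-Pad.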
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61972097 and U21A20472, in part by the National Key Research and Development Plan of China under Grant 2021YFB3600503, in part by the Natural Science Foundation of Fujian Province under Grant 2021J01612 and 2020J01494, in part by the Major Science and Technology Project of Fujian Province under Grant 2021HZ022007, in part by the Industry-Academy Cooperation Project of Fujian Province under Grant 2018H6010, in part by the Fujian Collaborative Innovation Center for Big Data Application in Governments, and in part by the Fujian Engineering Research Center of Big Data Analysis and Processing. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ke, X., Lin, X. & Guo, W. S-pad: self-learning padding mechanism. Machine Vision and Applications 34, 102 (2023). https://doi.org/10.1007/s00138-023-01456-5