Robustizing Object Detection Networks Using Augmented Feature Pooling

Shibata, Takashi; Tanaka, Masayuki; Okutomi, Masatoshi

doi:10.1007/978-3-031-26348-4_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13845)

Included in the conference series: Asian Conference on Computer Vision

Abstract

This paper presents a framework to robustize object detection networks against large geometric transformations. Deep neural networks have rapidly and dramatically improved object detection performance. Nevertheless, modern detection algorithms are still sensitive to large geometric transformations. To improve the robustness of modern detection algorithms against such transformations, we propose a new feature extraction method called augmented feature pooling. The key is to integrate the augmented feature maps obtained from the transformed images before feeding them to the detection head, without changing the original network architecture. In this paper, we focus on rotation as a simple-yet-influential case of geometric transformation, although our framework is applicable to any geometric transformation. Notably, by adding only a few lines of code to the original implementation of a modern object detection algorithm and applying simple fine-tuning, we can improve its rotation robustness while inheriting the strengths of modern network architectures. Our framework overwhelmingly outperforms typical geometric data augmentation and its variants used to improve robustness against appearance changes due to rotation. To evaluate robustness to rotation, we construct a dataset based on MS COCO, called COCO-Rot. Extensive experiments on three datasets, including our COCO-Rot, demonstrate that our method can improve the rotation robustness of state-of-the-art algorithms.
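The pooling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the set of transformations or the pooling operator, so the sketch assumes rotations by multiples of 90 degrees, element-wise max pooling, and a hypothetical `toy_backbone` that stands in for a real CNN backbone.

```python
import numpy as np


def toy_backbone(image):
    """Hypothetical stand-in for a CNN backbone (assumption, not from the paper).

    Computes the absolute horizontal difference with wrap-around borders,
    so the output has the same spatial size as the input.
    """
    return np.abs(image - np.roll(image, shift=1, axis=1))


def augmented_feature_pooling(image, backbone, ks=(0, 1, 2, 3)):
    """Pool backbone features over rotated copies of the input.

    Assumptions: rotations are multiples of 90 degrees (k quarter turns)
    and the pooling operator is an element-wise maximum.
    """
    aligned = []
    for k in ks:
        rotated = np.rot90(image, k)        # transform the input image
        feat = backbone(rotated)            # extract features from the transformed image
        aligned.append(np.rot90(feat, -k))  # rotate features back into the original frame
    # Integrate the aligned feature maps before they would reach a detection head.
    return np.max(np.stack(aligned), axis=0)
```

Under these assumptions the pooled map rotates together with the input: pooling the features of `np.rot90(image)` gives the same result as rotating the pooled features of `image`, whereas `toy_backbone` alone does not have this property. The pooled map keeps the shape of a single backbone output, which is why a downstream head would not need to change in this toy setting.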

Notes

1. Our code will be available at http://www.ok.sc.e.titech.ac.jp/res/DL/index.html.
2. Note that the TTA curve assumes that each inference before the ensemble is ideal, and thus this occupancy is an upper bound.
3. The dimensions of the feature map \(\mathbf{x}^{l}\) are the same as those of the original backbones.
4. The details of our dataset are described in our supplemental.
5. As shown in our supplemental, the proposed method also achieves the highest AP\(_{50}\) and AP\(_{75}\), in addition to the highest mAP.
6. Note that, in PASCAL VOC, the standard evaluation metric is AP\(_{50}\).
7. We also show AP\(_{50}\) and AP\(_{75}\) in our supplementary material.
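
The subscripts in AP\(_{50}\) and AP\(_{75}\) refer to the intersection-over-union (IoU) threshold, 0.5 or 0.75, at which a predicted box counts as matching a ground-truth box. The following sketch only illustrates that matching criterion; the function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration, and the full AP computation (ranking detections and integrating the precision-recall curve) is omitted.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), an assumed format."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """True if the prediction overlaps the ground truth at or above the IoU threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For example, a 10 x 10 prediction shifted by 2 pixels against a 10 x 10 ground-truth box has an IoU of 80 / 120, so it counts as a match at the 0.5 threshold but not at the stricter 0.75 threshold.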

Author information

Authors and Affiliations

NTT Corporation, Kanagawa, Japan
Takashi Shibata
Tokyo Institute of Technology, Tokyo, Japan
Masayuki Tanaka & Masatoshi Okutomi

Corresponding author

Correspondence to Takashi Shibata.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

About this paper

Cite this paper

Shibata, T., Tanaka, M., Okutomi, M. (2023). Robustizing Object Detection Networks Using Augmented Feature Pooling. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_6

DOI: https://doi.org/10.1007/978-3-031-26348-4_6
Published: 09 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26347-7
Online ISBN: 978-3-031-26348-4
eBook Packages: Computer Science, Computer Science (R0)
