Abstract
This paper presents a framework for robustizing object detection networks against large geometric transformations. Deep neural networks have rapidly and dramatically improved object detection performance. Nevertheless, modern detection algorithms remain sensitive to large geometric transformations. To improve their robustness, we propose a new feature extraction method called augmented feature pooling. The key idea is to integrate the augmented feature maps obtained from transformed images before feeding them to the detection head, without changing the original network architecture. In this paper, we focus on rotation as a simple yet influential case of geometric transformation, although our framework is applicable to any geometric transformation. Notably, by adding only a few lines of code to the original implementations of modern object detection algorithms and applying simple fine-tuning, we can improve their rotation robustness while inheriting the strengths of modern network architectures. Our framework substantially outperforms typical geometric data augmentation and its variants used to improve robustness against appearance changes due to rotation. We construct a dataset based on MS COCO, called COCO-Rot, to evaluate rotation robustness. Extensive experiments on three datasets, including our COCO-Rot, demonstrate that our method improves the rotation robustness of state-of-the-art algorithms.
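The pipeline described in the abstract can be sketched in a few lines: rotate the input, run the unchanged backbone on each rotated copy, rotate the resulting feature maps back into the original frame, and pool them before the detection head. The following is a minimal NumPy illustration under stated assumptions: it uses 90° rotations, element-wise max as the pooling operator, and a toy shape-preserving function standing in for the real backbone; the function names (`backbone`, `augmented_feature_pooling`) and the choice of max pooling are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def backbone(x):
    # Toy stand-in for a CNN backbone (hypothetical): any map that
    # preserves the spatial size. Deliberately NOT rotation-equivariant.
    return np.roll(x, 1, axis=0) + x ** 2

def augmented_feature_pooling(x, backbone):
    """Sketch of augmented feature pooling over 90-degree rotations.

    For each rotation k: rotate the input, run the unchanged backbone,
    rotate the feature map back to the input frame, then pool the
    aligned maps element-wise before they reach the detection head.
    """
    aligned = []
    for k in range(4):
        feat = backbone(np.rot90(x, k))       # features of the rotated input
        aligned.append(np.rot90(feat, -k))    # align back to the input frame
    return np.max(np.stack(aligned), axis=0)  # element-wise max pooling

# Pooling over the full rotation group makes the pooled map equivariant
# to 90-degree input rotations, even though the backbone itself is not:
x = np.random.default_rng(0).random((8, 8))
a = augmented_feature_pooling(np.rot90(x), backbone)
b = np.rot90(augmented_feature_pooling(x, backbone))
print(np.allclose(a, b))  # True
```

The equivariance holds for any shape-preserving backbone because rotating the input only permutes which augmented branch produces which map, and the element-wise max is invariant to that permutation.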
Notes
- 1. Our code will be available at http://www.ok.sc.e.titech.ac.jp/res/DL/index.html.
- 2. Note that the TTA curve assumes that each inference before the ensemble is ideal; this occupancy is therefore an upper bound.
- 3. The dimensions of the feature map \({{\mathbf{{x}}}}^{l}\) are the same as in the original backbones.
- 4. The details of our dataset are described in our supplementary material.
- 5. As shown in our supplementary material, the proposed method also achieves the highest AP\(_{50}\) and AP\(_{75}\), as well as the highest mAP.
- 6. Note that, in PASCAL VOC, the standard evaluation metric is AP\(_{50}\).
- 7. We also show AP\(_{50}\) and AP\(_{75}\) in our supplementary material.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Shibata, T., Tanaka, M., Okutomi, M. (2023). Robustizing Object Detection Networks Using Augmented Feature Pooling. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26347-7
Online ISBN: 978-3-031-26348-4