Abstract
Recently, cross-domain object detection has been tackled by reducing the domain disparity and learning domain-invariant features. Motivated by the observation that image-level discrepancy dominates in object detection, we introduce a Multi-Adversarial Faster-RCNN (MAF). Our proposed MAF makes two distinct contributions: (1) a Hierarchical Domain Feature Alignment (HDFA) module is introduced to minimize the image-level domain disparity, in which a Scale Reduction Module (SRM) reduces the feature-map size without information loss and increases training efficiency; (2) an Aggregated Proposal Feature Alignment (APFA) module integrates the proposal features and the detection results to enhance semantic alignment, in which a weighted GRL (WGRL) layer highlights hard-confused features rather than easily-confused ones. However, MAF only considers the domain disparity and neglects domain adaptability; as a result, the label-agnostic and inaccurate target distribution leads to source error collapse, which is harmful to domain adaptation. We therefore propose a Paradigm Teacher (PT) with knowledge distillation and formulate an extended Paradigm Teacher MAF (PT-MAF), which makes two further contributions: (1) the Paradigm Teacher (PT) overcomes source error collapse to improve the adaptability of the model; (2) a Dual-Discriminator HDFA (D\(^{2}\)-HDFA) improves the marginal distribution and achieves better alignment than HDFA. Extensive experiments on numerous benchmark datasets, including Cityscapes, Foggy Cityscapes, Pascal VOC, Clipart, and Watercolor, demonstrate the superiority of our approach over state-of-the-art methods.
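To illustrate the weighted GRL idea, the toy sketch below flips and re-scales upstream gradients by the domain discriminator's confidence, so that hard-confused samples (those the discriminator still separates easily) receive a larger reversed gradient. This is a minimal conceptual sketch, not the authors' implementation: the function name `wgrl_backward`, the `lam` trade-off parameter, and the confidence heuristic are assumptions for illustration.

```python
def wgrl_backward(grad, domain_prob, lam=1.0):
    """Toy weighted gradient-reversal step (illustrative only).

    grad        -- upstream gradients flowing back from the domain discriminator
    domain_prob -- the discriminator's probability that the sample is "source"
    lam         -- overall adversarial trade-off weight (hypothetical name)
    """
    # A sample the discriminator separates confidently (prob near 0 or 1) is
    # still "hard-confused": the feature extractor must work harder to align
    # it, so it receives a larger reversed gradient. A sample near prob 0.5
    # is already domain-confused and gets a weight near zero.
    confidence = abs(domain_prob - 0.5) * 2.0   # 0 = confused, 1 = separated
    weight = lam * confidence
    return [-weight * g for g in grad]

# An unweighted GRL would instead return [-lam * g for g in grad] for every
# sample, treating confused and separated samples identically.
aligned = wgrl_backward([1.0, -2.0], domain_prob=0.5)   # confused sample
distinct = wgrl_backward([1.0, -2.0], domain_prob=1.0)  # separated sample
```

In the forward pass a GRL is the identity; only the backward pass differs, which is why the sketch shows just the gradient transformation.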
Acknowledgements
This work was partially supported by National Key R &D Program of China (2021YFB3100800), National Natural Science Fund of China (62271090), Chongqing Natural Science Fund (cstc2021jcyj-jqX0023), CCF Hikvision Open Fund (CCF-HIKVISION OF 20210002), CAAI-Huawei MindSpore Open Fund, and Beijing Academy of Artificial Intelligence (BAAI).
Additional information
Communicated by Wanli Ouyang.
About this article
Cite this article
He, Z., Zhang, L., Gao, X. et al. Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection. Int J Comput Vis 131, 680–700 (2023). https://doi.org/10.1007/s11263-022-01728-z