
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection

Published in: International Journal of Computer Vision

Abstract

Recently, the cross-domain object detection task has been addressed by reducing domain disparity and learning domain-invariant features. Motivated by the observation that image-level discrepancy dominates in object detection, we introduce a Multi-Adversarial Faster-RCNN (MAF). Our proposed MAF makes two distinct contributions: (1) A Hierarchical Domain Feature Alignment (HDFA) module minimizes the image-level domain disparity, in which a Scale Reduction Module (SRM) shrinks the feature maps without information loss and improves training efficiency. (2) An Aggregated Proposal Feature Alignment (APFA) module integrates the proposal features with the detection results to strengthen semantic alignment, in which a weighted GRL (WGRL) layer emphasizes hard-confused features over easily-confused ones. However, MAF considers only the domain disparity and neglects domain adaptability; as a result, the label-agnostic and inaccurate target distribution leads to source error collapse, which is harmful to domain adaptation. We therefore further propose a Paradigm Teacher (PT) with knowledge distillation and formulate an extended Paradigm Teacher MAF (PT-MAF), which makes two new contributions: (1) The Paradigm Teacher (PT) overcomes source error collapse and improves the adaptability of the model. (2) A Dual-Discriminator HDFA (D\(^{2}\)-HDFA) improves the marginal distribution alignment and achieves better alignment than HDFA. Extensive experiments on numerous benchmark datasets, including Cityscapes, Foggy Cityscapes, Pascal VOC, Clipart, and Watercolor, demonstrate the superiority of our approach over state-of-the-art methods.
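To make the adversarial component concrete, below is a minimal PyTorch sketch of a weighted gradient reversal layer in the spirit of WGRL. The class name WeightedGradReverse, the helper wgrl, and the weighting rule based on the domain discriminator's output probability are illustrative assumptions, not the paper's exact formulation.

    import torch

    class WeightedGradReverse(torch.autograd.Function):
        """Identity in the forward pass; negates and re-weights the
        gradient in the backward pass (the core idea behind a WGRL)."""

        @staticmethod
        def forward(ctx, x, weight):
            ctx.save_for_backward(weight)
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            (weight,) = ctx.saved_tensors
            # Reversed, re-scaled gradient w.r.t. x; no gradient for weight.
            return -weight * grad_output, None

    def wgrl(features, domain_prob):
        # Hypothetical weighting: samples the discriminator still separates
        # confidently (p far from 0.5, i.e. "hard-confused" for the feature
        # extractor) receive a larger reversed gradient.
        weight = (2.0 * (domain_prob - 0.5).abs()).detach()
        return WeightedGradReverse.apply(features, weight)

With proposal features of shape (N, C) and discriminator outputs of shape (N, 1), wgrl(features, p) acts as an identity on the forward pass while steering the feature extractor toward domain confusion during training.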



Acknowledgements

This work was partially supported by National Key R&D Program of China (2021YFB3100800), National Natural Science Fund of China (62271090), Chongqing Natural Science Fund (cstc2021jcyj-jqX0023), CCF Hikvision Open Fund (CCF-HIKVISION OF 20210002), CAAI-Huawei MindSpore Open Fund, and Beijing Academy of Artificial Intelligence (BAAI).

Author information


Corresponding author

Correspondence to Lei Zhang.

Additional information

Communicated by Wanli Ouyang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

He, Z., Zhang, L., Gao, X. et al. Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection. Int J Comput Vis 131, 680–700 (2023). https://doi.org/10.1007/s11263-022-01728-z

