Abstract
Recently, cross-domain object detection has been tackled by reducing the domain disparity and learning domain-invariant features. Motivated by the observation that image-level discrepancy dominates in object detection, we introduce a Multi-Adversarial Faster-RCNN (MAF). Our proposed MAF makes two distinct contributions: (1) a Hierarchical Domain Feature Alignment (HDFA) module is introduced to minimize the image-level domain disparity, in which a Scale Reduction Module (SRM) reduces the feature-map size without information loss and increases training efficiency; (2) an Aggregated Proposal Feature Alignment (APFA) module integrates the proposal features and the detection results to enhance semantic alignment, in which a weighted GRL (WGRL) layer highlights hard-confused features rather than easily-confused ones. However, MAF only considers the domain disparity and neglects domain adaptability; as a result, the label-agnostic and inaccurate target distribution leads to source error collapse, which is harmful to domain adaptation. We therefore propose a Paradigm Teacher (PT) with knowledge distillation and formulate an extended Paradigm Teacher MAF (PT-MAF), which makes two further contributions: (1) the Paradigm Teacher (PT) overcomes source error collapse to improve the adaptability of the model; (2) a Dual-Discriminator HDFA (D\(^{2}\)-HDFA) improves the marginal distribution and achieves better alignment than HDFA. Extensive experiments on numerous benchmark datasets, including Cityscapes, Foggy Cityscapes, Pascal VOC, Clipart, and Watercolor, demonstrate the superiority of our approach over state-of-the-art methods.
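To illustrate the weighted GRL idea, the toy sketch below flips and re-scales upstream gradients by the domain discriminator's confidence, so that hard-confused samples (those the discriminator still separates easily) receive a larger reversed gradient. This is a minimal conceptual sketch, not the authors' implementation: the function name `wgrl_backward`, the `lam` trade-off parameter, and the confidence heuristic are assumptions for illustration.

```python
def wgrl_backward(grad, domain_prob, lam=1.0):
    """Toy weighted gradient-reversal step (illustrative only).

    grad        -- upstream gradients flowing back from the domain discriminator
    domain_prob -- the discriminator's probability that the sample is "source"
    lam         -- overall adversarial trade-off weight (hypothetical name)
    """
    # A sample the discriminator separates confidently (prob near 0 or 1) is
    # still "hard-confused": the feature extractor must work harder to align
    # it, so it receives a larger reversed gradient. A sample near prob 0.5
    # is already domain-confused and gets a weight near zero.
    confidence = abs(domain_prob - 0.5) * 2.0   # 0 = confused, 1 = separated
    weight = lam * confidence
    return [-weight * g for g in grad]

# An unweighted GRL would instead return [-lam * g for g in grad] for every
# sample, treating confused and separated samples identically.
aligned = wgrl_backward([1.0, -2.0], domain_prob=0.5)   # confused sample
distinct = wgrl_backward([1.0, -2.0], domain_prob=1.0)  # separated sample
```

In the forward pass a GRL is the identity; only the backward pass differs, which is why the sketch shows just the gradient transformation.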
Acknowledgements
This work was partially supported by National Key R &D Program of China (2021YFB3100800), National Natural Science Fund of China (62271090), Chongqing Natural Science Fund (cstc2021jcyj-jqX0023), CCF Hikvision Open Fund (CCF-HIKVISION OF 20210002), CAAI-Huawei MindSpore Open Fund, and Beijing Academy of Artificial Intelligence (BAAI).
Additional information
Communicated by Wanli Ouyang.
About this article
Cite this article
He, Z., Zhang, L., Gao, X. et al. Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection. Int J Comput Vis 131, 680–700 (2023). https://doi.org/10.1007/s11263-022-01728-z