Object detection based on semi-supervised domain adaptation for imbalanced domain resources

Original Paper · Machine Vision and Applications

Abstract

In certain scenarios, models trained on a specific dataset (the source domain) can generalize to novel scenes (the target domain) via knowledge transfer. However, such source-trained detectors may align poorly with a low-resource target domain because of the imbalanced and inconsistent domain shift involved. In this paper, we propose a semi-supervised detector that adapts to the domain shift at both the appearance and semantic levels. To this end, two components are introduced: appearance adaptation networks built on instance and batch normalization, and semantic adaptation networks in which an adversarial transfer procedure re-weights the discriminator loss to improve feature alignment between two domains of imbalanced scale. Furthermore, a self-paced training procedure re-trains the detector by alternately generating pseudo-labels in the target domain from easy to hard. In our experiments, we empirically evaluate the proposed framework on several datasets, including Cityscapes and VOC0712, and the results confirm the higher accuracy and effectiveness of the proposed detector in comparison with state-of-the-art detectors.
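The abstract outlines three mechanisms. As a rough illustration only (this is not the authors' released code), the minimal PyTorch-style sketch below shows what each could look like: a hypothetical IBNBlock that splits channels between instance and batch normalization for appearance-level adaptation, a discriminator loss re-weighted by the relative sizes of the two domains for semantic-level adaptation under imbalanced scales, and a confidence-thresholded, easy-to-hard pseudo-label filter for self-paced re-training. All class names, weighting schemes, and threshold values here are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IBNBlock(nn.Module):
    """Appearance-level adaptation (sketch): apply instance normalization to half
    of the channels (style/appearance statistics) and batch normalization to the
    other half (content statistics), in the spirit of IBN-style layers."""

    def __init__(self, channels):
        super().__init__()
        self.half = channels // 2
        self.instance_norm = nn.InstanceNorm2d(self.half, affine=True)
        self.batch_norm = nn.BatchNorm2d(channels - self.half)

    def forward(self, x):
        x_in, x_bn = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.instance_norm(x_in), self.batch_norm(x_bn)], dim=1)


def reweighted_adversarial_loss(disc_src, disc_tgt, n_src, n_tgt):
    """Semantic-level adaptation (sketch): re-weight the domain-discriminator loss
    so the smaller (target) domain contributes as strongly as the larger source
    domain, compensating for imbalanced domain scales."""
    w_src = n_tgt / (n_src + n_tgt)
    w_tgt = n_src / (n_src + n_tgt)
    loss_src = F.binary_cross_entropy_with_logits(
        disc_src, torch.zeros_like(disc_src))  # source samples labelled 0
    loss_tgt = F.binary_cross_entropy_with_logits(
        disc_tgt, torch.ones_like(disc_tgt))   # target samples labelled 1
    return w_src * loss_src + w_tgt * loss_tgt


def select_pseudo_labels(detections, round_idx, start_thresh=0.9, step=0.05):
    """Self-paced pseudo-labelling (sketch): keep only high-confidence target
    detections, relaxing the threshold each retraining round (easy to hard)."""
    thresh = max(start_thresh - round_idx * step, 0.5)
    return [d for d in detections if d["score"] >= thresh]
```

In such a scheme, the selected pseudo-labels would be fed back into detector training each round, with the threshold schedule controlling how quickly harder target samples are admitted.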



Acknowledgements

This research was funded by the National Natural Science Foundation of China (61563025, 61562053, 61762056), the Science and Technology Project of the Yunnan Science and Technology Department (2016FB109, 2017FB094), and the Scientific Research Foundation of the Yunnan Education Department (2017ZZX149).

Author information


Corresponding author

Correspondence to Meng Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, W., Wang, M., Wang, H. et al. Object detection based on semi-supervised domain adaptation for imbalanced domain resources. Machine Vision and Applications 31, 18 (2020). https://doi.org/10.1007/s00138-020-01068-3

