A novel feature-based model for zero-shot object detection with simulated attributes

Applied Intelligence

Abstract

Zero-shot object detection (ZSD) has recently been proposed for detecting objects whose categories have never been seen during training. Existing ZSD methods have two drawbacks: (a) end-to-end methods sacrifice mean average precision (mAP) on seen classes; (b) feature-based methods avoid this problem but suffer from overly simple feature construction. In this paper, we therefore present a succinct but effective feature-based ZSD model whose feature construction leverages the deep feature embedding of the detector itself as the visual features of the detected objects. The features we use, named “Detection Feature” (DetFeat), contain not only visual representations but also context and position information, which makes them more discriminative for both seen and unseen objects. Additionally, we simulate the construction of attributes defined by human experts to generate a label embedding specific to the ZSD task, named “Simulated Attributes” (Simu-Attr). We find that Simu-Attr promotes better alignment between the visual and semantic spaces, alleviating the semantic gap. Extensive experiments show that our approach improves detection performance on unseen classes while maintaining high detection performance on seen classes. On the challenging COCO dataset, we surpass TL-ZSD, the best existing transductive ZSD method, by about 1% mAP on unseen classes and about 10% mAP on seen classes.
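To make the feature-based idea concrete, the following is a minimal sketch of a generic visual-to-semantic alignment step, not the model presented in this paper. The abstract does not specify how DetFeat is extracted or how Simu-Attr is built, so the toy feature vectors, the toy attribute vectors, the ridge-regression projection, and the cosine-similarity classifier are all assumptions chosen for illustration. The function names `fit_projection` and `classify` are hypothetical.

```python
import numpy as np


def fit_projection(features, attributes, reg=1e-3):
    """Fit a ridge-regression map from visual features (N, D) to
    attribute vectors (N, K). Assumption: a linear map stands in for
    whatever alignment the paper actually uses."""
    dim = features.shape[1]
    gram = features.T @ features + reg * np.eye(dim)
    return np.linalg.solve(gram, features.T @ attributes)  # shape (D, K)


def classify(features, projection, class_attributes):
    """Project features into attribute space and pick, for each one,
    the class whose attribute vector has the highest cosine similarity."""
    projected = features @ projection
    projected = projected / (
        np.linalg.norm(projected, axis=1, keepdims=True) + 1e-12
    )
    prototypes = class_attributes / (
        np.linalg.norm(class_attributes, axis=1, keepdims=True) + 1e-12
    )
    return np.argmax(projected @ prototypes.T, axis=1)


# Toy data (hypothetical): two seen classes and one unseen class.
seen_features = np.array([[1.0, 0.0, 1.0],   # object of seen class 0
                          [0.0, 1.0, 1.0]])  # object of seen class 1
seen_attributes = np.array([[1.0, 0.0],
                            [0.0, 1.0]])
all_attributes = np.array([[1.0, 0.0],   # seen class 0
                           [0.0, 1.0],   # seen class 1
                           [1.0, 1.0]])  # unseen class 2

# The projection is learned from seen classes only.
projection = fit_projection(seen_features, seen_attributes)

# At test time, features of an unseen object are matched against the
# attribute vectors of all classes, including the unseen one.
test_features = np.array([[1.0, 0.0, 1.0],
                          [0.0, 1.0, 1.0],
                          [1.0, 1.0, 2.0]])
predictions = classify(test_features, projection, all_attributes)
```

In this toy setting the third test feature is the sum of the two seen-class features, so its projection lands near the attribute vector of the unseen class. Real detector features would not be this cleanly linear, which is one reason the quality of the label embedding matters for alignment.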

Acknowledgments

This work is supported by the National Key Research and Development Program of China under Grant 2019YFC0118200 and by the National Natural Science Foundation of China under Grant 6180332.

Author information

Corresponding authors

Correspondence to Yuxing Wang or Hong Zhou.

Ethics declarations

Conflict of Interests

We have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, C., Wu, W., Wang, Y. et al. A novel feature-based model for zero-shot object detection with simulated attributes. Appl Intell 52, 6905–6914 (2022). https://doi.org/10.1007/s10489-021-02746-z

  • DOI: https://doi.org/10.1007/s10489-021-02746-z
