Relation Networks for Few-Shot Video Object Detection

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14062))

Abstract

This paper describes a new few-shot video object detection framework that exploits spatio-temporal information through a relation module, which uses attention mechanisms to mine relationships among proposals in different frames. The output of the relation module feeds a spatio-temporal double head with a category-agnostic confidence predictor, which reduces the overfitting caused by the small training sets inherent to few-shot settings. The predicted score is the input to a long-term object linking approach that builds object tubes across the whole video, ensuring spatio-temporal consistency. Our proposal establishes a new state of the art on the FSVOD500 dataset.
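To make the idea of relating proposals across frames more concrete, the following is a minimal sketch, not the relation module of the paper. It assumes plain scaled dot-product attention without learned projections, and the names `relate_proposals`, `current`, and `support` are hypothetical. Each proposal feature from the current frame attends to proposal features from other frames and is enriched with their weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relate_proposals(current, support):
    """Enrich current-frame proposal features with support-frame features.

    Assumption: plain scaled dot-product attention with a residual sum;
    a real relation module would add learned projections and more.

    current: list of feature vectors (lists of floats), current frame
    support: list of feature vectors from proposals in other frames
    Returns (enhanced_features, attention_weights).
    """
    dim = len(current[0])
    scale = math.sqrt(dim)
    enhanced, all_weights = [], []
    for q in current:
        # Similarity between this proposal and every support proposal.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in support]
        weights = softmax(scores)
        # Attention-weighted aggregation of the support features.
        context = [
            sum(w * k[i] for w, k in zip(weights, support)) for i in range(dim)
        ]
        # Residual connection: keep the original feature, add the context.
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
        all_weights.append(weights)
    return enhanced, all_weights


# Toy usage: one current proposal, two support proposals.
current = [[1.0, 0.0]]
support = [[1.0, 0.0], [0.0, 1.0]]
enhanced, weights = relate_proposals(current, support)
```

In this toy input the first support proposal is more similar to the current proposal, so it receives the larger attention weight and contributes more to the enriched feature.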

Notes

  1. FSVOD500 also contains a validation set of 770 annotated videos spanning 80 object categories. We do not use this set in our experiments.

Acknowledgment

This research was partially funded by the Spanish Ministerio de Ciencia e Innovación (grant number PID2020-112623GB-I00), and the Galician Consellería de Cultura, Educación e Universidade (grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04). These grants are co-funded by the European Regional Development Fund (ERDF).

Corresponding author

Correspondence to Daniel Cores.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cores, D., Seidenari, L., Bimbo, A.D., Brea, V.M., Mucientes, M. (2023). Relation Networks for Few-Shot Video Object Detection. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_19

  • DOI: https://doi.org/10.1007/978-3-031-36616-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science, Computer Science (R0)
