Relation Networks for Few-Shot Video Object Detection

Cores, Daniel; Seidenari, Lorenzo; Bimbo, Alberto Del; Brea, Víctor M.; Mucientes, Manuel

doi:10.1007/978-3-031-36616-1_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14062)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

This paper describes a new few-shot video object detection framework that leverages spatio-temporal information through a relation module with attention mechanisms to mine relationships among proposals in different frames. The output of the relation module feeds a spatio-temporal double head with a category-agnostic confidence predictor, which reduces the overfitting caused by the small training sets inherent to few-shot settings. The predicted score is the input to a long-term object linking approach that builds object tubes across the whole video, ensuring spatio-temporal consistency. Our proposal establishes a new state of the art on the FSVOD500 dataset.
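As a rough illustration of how attention can relate proposals from different frames, the sketch below applies plain scaled dot-product attention between the proposal features of one frame and those gathered from other frames. It is not the authors' relation module: the function name `relation_attention`, the feature shapes, the residual connection, and the absence of learned projections are all assumptions made for the example.

```python
import numpy as np

def relation_attention(target, support):
    """Hypothetical helper: enhance target-frame proposals with support-frame proposals.

    target:  (n, d) proposal features from the frame being processed
    support: (m, d) proposal features gathered from other frames
    """
    d = target.shape[1]
    # Pairwise similarity between every target proposal and every support proposal.
    scores = target @ support.T / np.sqrt(d)
    # Softmax over support proposals (shifted by the row max for numerical stability).
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Assumed residual connection: original feature plus attended support features.
    return target + weights @ support, weights

# Toy data: 4 proposals in the current frame, 10 from other frames, 8-dim features.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))
support = rng.normal(size=(10, 8))
enhanced, weights = relation_attention(target, support)
```

Each row of `weights` sums to one, so every enhanced feature is the original proposal feature plus a convex combination of the support features. A trained module would normally add learned query, key, and value projections on top of this.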

Notes

1. FSVOD500 also contains a validation set of 770 annotated videos covering 80 object categories. We do not use this set in our experiments.

Acknowledgment

This research was partially funded by the Spanish Ministerio de Ciencia e Innovación (grant number PID2020-112623GB-I00), and the Galician Consellería de Cultura, Educación e Universidade (grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04). These grants are co-funded by the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Daniel Cores, Víctor M. Brea & Manuel Mucientes
Media Integration and Communication Center (MICC), University of Florence, Firenze, Italy
Lorenzo Seidenari & Alberto Del Bimbo

Corresponding author

Correspondence to Daniel Cores.

Editor information

Editors and Affiliations

University of Alicante, Alicante, Spain
Antonio Pertusa
University of Alicante, Alicante, Spain
Antonio Javier Gallego
Universitat Politècnica de València, Valencia, Spain
Joan Andreu Sánchez
IPO Porto, Coimbra, Portugal
Inês Domingues

About this paper

Cite this paper

Cores, D., Seidenari, L., Bimbo, A.D., Brea, V.M., Mucientes, M. (2023). Relation Networks for Few-Shot Video Object Detection. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_19

DOI: https://doi.org/10.1007/978-3-031-36616-1_19
Published: 25 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science, Computer Science (R0)
