Abstract
This paper describes a new few-shot video object detection framework that leverages spatio-temporal information through a relation module with attention mechanisms to mine relationships among proposals in different frames. The output of the relation module feeds a spatio-temporal double head with a category-agnostic confidence predictor, a design intended to decrease overfitting given the reduced training sets inherent to few-shot solutions. The predicted score is the input to a long-term object linking approach that builds object tubes across the whole video, ensuring spatio-temporal consistency. Our proposal establishes a new state of the art on the FSVOD500 dataset.
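The idea of relating proposals across frames through attention can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and the names `relate`, `current_frame` and `support_frame` as well as the toy feature vectors are hypothetical. Each proposal in the current frame attends to the proposals of another frame and is enhanced with the weighted sum of their features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relate(current, support):
    """Enhance each current-frame proposal with attention over support proposals.

    current, support: lists of equal-length feature vectors (lists of floats).
    Returns the attention weights and the residually enhanced features.
    Assumption: no learned query/key/value projections, unlike a trained module.
    """
    dim = len(current[0])
    all_weights, enhanced = [], []
    for query in current:
        # Scaled dot-product similarity between this proposal and each support proposal.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in support
        ]
        weights = softmax(scores)
        # Weighted sum of support features, added back to the query (residual).
        context = [
            sum(w * key[i] for w, key in zip(weights, support))
            for i in range(dim)
        ]
        all_weights.append(weights)
        enhanced.append([q + c for q, c in zip(query, context)])
    return all_weights, enhanced


# Hypothetical toy features: two proposals in the current frame,
# three proposals in a neighbouring frame.
current_frame = [[1.0, 0.0], [0.0, 1.0]]
support_frame = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights, enhanced = relate(current_frame, support_frame)
```

In this toy setting, each current-frame proposal assigns its largest weight to the support proposal whose feature vector points in the same direction. A trained relation module would learn the projections that define this similarity rather than use raw dot products.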
Notes
1. FSVOD500 also contains a validation set of 770 annotated videos covering 80 object categories. We do not use this set in our experiments.
Acknowledgment
This research was partially funded by the Spanish Ministerio de Ciencia e Innovación (grant number PID2020-112623GB-I00), and the Galician Consellería de Cultura, Educación e Universidade (grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04). These grants are co-funded by the European Regional Development Fund (ERDF).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cores, D., Seidenari, L., Bimbo, A.D., Brea, V.M., Mucientes, M. (2023). Relation Networks for Few-Shot Video Object Detection. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_19
DOI: https://doi.org/10.1007/978-3-031-36616-1_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science, Computer Science (R0)