
Adaptive Multi-task Learning for Few-Shot Object Detection

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Most few-shot object detection methods use a shared feature map for both classification and localization, despite the conflicting requirements of the two tasks: localization needs scale- and position-sensitive features, whereas classification requires features that are robust to scale and positional variations. Although a few methods have recognized this conflict and attempted to address it, they do not resolve it comprehensively. To overcome the contradictory preferences of classification and localization in few-shot object detection, an adaptive multi-task learning method featuring a novel precision-driven gradient balancer is proposed. The balancer mitigates the conflict by dynamically adjusting the backward gradient ratios of the two tasks. Furthermore, a CLIP-based knowledge distillation and classification refinement scheme is introduced to enhance the individual tasks by leveraging the capabilities of large vision-language models. Experimental results show that the proposed method consistently improves over strong few-shot detection baselines on benchmark datasets. Code: https://github.com/RY-Paper/MTL-FSOD
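To give a flavor of the idea, the following is a minimal, illustrative sketch of precision-driven loss re-weighting for two detection heads. It is not the paper's exact balancer (this page gives only the abstract-level description), and all names (`balance_weights`, `task_precisions`, `temperature`) are hypothetical: the sketch simply assumes that a task with lower current precision should receive a larger share of the backward gradient through the shared features.

```python
import math

def balance_weights(task_precisions, temperature=1.0):
    """Map per-task precision scores in [0, 1] to gradient weights.

    Illustrative sketch only, not the method from the paper: a task
    with lower precision gets a proportionally larger weight, so its
    loss gradient dominates the shared-backbone update. Weights are
    normalised to sum to the number of tasks, keeping the overall
    gradient magnitude comparable to uniform weighting.
    """
    # Lower precision -> larger unnormalised weight.
    raw = [math.exp((1.0 - p) / temperature) for p in task_precisions]
    total = sum(raw)
    n = len(raw)
    return [n * r / total for r in raw]

# Example: classification precision 0.8, localization precision 0.5,
# so localization (the weaker task) is up-weighted.
w_cls, w_loc = balance_weights([0.8, 0.5])
```

In a training loop, the combined objective would then be `w_cls * L_cls + w_loc * L_loc`, with the weights recomputed periodically from running precision estimates rather than fixed a priori.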

Yan Ren and Yanling Li: These authors contributed equally to this work.

Yanling Li: This work was fully conducted during the author’s PhD studies at NTU.



Acknowledgements

This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

Author information


Corresponding author

Correspondence to Yan Ren.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 9379 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ren, Y., Li, Y., Kong, A.W.K. (2025). Adaptive Multi-task Learning for Few-Shot Object Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_17


  • DOI: https://doi.org/10.1007/978-3-031-72667-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72666-8

  • Online ISBN: 978-3-031-72667-5

  • eBook Packages: Computer Science, Computer Science (R0)
