
Adaptive Multi-task Learning for Few-Shot Object Detection

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Most few-shot object detection methods use a shared feature map for both classification and localization, despite the conflicting requirements of the two tasks: localization needs scale- and position-sensitive features, whereas classification requires features that are robust to scale and positional variations. Although a few methods have recognized this conflict and attempted to address it, they do not resolve it comprehensively. To overcome the contradictory preferences of classification and localization in few-shot object detection, an adaptive multi-task learning method featuring a novel precision-driven gradient balancer is proposed. The balancer mitigates the conflict by dynamically adjusting the backward gradient ratios of the two tasks. Furthermore, a CLIP-based knowledge distillation and classification refinement scheme is introduced to enhance the individual tasks by leveraging the capabilities of large vision-language models. Experimental results show that the proposed method consistently improves over strong few-shot detection baselines on benchmark datasets. Code: https://github.com/RY-Paper/MTL-FSOD
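To give a flavor of the idea, the following is a minimal, illustrative sketch of precision-driven loss re-weighting for two detection heads. It is not the paper's exact balancer (this page gives only the abstract-level description), and all names (`balance_weights`, `task_precisions`, `temperature`) are hypothetical: the sketch simply assumes that a task with lower current precision should receive a larger share of the backward gradient through the shared features.

```python
import math

def balance_weights(task_precisions, temperature=1.0):
    """Map per-task precision scores in [0, 1] to gradient weights.

    Illustrative sketch only, not the method from the paper: a task
    with lower precision gets a proportionally larger weight, so its
    loss gradient dominates the shared-backbone update. Weights are
    normalised to sum to the number of tasks, keeping the overall
    gradient magnitude comparable to uniform weighting.
    """
    # Lower precision -> larger unnormalised weight.
    raw = [math.exp((1.0 - p) / temperature) for p in task_precisions]
    total = sum(raw)
    n = len(raw)
    return [n * r / total for r in raw]

# Example: classification precision 0.8, localization precision 0.5,
# so localization (the weaker task) is up-weighted.
w_cls, w_loc = balance_weights([0.8, 0.5])
```

In a training loop, the combined objective would then be `w_cls * L_cls + w_loc * L_loc`, with the weights recomputed periodically from running precision estimates rather than fixed a priori.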

Yan Ren and Yanling Li: These authors contributed equally to this work.

Yanling Li: This work was fully conducted during the author’s PhD studies at NTU.



Acknowledgements

This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

Author information


Corresponding author

Correspondence to Yan Ren.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 9379 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ren, Y., Li, Y., Kong, A.W.K. (2025). Adaptive Multi-task Learning for Few-Shot Object Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_17


  • DOI: https://doi.org/10.1007/978-3-031-72667-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72666-8

  • Online ISBN: 978-3-031-72667-5

  • eBook Packages: Computer Science, Computer Science (R0)
