Abstract
Most few-shot object detection methods use a shared feature map for both classification and localization, despite the conflicting requirements of the two tasks: localization needs scale- and position-sensitive features, whereas classification requires features that are robust to scale and positional variations. Although a few methods have recognized this challenge and attempted to address it, they do not resolve it comprehensively. To overcome the contradictory preferences of classification and localization in few-shot object detection, an adaptive multi-task learning method featuring a novel precision-driven gradient balancer is proposed. The balancer mitigates the conflict by dynamically adjusting the backward gradient ratios of the two tasks. Furthermore, a CLIP-based knowledge distillation and classification refinement scheme is introduced to strengthen the individual tasks by leveraging the capabilities of large vision-language models. Experiments show that the proposed method consistently improves over strong few-shot detection baselines on benchmark datasets. Code: https://github.com/RY-Paper/MTL-FSOD
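The abstract describes the gradient balancer only at a high level. As an illustrative sketch only (not the paper's actual precision-driven rule, which is not given here), the toy function below shows dynamic two-task loss weighting in which the lagging task receives a larger share of the backward gradient. The function name, the precision proxies, and the inverse-metric weighting rule are all assumptions made for illustration.

```python
def balance_weights(cls_precision, loc_precision, eps=1e-8):
    """Toy dynamic balancer: give the lagging task (lower precision
    proxy) a larger loss weight, keeping the total weight fixed at 2."""
    inverse = [1.0 / (cls_precision + eps), 1.0 / (loc_precision + eps)]
    total = sum(inverse)
    return [2.0 * w / total for w in inverse]

# If localization currently lags (lower proxy), it gets the larger weight;
# the combined loss would then be w_cls * L_cls + w_loc * L_loc.
w_cls, w_loc = balance_weights(0.5, 0.25)
```

Because the weights are recomputed from the tasks' current metrics at each step, gradients flowing back through the shared feature map are rebalanced adaptively rather than with fixed loss coefficients.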
Yan Ren and Yanling Li: These authors contributed equally to this work
Yanling Li: This work was fully conducted during the author’s PhD studies at NTU.
Acknowledgements
This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ren, Y., Li, Y., Kong, A.W.K. (2025). Adaptive Multi-task Learning for Few-Shot Object Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_17
Print ISBN: 978-3-031-72666-8
Online ISBN: 978-3-031-72667-5