Skip to main content

Advertisement

Log in

Multiple knowledge embedding for few-shot object detection

  • Original Paper
  • Published:
Signal, Image and Video Processing Aims and scope Submit manuscript

Abstract

In the problem of few-shot object detection, class prototype knowledge in previous works is not be fully refined and utilized due to lack of instances. We noticed that the application of the output features of the RoI pooling layer has a great influence on the grasp of the prototype features, which motivates us to focus on how to reuse them. Therefore, we propose a multiple knowledge embedding network, which gets improvement in three places in the fine-tuning stage. Introducing attention mechanism to strengthen feature extraction, Up-CoTNet is used to replace the 3\(\times \)3 convolution in Resnet101. Feature enhancement module is added to enforce the common object reasoning for the output of RoI pooling layer. Then, we propose a contrastive learning branch to grasp the information encoded between different feature regions. Experiments on PASCAL VOC and MS COCO datasets show that our model significantly raises the performance by 1.3\(\%\) (+0.5AP) in average compared with previous methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Materials Availability

The PASCAL VOC dataset can be downloaded in “https://pjreddie.com/projects/pascal-voc-dataset-mirror/”. And the MS COCO dataset can be downloaded in “https://cocodataset.org/”.

References

  1. Chen, P.-Y., Chang, M.-C., Hsieh, J.-W., Chen, Y.-S.: Parallel residual bi-fusion feature pyramid network for accurate single-shot object detection. IEEE Trans. Image Process. 30, 9099–9111 (2021)

    Article  Google Scholar 

  2. Tang, J., Shu, X., Li, Z., Qi, G.-J., Wang, J.: Generalized deep transfer networks for knowledge propagation in heterogeneous domains. ACM Trans. Multimed. Comput. Commun. Appl. 12(4s) (2016)

  3. Shu, X., Qi, G.-J., Tang, J., Wang, J.: Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation. In Proceedings of the 23rd ACM International Conference on Multimedia, MM’15, pp. 35–44 (2015)

  4. Xiong, C., Li, W., Liu, Y., Wang, M.: Multi-dimensional edge features graph neural network on few-shot image classification. IEEE Signal Process. Lett. 28, 573–577 (2021)

    Article  Google Scholar 

  5. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)

    Article  Google Scholar 

  6. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)

  7. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8419–8428 (2019)

  8. Chen, H., Wang, Y., Wang, G., Qiao, Y.: LSTD: a low-shot transfer detector for object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

  9. Wang, Y.-X., Ramanan, D., Hebert, M.: Meta-learning to detect rare objects. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9924–9933 (2019)

  10. Karlinsky, L., Shtok, J., Harary, S., Schwartz, E., Aides, A., Feris, R., Giryes, R., Bronstein, A.M.: Repmet: Representative-based metric learning for classification and few-shot object detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5192–5201, (2019)

  11. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., Ye, Q.: Beyond max-margin: Class margin equilibrium for few-shot object detection. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7359–7368 (2021)

  12. Wang, X., Huang, T., Gonzalez, J., Darrell, T., Yu, F.: Frustratingly simple few-shot object detection. In International Conference on Machine Learning, pp. 9919–9928. PMLR (2020)

  13. Sun, B., Li, B., Cai, S., Yuan, Y., Zhang, C.: FSCE: Few-shot object detection via contrastive proposal encoding. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7348–7358 (2021)

  14. Zhang, W., Wang, Y.-X.: Hallucination improves few-shot object detection. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13003–13012 (2021)

  15. Li, Y., Yao, T., Pan, Y., Mei, T.: Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. p. 1 (2022)

  16. Cheng, G., Wang, J., Li, K., Xie, X., Lang, C., Yao, Y., Han, J.: Anchor-free oriented proposal generator for object detection. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2022)

    Google Scholar 

  17. Mukilan, P., Semunigus, W.: Human and object detection using hybrid deep convolutional neural network. Signal Image Video Process. pp. 1–11 (2022)

  18. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006)

    Article  Google Scholar 

  19. Jiang, W., Huang, K., Geng, J., Deng, X.: Multi-scale metric learning for few-shot learning. IEEE Trans. Circuits Syst. Video Technol. 31(3), 1091–1102 (2020)

    Article  Google Scholar 

  20. Feng, R., Zheng, X., Gao, T., Chen, J., Wang, W., Chen, D.Z., Jian, W.: Interactive few-shot learning: limited supervision, better medical image segmentation. IEEE Trans. Med. Imag. 40(10), 2575–2588 (2021)

    Article  Google Scholar 

  21. Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. Adv. Neural. Inf. Process. Syst. 33, 18661–18673 (2020)

    Google Scholar 

  22. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In International conference on machine learning, pp. 1597–1607. PMLR (2020)

  23. Wang, Y., Wei, Y., Ma, R., Wang, L., Wang, C.: Unsupervised vehicle re-identification based on mixed sample contrastive learning. Signal Image Video Process. 1–9 (2022)

  24. Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., Tran, D.: Image transformer. In International Conference on Machine Learning, pp. 4055–4064. PMLR (2018)

  25. Zhou, M., Xueyang, F., Huang, J., Zhao, F., Liu, A., Wang, R.: Effective pan-sharpening with transformer and invertible neural network. IEEE Trans. Geosci. Remote Sens. 60, 1–15 (2022)

  26. Viola, I., Kanitsar, A., Groller, M.E.: Importance-driven feature enhancement in volume visualization. IEEE Trans. Vis. Comput. Graphics 11(4), 408–418 (2005)

    Article  Google Scholar 

  27. Han, G., Huang, S., Ma, J., He, Y., Chang, S.-F.: Meta Faster R-CNN: towards accurate few-shot object detection with attentive feature alignment. arXiv preprint arXiv:2104.07719 (2021)

  28. Ravichandran, A., Bhotika, R., Soatto, S.: Few-shot learning with embedded class models and shot-free meta training. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 331–339 (2019)

  29. Lin, J., Cai, Q., Lin, M.: Multi-label classification of fundus images with graph convolutional network and self-supervised learning. IEEE Signal Process. Lett. 28, 454–458 (2021)

    Article  Google Scholar 

  30. Balduzzi, D., Frean, M., Leary, L., Lewis, J.P., Ma, K.W.-D., McWilliams, B.: The shattered gradients problem: If resnets are the answer, then what is the question? In International Conference on Machine Learning, pp. 342–350. PMLR (2017)

Download references

Funding

This work is supported by National Natural Science Foundation of China, 61771334. This work is also supported by National Key Research and Development Program of China (Titled “Brain-inspired General Vision Models and Applications”).

Author information

Authors and Affiliations

Authors

Contributions

XG contributed to the conception of the study; YC performed the experiments; JW contributed significantly to analysis and manuscript preparation; All authors reviewed the manuscript.

Corresponding author

Correspondence to Xiaolin Gong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This paper is not applicable for both human or animal studies.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gong, X., Cai, Y. & Wang, J. Multiple knowledge embedding for few-shot object detection. SIViP 17, 2231–2240 (2023). https://doi.org/10.1007/s11760-022-02438-2

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-022-02438-2

Keywords