
Boosting Few-shot Object Detection with Discriminative Representation and Class Margin

Published: 10 November 2023

Abstract

The challenge of classifying and accurately localizing a visual category from only a few annotated training samples has motivated few-shot object detection, which transfers a detection model learned on a source domain to a target domain. Under this paradigm, however, the transferred source-domain detector usually struggles to classify target-domain objects because of the low data diversity of the novel training samples. To combat this, we present a simple yet effective few-shot detector, Transferable RCNN. To transfer general knowledge learned from data-abundant base classes to data-scarce novel classes, we propose a weight transfer strategy that promotes model transferability and an attention-based feature enhancement mechanism that learns more robust object proposal feature representations. We further ensure strong discrimination by optimizing contrastive objectives over feature maps via a supervised spatial contrastive loss. Meanwhile, we introduce an angle-guided additive margin classifier that enlarges instance-level inter-class differences and intra-class compactness, which improves the discriminative power of the few-shot classification head under scarce supervision. Our proposed framework outperforms current methods in various settings on the PASCAL VOC and MS COCO datasets, demonstrating its effectiveness and generalization ability.
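The angle-guided additive margin classifier mentioned above belongs to the additive-margin (CosFace-style) family: class logits are cosine similarities between L2-normalized instance features and class weight vectors, and a fixed margin is subtracted from the ground-truth class logit before the softmax, pushing same-class features together and different classes apart. A minimal NumPy sketch of this idea (the scale `s` and margin `m` values here are illustrative defaults, not the paper's reported hyper-parameters):

```python
import numpy as np

def additive_margin_logits(features, weights, labels, s=20.0, m=0.35):
    """Additive-margin (CosFace-style) logits.

    features: (N, D) instance embeddings
    weights:  (C, D) class weight vectors
    labels:   (N,) ground-truth class indices
    """
    # L2-normalize both sides so the dot product is the cosine of the angle
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, C) cosine similarities
    # Subtract the margin m only from the ground-truth class logit,
    # enlarging inter-class difference and intra-class compactness
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * (cos - margin)

def margin_softmax_loss(features, weights, labels, s=20.0, m=0.35):
    """Cross-entropy over the margin-adjusted logits."""
    logits = additive_margin_logits(features, weights, labels, s, m)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Because the margin is applied only to the target-class logit, the loss with m > 0 is always at least as large as the plain cosine-softmax loss on the same features, which is what forces tighter intra-class clusters during training.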


Cited By

  • Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 10 (2024), 1–19. https://doi.org/10.1145/3672396
  • Multi-level similarity transfer and adaptive fusion data augmentation for few-shot object detection. Journal of Visual Communication and Image Representation 105 (2024), 104340. https://doi.org/10.1016/j.jvcir.2024.104340

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 3
March 2024, 665 pages
EISSN: 1551-6865
DOI: 10.1145/3613614
Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 November 2023
    Online AM: 12 July 2023
    Accepted: 29 June 2023
    Revised: 27 February 2023
    Received: 29 March 2022
    Published in TOMM Volume 20, Issue 3


    Author Tags

    1. Deep learning
    2. few-shot object detection
    3. transfer learning

    Qualifiers

    • Research-article

    Funding Sources

    • Integrated Program of National Natural Science Foundation of China

