
Boosting Few-shot Object Detection with Discriminative Representation and Class Margin

Published: 10 November 2023

Abstract

The challenge of classifying and accurately localizing a visual category from only a few annotated training samples has motivated few-shot object detection, which transfers a detection model learned on a source domain to a target domain. Under this paradigm, however, the transferred source-domain detector usually struggles to classify target-domain objects because of the low data diversity of the novel training samples. To combat this, we present a simple yet effective few-shot detector, Transferable RCNN. To transfer general knowledge learned from data-abundant base classes to data-scarce novel classes, we propose a weight transfer strategy that promotes model transferability and an attention-based feature enhancement mechanism that learns more robust object proposal feature representations. We further ensure strong discrimination by optimizing contrastive objectives over feature maps via a supervised spatial contrastive loss. Meanwhile, we introduce an angle-guided additive margin classifier that enlarges instance-level inter-class differences and intra-class compactness, which improves the discriminative power of the few-shot classification head under scarce supervision. Our proposed framework outperforms current methods in various settings on the PASCAL VOC and MS COCO datasets, demonstrating its effectiveness and generalization ability.
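The angle-guided additive margin classifier mentioned above belongs to the additive-margin (CosFace-style) family: class logits are cosine similarities between L2-normalized instance features and class weight vectors, and a fixed margin is subtracted from the ground-truth class logit before the softmax, pushing same-class features together and different classes apart. A minimal NumPy sketch of this idea (the scale `s` and margin `m` values here are illustrative defaults, not the paper's reported hyper-parameters):

```python
import numpy as np

def additive_margin_logits(features, weights, labels, s=20.0, m=0.35):
    """Additive-margin (CosFace-style) logits.

    features: (N, D) instance embeddings
    weights:  (C, D) class weight vectors
    labels:   (N,) ground-truth class indices
    """
    # L2-normalize both sides so the dot product is the cosine of the angle
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, C) cosine similarities
    # Subtract the margin m only from the ground-truth class logit,
    # enlarging inter-class difference and intra-class compactness
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * (cos - margin)

def margin_softmax_loss(features, weights, labels, s=20.0, m=0.35):
    """Cross-entropy over the margin-adjusted logits."""
    logits = additive_margin_logits(features, weights, labels, s, m)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Because the margin is applied only to the target-class logit, the loss with m > 0 is always at least as large as the plain cosine-softmax loss on the same features, which is what forces tighter intra-class clusters during training.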


Cited By

  • Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 10 (2024), 1–19. https://doi.org/10.1145/3672396
  • Multi-level similarity transfer and adaptive fusion data augmentation for few-shot object detection. Journal of Visual Communication and Image Representation 105 (2024), 104340. https://doi.org/10.1016/j.jvcir.2024.104340

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 3
March 2024, 665 pages
EISSN: 1551-6865
DOI: 10.1145/3613614
Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 November 2023
    Online AM: 12 July 2023
    Accepted: 29 June 2023
    Revised: 27 February 2023
    Received: 29 March 2022
    Published in TOMM Volume 20, Issue 3


    Author Tags

    1. Deep learning
    2. few-shot object detection
    3. transfer learning

    Qualifiers

    • Research-article

    Funding Sources

    • Integrated Program of National Natural Science Foundation of China

