DOI: 10.1145/3503161.3548412
research-article

Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method

Published: 10 October 2022

Abstract

Few-shot object detection (FSOD) aims to detect objects given only a few labeled examples. However, existing FSOD methods do not consider the hierarchical, fine-grained category structures of objects, which are widespread in real life. For example, animals are taxonomically classified into orders, families, genera, species, and so on. In this paper, we propose and address a new problem called hierarchical few-shot object detection (Hi-FSOD), which aims to detect objects with hierarchical categories under the FSOD paradigm. To this end, we first build HiFSOD-Bird, the first large-scale, high-quality Hi-FSOD benchmark dataset, which contains 176,350 wild-bird images falling into 1,432 categories. All the categories are organized into a 4-level taxonomy consisting of 32 orders, 132 families, 572 genera and 1,432 species. Second, we propose HiCLPL, the first Hi-FSOD method, in which a hierarchical contrastive learning approach constrains the feature space so that the feature distribution of objects is consistent with the hierarchical taxonomy, thereby strengthening the model's generalization power. In addition, a probabilistic loss is designed to enable child nodes in the taxonomy to correct the classification errors of their parent nodes. Extensive experiments on HiFSOD-Bird show that HiCLPL outperforms existing FSOD methods.
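The abstract does not give the exact form of HiCLPL's probabilistic loss. The following is a minimal sketch of one common way to tie parent-level predictions to child-level ones, assuming a detector head that outputs species-level logits: each parent's probability is computed as the sum of its children's probabilities, and a negative log-likelihood is taken at every taxonomy level. Under this construction a coarse (genus or family) prediction is always consistent with the fine-grained (species) distribution, and the coarse-level loss terms also act on the species logits during training. This is not the authors' loss; the functions `level_probs` and `hierarchical_nll`, the `parent_maps` encoding, and the 3-level toy taxonomy are hypothetical, and the code shows the forward computation only (a training version would use an autodiff framework).

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def level_probs(species_logits: np.ndarray, parent_maps: list) -> list:
    """Return class probabilities at every taxonomy level, finest first.

    parent_maps[k][i] is the index of the level-(k+1) parent of node i at
    level k (hypothetical encoding; node indices at each level are assumed
    to be contiguous and to start from 0).
    """
    probs = [softmax(species_logits)]
    for parent_of in parent_maps:
        num_parents = int(parent_of.max()) + 1
        # One-hot membership matrix: child i contributes only to its parent.
        member = np.zeros((len(parent_of), num_parents))
        member[np.arange(len(parent_of)), parent_of] = 1.0
        # A parent's probability is the sum of its children's probabilities,
        # so every coarse prediction is derived from the fine-grained ones.
        probs.append(probs[-1] @ member)
    return probs


def hierarchical_nll(species_logits, parent_maps, labels_per_level):
    """Mean negative log-likelihood of the true node at each level."""
    losses = []
    for probs, labels in zip(level_probs(species_logits, parent_maps),
                             labels_per_level):
        p_true = probs[np.arange(len(labels)), labels]
        losses.append(float(-np.log(p_true + 1e-12).mean()))
    return losses


# Hypothetical 3-level toy taxonomy: 5 species -> 3 genera -> 2 families.
species_to_genus = np.array([0, 0, 1, 2, 2])
genus_to_family = np.array([0, 0, 1])
parent_maps = [species_to_genus, genus_to_family]

# Logits chosen so that the species-level softmax equals these probabilities.
species_logits = np.log(np.array([[0.1, 0.1, 0.2, 0.5, 0.1]]))
labels = [np.array([3]), np.array([2]), np.array([1])]  # species, genus, family

species_p, genus_p, family_p = level_probs(species_logits, parent_maps)
print(genus_p)   # approximately [[0.2 0.2 0.6]]
print(family_p)  # approximately [[0.4 0.6]]
print(hierarchical_nll(species_logits, parent_maps, labels))
```

Extending the toy to the four levels of HiFSOD-Bird (species, genera, families, orders) would only require appending a family-to-order map to `parent_maps`; the per-level losses could then be summed or weighted.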

Supplementary Material

MP4 File (MM22-fp3100.mp4)

    Information

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. benchmark
    2. few-shot object detection
    3. hierarchical classification
    4. hierarchical few-shot object detection

    Funding Sources

    • National Natural Science Foundation of China (NSFC)
    • Key R&D Projects of the Ministry of Science and Technology of China

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 73
    • Downloads (last 6 weeks): 7
    Reflects downloads up to 17 Feb 2025

    Cited By

    • (2025) GCSTG: Generating Class-Confusion-Aware Samples With a Tree-Structure Graph for Few-Shot Object Detection. IEEE Transactions on Image Processing, Vol. 34, pp. 772-784. DOI: 10.1109/TIP.2025.3530792. Online publication date: 2025.
    • (2024) Multi-Content Interaction Network for Few-Shot Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications, Vol. 20, No. 6, pp. 1-20. DOI: 10.1145/3643850. Online publication date: 8-Mar-2024.
    • (2024) Few-shot object detection. Information Fusion, Vol. 107, Issue C. DOI: 10.1016/j.inffus.2024.102307. Online publication date: 2-Jul-2024.
    • (2023) Recent Few-shot Object Detection Algorithms: A Survey with Performance Comparison. ACM Transactions on Intelligent Systems and Technology, Vol. 14, No. 4, pp. 1-36. DOI: 10.1145/3593588. Online publication date: 15-Jun-2023.
    • (2023) Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition. Proceedings of the 31st ACM International Conference on Multimedia, pp. 7549-7558. DOI: 10.1145/3581783.3612144. Online publication date: 26-Oct-2023.
    • (2023) Keyword-Based Diverse Image Retrieval by Semantics-aware Contrastive Learning and Transformer. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1262-1272. DOI: 10.1145/3539618.3591705. Online publication date: 19-Jul-2023.
    • (2023) HiGOD: Hierarchical Multi-granularity Object Detection based on Prior Label Knowledge Structure for Open World Perception. 2023 5th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), pp. 1134-1140. DOI: 10.1109/RICAI60863.2023.10488957. Online publication date: 1-Dec-2023.
