research-article

Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training

Authors:

Yuxin Peng

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 1372 - 1380

https://doi.org/10.1145/3240508.3240557

Published: 15 October 2018

Abstract

Progress in fine-grained visual categorization (FGVC) has benefited from deep neural networks, especially convolutional neural networks (CNNs), which rely heavily on large amounts of labeled training data. However, accurate labels for similar fine-grained subcategories are hard to obtain, because labeling requires professional knowledge and is both labor-intensive and time-consuming. It is therefore appealing and significant, though highly challenging, to recognize these similar subcategories from a few labeled training samples, or even only one. In this paper, we propose OLOS (Only Learn One Sample), a new data augmentation approach for fine-grained visual categorization with one-sample training. Its main novelties are: (1) A 4-stage data augmentation approach increases both the volume and the variety of the single training image, providing more visual information from multiple views and scales. It consists of a 2-stage data generation and a 2-stage data selection. (2) The 2-stage data generation produces image patches relevant to the object and its parts in the single training image, and also produces new images conditioned on textual descriptions of the training image. (3) The 2-stage data selection screens the generated images so that useful information is retained and noisy information is eliminated. Experimental results and analyses on a fine-grained visual categorization benchmark demonstrate that the proposed OLOS approach can be applied on top of existing methods and improves their categorization performance.
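The generate-then-select structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how patches are proposed, how images are synthesized from text, or what the two selection stages test. Random multi-scale crops stand in for object and part patches, and `text_to_image`, `coarse_score`, `fine_score`, and both thresholds are hypothetical placeholders supplied by the caller.

```python
import random
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Return a square size-by-size patch of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


def generate_patches(image: Image, scales: Sequence[float], per_scale: int,
                     rng: random.Random) -> List[Image]:
    """Generation stage 1 (assumed): random square crops at several scales."""
    height, width = len(image), len(image[0])
    patches = []
    for scale in scales:
        size = max(1, int(min(height, width) * scale))
        for _ in range(per_scale):
            top = rng.randint(0, height - size)
            left = rng.randint(0, width - size)
            patches.append(crop(image, top, left, size))
    return patches


def generate_from_text(descriptions: Sequence[str],
                       text_to_image: Callable[[str], Image]) -> List[Image]:
    """Generation stage 2 (assumed): one synthetic image per description."""
    return [text_to_image(text) for text in descriptions]


def select(candidates: List[Image], score: Callable[[Image], float],
           threshold: float) -> List[Image]:
    """One selection stage: keep candidates scoring at least the threshold."""
    return [c for c in candidates if score(c) >= threshold]


def augment_one_sample(image: Image,
                       descriptions: Sequence[str],
                       text_to_image: Callable[[str], Image],
                       coarse_score: Callable[[Image], float],
                       fine_score: Callable[[Image], float],
                       coarse_threshold: float = 0.5,
                       fine_threshold: float = 0.5,
                       scales: Sequence[float] = (0.5, 0.75),
                       per_scale: int = 4,
                       seed: int = 0) -> List[Image]:
    """Two generation stages followed by two selection stages."""
    rng = random.Random(seed)
    candidates = generate_patches(image, scales, per_scale, rng)
    candidates += generate_from_text(descriptions, text_to_image)
    kept = select(candidates, coarse_score, coarse_threshold)
    kept = select(kept, fine_score, fine_threshold)
    # The original image always stays in the augmented training set.
    return [image] + kept


if __name__ == "__main__":
    toy = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]

    def mean_brightness(img: Image) -> float:
        return sum(map(sum, img)) / (len(img) * len(img[0]))

    augmented = augment_one_sample(
        toy,
        ["a small bird with a red head"],  # made-up description
        lambda text: toy,                  # placeholder generator
        mean_brightness,
        mean_brightness,
    )
    print(len(augmented), "training images after augmentation")
```

In this sketch the scoring functions are plain callables, so a classifier confidence, an attention response, or any other relevance measure could be plugged in without changing the pipeline; which measure fits best is left open here.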

Index Terms

Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition

Information & Contributors

Information

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll (University of Oldenburg, Germany)
Kyoung Mu Lee (Seoul National University, Korea)
Jiebo Luo (University of Rochester, USA)
Wenwu Zhu (Tsinghua University, China)

Program Chairs:
Hyeran Byun (Yonsei University, Korea)
Chang Wen Chen (State Univ. Of New York at Buffalo, USA)
Rainer Lienhart (University of Augsburg, Germany)
Tao Mei (JD AI, China)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 9
Total Downloads: 253
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Jan 2025

Cited By

Cui S, Hui B (2024). Dual-Dependency Attention Transformer for Fine-Grained Visual Classification. Sensors, 24:7 (2337). Online publication date: 6-Apr-2024.
https://doi.org/10.3390/s24072337
Zhou Q, Zhang K, Yue F, Zhang Z, Yu H (2024). Naming conventions-based multi-label and multi-task learning for fine-grained classification. International Conference on Algorithm, Imaging Processing, and Machine Vision (AIPMV 2023) (114). Online publication date: 9-Jan-2024.
https://doi.org/10.1117/12.3014589
Wei J, Yang Y, Guan X, Xu X, Wang G, Shen H (2024). Runge-Kutta Guided Feature Augmentation for Few-Sample Learning. IEEE Transactions on Multimedia, 26 (7349-7358). Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3366404
Lyu Y, Jing L, Wang J, Guo M, Wang X, Yu J (2023). Siamese transformer with hierarchical concept embedding for fine-grained image recognition. Science China Information Sciences, 66:3. Online publication date: 31-Jan-2023.
https://doi.org/10.1007/s11432-022-3586-y
Huang H, Zhang J, Yu L, Zhang J, Wu Q, Xu C (2022). TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization With Few Labeled Samples. IEEE Transactions on Circuits and Systems for Video Technology, 32:2 (853-866). Online publication date: Feb-2022.
https://doi.org/10.1109/TCSVT.2021.3065693
Liu Y, Bai Y, Che X, He J (2022). Few-Shot Fine-Grained Image Classification: A Survey. 2022 4th International Conference on Natural Language Processing (ICNLP) (201-211). Online publication date: Mar-2022.
https://doi.org/10.1109/ICNLP55136.2022.00039
Liu X, Wang L, Han X (2022). Transformer with peak suppression and knowledge guidance for fine-grained image recognition. Neurocomputing, 492:C (137-149). Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1016/j.neucom.2022.04.037
Jiao Q, Liu Z, Li G, Ye L, Wang Y (2020). Fine-Grained Image Classification with Coarse and Fine Labels on One-Shot Learning. 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (1-6). Online publication date: Jul-2020.
https://doi.org/10.1109/ICMEW46912.2020.9105959
Min S, Yao H, Xie H, Zha Z, Zhang Y, Amsaleg L, Huet B, Larson M, Gravier G, Hung H, Ngo C, Tsang Ooi W (2019). Domain-Specific Embedding Network for Zero-Shot Recognition. Proceedings of the 27th ACM International Conference on Multimedia (2070-2078). Online publication date: 15-Oct-2019.
https://dl.acm.org/doi/10.1145/3343031.3351092
