Deep Learning for Fine-Grained Image Recognition: A Comprehensive Study

Authors:
Yuhua Kang, Harbin Institute of Technology, China
Guoqing Chao, Harbin Institute of Technology, China
Xin Hu, Harbin Institute of Technology, China
Zhiying Tu, Harbin Institute of Technology, China
Dianhui Chu, Harbin Institute of Technology, China

APIT '22: Proceedings of the 2022 4th Asia Pacific Information Technology Conference, January 2022, Pages 31–39, https://doi.org/10.1145/3512353.3512359

Published: 14 March 2022


ABSTRACT

In computer vision, image recognition is a prominent and rapidly developing research area. Its principal task is to automatically predict which pre-defined categories an image belongs to. Traditional image recognition aims to classify images into diverse, clearly distinguishable categories. Fine-Grained Image Recognition (FGIR), by contrast, aims to recognize the differences among images in subordinate classes, e.g., species of birds, types of cars, or species of flowers, which are in certain respects equivalent to “species” in taxonomy. As a result, FGIR models are required to extract features at a finer granularity. Conventional methods apply special feature encodings to explore discriminative attributes, while recent FGIR methods have made great advances with the help of deep learning, which has itself developed remarkably. In this paper, we provide a new synthesis of the current leading FGIR models, organized according to how they advance FGIR. We classify them into five main categories, compare their performance on three popular datasets, and analyze the results. To support further development of this topic, we point out some open problems worth exploring.


Published in

APIT '22: Proceedings of the 2022 4th Asia Pacific Information Technology Conference
January 2022
239 pages
ISBN: 9781450395571
DOI: 10.1145/3512353

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computer Vision
Convolutional Neural Network
Deep Learning
Fine-Grained Image Recognition
Qualifiers
- research-article
- Research
- Refereed limited