ABSTRACT
Image recognition is a prominent and rapidly developing research area in computer vision. Its principal task is to automatically predict which pre-defined categories an image belongs to. Traditional image recognition aims to classify images into diverse, highly distinct categories. Fine-Grained Image Recognition (FGIR), in contrast, aims to recognize the differences among images that belong to subordinate classes, e.g., species of birds, types of cars, or species of flowers, which in certain respects correspond to “species” in taxonomy. As a result, FGIR models must extract features at a finer granularity. Conventional methods apply special feature encodings to find discriminative attributes, while recent FGIR methods have made great advances with the help of deep learning, which has itself developed remarkably. In this paper, we provide a new organization of the current leading FGIR models according to how they advance FGIR. We classify them into five main categories, compare their performance on three popular datasets, and analyze the results. To encourage further development of this topic, we point out some open problems worth further exploration.