DOI: 10.1145/3394171.3413883

Beyond the Attention: Distinguish the Discriminative and Confusable Features For Fine-grained Image Classification

Published: 12 October 2020

Abstract

Learning subtle discriminative features plays a significant role in fine-grained image classification. Existing methods usually extract distinguishable parts through an attention module for classification. Although these learned parts contain valuable features that benefit classification, some irrelevant features are also preserved, which may prevent the model from making a correct classification, especially in fine-grained tasks, where categories are highly similar. How to keep the discriminative features while removing the confusable features from the distinguishable parts is an interesting yet challenging task. In this paper, we introduce a novel classification approach, the Logical-based Feature Extraction Model (LAFE for short), to address this issue. The main advantage of LAFE lies in the fact that it can explicitly add weight to discriminative features and subtract confusable features. Specifically, LAFE utilizes region attention modules and channel attention modules to extract discriminative and confusable features, respectively. Based on this, two novel loss functions are designed to automatically induce attention over these features for fine-grained image classification. Our approach demonstrates its robustness, efficiency, and state-of-the-art performance on three benchmark datasets.
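The abstract's core mechanism, amplifying channels judged discriminative by an attention module while subtracting those judged confusable, can be sketched in a few lines. This is a minimal NumPy illustration only: the paper does not specify its modules or losses here, so the function names, the sigmoid channel gate, the threshold, and the `alpha`/`beta` weights below are all illustrative assumptions, not LAFE's actual implementation.

```python
import numpy as np

def channel_attention(features):
    """Illustrative squeeze-and-excitation-style channel scores in [0, 1].
    features: (C, H, W) feature map."""
    squeezed = features.mean(axis=(1, 2))      # global average pool -> (C,)
    return 1.0 / (1.0 + np.exp(-squeezed))     # sigmoid gating per channel

def lafe_recalibrate(features, threshold=0.5, alpha=0.5, beta=0.5):
    """Hypothetical sketch of the add/subtract idea: channels with high
    attention scores are treated as discriminative and amplified; the
    rest are treated as confusable and suppressed."""
    scores = channel_attention(features)
    disc = (scores >= threshold).astype(features.dtype)  # discriminative mask
    conf = 1.0 - disc                                    # confusable mask
    gain = 1.0 + alpha * disc[:, None, None] - beta * conf[:, None, None]
    return features * gain

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 4, 4))   # toy (C, H, W) feature map
out = lafe_recalibrate(f)
print(out.shape)                 # (8, 4, 4)
```

With the defaults above, each discriminative channel is scaled by 1.5 and each confusable channel by 0.5; in the actual model this recalibration would be learned end-to-end through the two loss functions rather than hard-thresholded.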




Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. attention
  2. feature fusion
  3. fine-grained image classification

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Open Project Funding of CAS-NDST Lab

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Cited By

  • Fine-grained image classification method based on hybrid attention module. Frontiers in Neurorobotics 18 (2024). DOI: 10.3389/fnbot.2024.1391791
  • Research on fine-grained visual classification method based on dual-attention feature complementation. IEEE Access (2024). DOI: 10.1109/ACCESS.2024.3420429
  • Multi-directional guidance network for fine-grained visual classification. The Visual Computer 40, 11 (2024), 8113-8124. DOI: 10.1007/s00371-023-03226-w
  • A collaborative gated attention network for fine-grained visual classification. Displays 79 (2023), 102468. DOI: 10.1016/j.displa.2023.102468
  • AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning. In Proceedings of the 29th ACM International Conference on Multimedia (2021), 648-657. DOI: 10.1145/3474085.3475228
  • Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification. In 2021 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN52387.2021.9534004