DOI: 10.1145/3394171.3413883

Beyond the Attention: Distinguish the Discriminative and Confusable Features For Fine-grained Image Classification

Published: 12 October 2020

Abstract

Learning subtle discriminative features plays a significant role in fine-grained image classification. Existing methods usually extract distinguishable parts through an attention module for classification. Although these learned parts contain valuable features that benefit classification, some irrelevant features are also preserved, which may prevent the model from making a correct classification, especially in fine-grained tasks, where categories are highly similar. How to keep the discriminative features while removing the confusable features from the distinguishable parts is an interesting yet challenging task. In this paper, we introduce a novel classification approach, the Logical-based Feature Extraction Model (LAFE for short), to address this issue. The main advantage of LAFE lies in the fact that it can explicitly add weight to discriminative features and subtract confusable features. Specifically, LAFE utilizes region attention modules and channel attention modules to extract discriminative and confusable features, respectively. Based on this, two novel loss functions are designed to automatically induce attention over these features for fine-grained image classification. Our approach demonstrates its robustness, efficiency, and state-of-the-art performance on three benchmark datasets.
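The abstract's core mechanism, amplifying channels judged discriminative by an attention module while subtracting those judged confusable, can be sketched in a few lines. This is a minimal NumPy illustration only: the paper does not specify its modules or losses here, so the function names, the sigmoid channel gate, the threshold, and the `alpha`/`beta` weights below are all illustrative assumptions, not LAFE's actual implementation.

```python
import numpy as np

def channel_attention(features):
    """Illustrative squeeze-and-excitation-style channel scores in [0, 1].
    features: (C, H, W) feature map."""
    squeezed = features.mean(axis=(1, 2))      # global average pool -> (C,)
    return 1.0 / (1.0 + np.exp(-squeezed))     # sigmoid gating per channel

def lafe_recalibrate(features, threshold=0.5, alpha=0.5, beta=0.5):
    """Hypothetical sketch of the add/subtract idea: channels with high
    attention scores are treated as discriminative and amplified; the
    rest are treated as confusable and suppressed."""
    scores = channel_attention(features)
    disc = (scores >= threshold).astype(features.dtype)  # discriminative mask
    conf = 1.0 - disc                                    # confusable mask
    gain = 1.0 + alpha * disc[:, None, None] - beta * conf[:, None, None]
    return features * gain

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 4, 4))   # toy (C, H, W) feature map
out = lafe_recalibrate(f)
print(out.shape)                 # (8, 4, 4)
```

With the defaults above, each discriminative channel is scaled by 1.5 and each confusable channel by 0.5; in the actual model this recalibration would be learned end-to-end through the two loss functions rather than hard-thresholded.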




Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. attention
  2. feature fusion
  3. fine-grained image classification

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Open Project Funding of CAS-NDST Lab

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Cited By

  • Fine-grained image classification method based on hybrid attention module. Frontiers in Neurorobotics 18 (2024). DOI: 10.3389/fnbot.2024.1391791
  • Research on fine-grained visual classification method based on dual-attention feature complementation. IEEE Access (2024). DOI: 10.1109/ACCESS.2024.3420429
  • Multi-directional guidance network for fine-grained visual classification. The Visual Computer 40, 11 (2024), 8113-8124. DOI: 10.1007/s00371-023-03226-w
  • A collaborative gated attention network for fine-grained visual classification. Displays 79 (2023), 102468. DOI: 10.1016/j.displa.2023.102468
  • AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning. In Proceedings of the 29th ACM International Conference on Multimedia (2021), 648-657. DOI: 10.1145/3474085.3475228
  • Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification. In 2021 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN52387.2021.9534004