Abstract
Retail products belonging to the same category often have highly similar appearance characteristics, such as color, shape, and size, and therefore cannot be distinguished by conventional classification methods. Currently, the most effective approach to this problem is fine-grained classification, which uses machine vision to compute fine feature representations of local regions of the target, thereby achieving fine-grained discrimination. Fine-grained classification methods have been widely used to recognize birds, cars, airplanes, and many other objects. However, existing fine-grained classification methods still have drawbacks. In this paper, we propose an improved fine-grained classification method based on self-attention destruction and construction learning (SADCL) for retail product recognition. Specifically, the proposed method applies a self-attention mechanism to the destruction and construction of image information in an end-to-end fashion, so that inference yields precise fine-grained predictions together with large, informative attention regions. We evaluate the proposed method on the Retail Product Checkout (RPC) dataset. Experimental results show that the proposed method achieves an accuracy above 80% in retail product recognition, substantially higher than the results of other fine-grained classification methods.
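The destruction step in destruction-and-construction learning typically shuffles local image patches within a constrained neighborhood, forcing the network to rely on discriminative local detail rather than global layout. The sketch below illustrates one common form of such a region-confusion shuffle in NumPy; the function name and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def region_confusion(image, grid=7, k=2, rng=None):
    """Destruction-step sketch: split the image into a grid x grid
    lattice of patches and permute patch rows and columns using
    jittered sort keys, so each patch moves only within a small
    neighborhood of its original position."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # crop so the image divides evenly into grid x grid patches
    image = image[: ph * grid, : pw * grid]
    # jitter each row/column index by at most +-k before sorting,
    # which bounds how far any patch can travel
    row_order = np.argsort(np.arange(grid) + rng.uniform(-k, k, grid))
    col_order = np.argsort(np.arange(grid) + rng.uniform(-k, k, grid))
    # view as (patch_row, y, patch_col, x, channel) and permute patches
    patches = image.reshape(grid, ph, grid, pw, -1)
    shuffled = patches[row_order][:, :, col_order]
    return shuffled.reshape(ph * grid, pw * grid, -1)
```

Because the shuffle only rearranges patches, the pixel content is preserved exactly; with `k=0` the jitter vanishes and the function returns the (cropped) image unchanged.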





Acknowledgements
This work is supported by the National Natural Science Foundation of China (61672321, 61771289, 61832012, 61802062, 51977113, and 51507084), Major Basic Research of Natural Science Foundation of Shandong Province (ZR2019ZD10), Shandong Province Key Research and Development Plan (2019GGX101050), and the Project of Department of Education of Guangdong Province (2017KQNCX209).
Wang, W., Cui, Y., Li, G. et al. A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Comput & Applic 32, 14613–14622 (2020). https://doi.org/10.1007/s00521-020-05148-3