Abstract
Retail products belonging to the same category often have highly similar appearance characteristics, such as color, shape, and size, and therefore cannot be distinguished by conventional classification methods. Currently, the most effective approach to this problem is fine-grained classification, which uses machine vision to compute fine feature representations of local regions of the target, thereby achieving fine-grained discrimination. Fine-grained classification methods have been widely used to recognize birds, cars, airplanes, and many other objects. However, existing fine-grained classification methods still have drawbacks. In this paper, we propose an improved fine-grained classification method based on self-attention destruction and construction learning (SADCL) for retail product recognition. Specifically, the proposed method applies a self-attention mechanism to the destruction and construction of image information in an end-to-end fashion, so that inference yields precise fine-grained predictions together with large, informative attention regions. We evaluate the proposed method on the Retail Product Checkout (RPC) dataset. Experimental results show that the proposed method achieves an accuracy above 80% in retail product recognition, substantially higher than the results of other fine-grained classification methods.
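The destruction step in destruction-and-construction learning typically shuffles local image patches within a constrained neighborhood, forcing the network to rely on discriminative local detail rather than global layout. The sketch below illustrates one common form of such a region-confusion shuffle in NumPy; the function name and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def region_confusion(image, grid=7, k=2, rng=None):
    """Destruction-step sketch: split the image into a grid x grid
    lattice of patches and permute patch rows and columns using
    jittered sort keys, so each patch moves only within a small
    neighborhood of its original position."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # crop so the image divides evenly into grid x grid patches
    image = image[: ph * grid, : pw * grid]
    # jitter each row/column index by at most +-k before sorting,
    # which bounds how far any patch can travel
    row_order = np.argsort(np.arange(grid) + rng.uniform(-k, k, grid))
    col_order = np.argsort(np.arange(grid) + rng.uniform(-k, k, grid))
    # view as (patch_row, y, patch_col, x, channel) and permute patches
    patches = image.reshape(grid, ph, grid, pw, -1)
    shuffled = patches[row_order][:, :, col_order]
    return shuffled.reshape(ph * grid, pw * grid, -1)
```

Because the shuffle only rearranges patches, the pixel content is preserved exactly; with `k=0` the jitter vanishes and the function returns the (cropped) image unchanged.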





Acknowledgements
This work is supported by the National Natural Science Foundation of China (61672321, 61771289, 61832012, 61802062, 51977113, and 51507084), Major Basic Research of Natural Science Foundation of Shandong Province (ZR2019ZD10), Shandong Province Key Research and Development Plan (2019GGX101050), and the Project of Department of Education of Guangdong Province (2017KQNCX209).
Wang, W., Cui, Y., Li, G. et al. A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Comput & Applic 32, 14613–14622 (2020). https://doi.org/10.1007/s00521-020-05148-3