A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition

  • S.I.: Deep Learning Approaches for RealTime Image Super Resolution (DLRSR)
Neural Computing and Applications

Abstract

Retail products of the same category often have extremely similar appearance characteristics, such as color, shape, and size, which conventional classification methods cannot distinguish. Currently, the most effective way to solve this problem is fine-grained classification, which combines machine vision with scene information to build fine-grained feature representations of local target regions and thereby classify objects at a fine-grained level. Fine-grained classification methods have been widely used to recognize birds, cars, airplanes, and many other object categories. However, existing fine-grained classification methods still have some drawbacks. In this paper, we propose an improved fine-grained classification method based on self-attention destruction and construction learning (SADCL) for retail product recognition. Specifically, the proposed method applies a self-attention mechanism to the destruction and construction of image information in an end-to-end fashion, so that it computes precise fine-grained classification predictions and identifies highly informative regions during inference. We test the proposed method on the Retail Product Checkout (RPC) dataset. Experimental results demonstrate that the proposed method achieves an accuracy above 80% in retail product recognition at inference, which is much higher than the results of other fine-grained classification methods.
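To make the "destruction" step more concrete, the following is a minimal Python sketch of the region confusion mechanism from destruction and construction learning (DCL; Chen et al., CVPR 2019), the framework that SADCL extends as its name indicates. It is based on our reading of the DCL paper, not on the SADCL authors' code. It assumes the image has already been split into an N × N grid of regions; the neighbourhood size `k` (1 ≤ k < N) and the function names `local_permutation` and `region_confusion` are illustrative assumptions. The self-attention and construction (region alignment) parts of the method are not shown.

```python
import random


def local_permutation(n, k, rng):
    """Return a permutation of range(n) in which no index moves more than 2k places.

    Sketch of the DCL-style local shuffle (assumed here, not the SADCL code):
    jitter each index i by r ~ U(-k, k), then sort by the jittered value.
    """
    if not 1 <= k < n:
        raise ValueError("expected 1 <= k < n")
    jittered = [i + rng.uniform(-k, k) for i in range(n)]
    # perm[p] is the original index that lands at position p
    return sorted(range(n), key=lambda i: jittered[i])


def region_confusion(grid, k, seed=None):
    """Destruct an N x N grid of image regions by local row and column shuffles.

    grid[r][c] holds whatever represents region R_{r,c} (a patch, an id, ...).
    Each region ends up within 2k rows and 2k columns of where it started.
    """
    rng = random.Random(seed)
    n = len(grid)
    if any(len(row) != n for row in grid):
        raise ValueError("grid must be N x N")

    # Step 1: shuffle regions within each row (changes column positions).
    shuffled_rows = []
    for row in grid:
        perm = local_permutation(n, k, rng)
        shuffled_rows.append([row[c] for c in perm])

    # Step 2: shuffle regions within each column (changes row positions).
    out = [[None] * n for _ in range(n)]
    for c in range(n):
        perm = local_permutation(n, k, rng)
        for r in range(n):
            out[r][c] = shuffled_rows[perm[r]][c]
    return out


if __name__ == "__main__":
    # Label each region of a hypothetical 7 x 7 grid by its original (row, col).
    grid = [[(r, c) for c in range(7)] for r in range(7)]
    for row in region_confusion(grid, k=2, seed=0):
        print(row)
```

Because each region can move at most 2k positions along each axis, local detail survives while the global layout is scrambled. In DCL, this bounded shuffle is what pushes the classifier to rely on discriminative local regions rather than on overall object shape.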

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61672321, 61771289, 61832012, 61802062, 51977113, and 51507084), Major Basic Research of Natural Science Foundation of Shandong Province (ZR2019ZD10), Shandong Province Key Research and Development Plan (2019GGX101050), and the Project of Department of Education of Guangdong Province (2017KQNCX209).

Author information

Corresponding author

Correspondence to Guangshun Li.

About this article

Cite this article

Wang, W., Cui, Y., Li, G. et al. A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Comput & Applic 32, 14613–14622 (2020). https://doi.org/10.1007/s00521-020-05148-3

  • DOI: https://doi.org/10.1007/s00521-020-05148-3
