Lightweight identification of retail products based on improved convolutional neural network

Wang, Junjie; Huang, Chengwei; Zhao, Liye; Li, Zhi

doi:10.1007/s11042-022-12872-6


Published: 09 April 2022

Volume 81, pages 31313–31328, (2022)

Multimedia Tools and Applications

Junjie Wang¹, Chengwei Huang², Liye Zhao¹ & Zhi Li³


Abstract

Because many retail products look alike, identifying products with high accuracy and low computational cost is a major challenge in smart retail scenarios. In this paper, we propose a lightweight retail product identification and localization method based on an improved convolutional neural network. First, we use group convolution and depthwise separable convolution to optimize the structure of the backbone network and reduce the amount of computation. Second, we adjust the multiscale structure to the optimal scales and use the k-means clustering algorithm to re-cluster six anchors of different sizes. Third, we introduce spatial pyramid pooling (SPP) to replace pooling by convolution, which effectively improves robustness against image distortions such as cropping and scaling. Finally, we use mosaic data augmentation to further improve the robustness of the network. Experiments on the RPC dataset show that, compared with YOLOv5, the number of parameters is reduced to 1/6.4 and the FLOPs to 1/9. Experiments on the DeepBlue Retail Dataset show that, compared with YOLOv5, the number of parameters is reduced to 1/7.8 and the FLOPs to 1/9.3. Real-time evaluation on the same hardware shows that the proposed model runs at 123 FPS in the forward inference test, while YOLOv5 runs at 58 FPS under the same conditions.
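The abstract credits group convolution and depthwise separable convolution for the reduced computation. The sketch below only illustrates the standard weight-count arithmetic behind that kind of saving, for a single hypothetical 3×3 layer with 256 input and output channels on a 52×52 output map; the layer sizes, the group count of 4, and all function names are assumptions for illustration, not the paper's backbone. Biases are ignored, and costs are counted as multiply-accumulates (MACs), which many papers report as FLOPs or half of FLOPs.

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weights of a k x k convolution (no bias).

    With `groups` groups, each group maps c_in/groups input channels to
    c_out/groups output channels, so the total is k*k*(c_in/groups)*c_out.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per channel) followed by a 1 x 1 pointwise conv."""
    depthwise = conv_params(k, c_in, c_in, groups=c_in)
    pointwise = conv_params(1, c_in, c_out)
    return depthwise + pointwise


def mult_adds(params, h_out, w_out):
    """Multiply-accumulates: each weight is applied once per output position."""
    return params * h_out * w_out


if __name__ == "__main__":
    k, c, hw = 3, 256, 52  # hypothetical layer: 3x3 kernel, 256 channels, 52x52 output
    layers = [
        ("standard", conv_params(k, c, c)),
        ("group (g=4)", conv_params(k, c, c, groups=4)),
        ("depthwise separable", depthwise_separable_params(k, c, c)),
    ]
    for name, p in layers:
        print(f"{name:>20}: {p:>9,d} params, {mult_adds(p, hw, hw):>15,d} MACs")
```

For this layer the depthwise separable version needs about 8.7 times fewer weights than the standard convolution, consistent with the usual ratio 1 / (1/C_out + 1/k²). Grouping by g divides the cost by g.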
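The abstract says six anchors are re-clustered with k-means but does not give the distance measure. A common choice in YOLO-style detectors, introduced with YOLOv2, is d = 1 − IoU between box shapes, so that large boxes do not dominate the error the way they would under Euclidean distance. The following is a minimal sketch under that assumption; the mean-centroid update, random initialization, and the synthetic box sizes in the demo are illustrative choices, and the helper names are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given only as (width, height), aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(wh, k=6, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchors using d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(wh, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in wh:
            # the nearest centroid under 1 - IoU is the one with the highest IoU
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        new = []
        for j, members in enumerate(clusters):
            if members:
                n = len(members)
                new.append((sum(w for w, _ in members) / n, sum(h for _, h in members) / n))
            else:
                new.append(centroids[j])  # an empty cluster keeps its previous centroid
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    return sorted(centroids, key=lambda a: a[0] * a[1])  # small to large


def mean_best_iou(wh, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in wh) / len(wh)


if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical box sizes in pixels; real use would read them from the training labels
    wh = [(rng.uniform(20, 200), rng.uniform(20, 200)) for _ in range(500)]
    anchors = kmeans_anchors(wh, k=6)
    print([(round(w), round(h)) for w, h in anchors])
    print(f"mean best IoU: {mean_best_iou(wh, anchors):.3f}")
```

Sorting by area makes it easy to hand the smaller anchors to the higher-resolution detection scale and the larger ones to the coarser scale, which is how six anchors are typically split across two output scales.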
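The abstract does not say which SPP variant is used. Since the comparison baseline is YOLOv5, the sketch below assumes the YOLO-style SPP block rather than the fixed-grid SPP of He et al.: the feature map is max-pooled at stride 1 with kernels 5, 9 and 13 (padding k // 2, so height and width are preserved) and the results are concatenated with the input along the channel axis. The kernel sizes are the common YOLO defaults, not values from the paper, and the 1×1 convolutions that usually surround the block are omitted. Feature maps are plain nested lists so the example has no dependencies.

```python
def max_pool_same(fmap, k):
    """k x k max pooling, stride 1, padding k // 2 (k odd): output keeps the input's H x W.

    Out-of-range positions are simply skipped, which matches padding with -inf.
    """
    h, w, r = len(fmap), len(fmap[0]), k // 2
    return [
        [
            max(
                fmap[i][j]
                for i in range(max(0, y - r), min(h, y + r + 1))
                for j in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def spp_block(channels, kernels=(5, 9, 13)):
    """Concatenate the input channels with their max-pooled versions (channel axis)."""
    out = list(channels)  # identity branch
    for k in kernels:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


if __name__ == "__main__":
    fmap = [[0.0] * 13 for _ in range(13)]
    fmap[6][6] = 1.0  # a single strong activation in the middle of a 13x13 map
    out = spp_block([fmap])
    print(len(out), "channels of", len(out[0]), "x", len(out[0][0]))
    for idx, k in enumerate((5, 9, 13), start=1):
        covered = sum(v == 1.0 for row in out[idx] for v in row)
        print(f"k={k}: activation spread to {covered} positions")
```

Because every branch keeps the spatial size, a C-channel input becomes a 4C-channel output whose larger-kernel branches summarise progressively wider context around each position.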
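Mosaic augmentation, as the abstract mentions it, combines several training images into one. The paper's exact settings are not given here, so the sketch below assumes the widely used YOLOv5-style layout: four images meet at a random centre on a 2s × 2s canvas, each is cropped where it overflows, and its boxes are shifted by the same offset and clipped. Images are grayscale nested lists and boxes are (x1, y1, x2, y2) pixel tuples; the function name and signature are hypothetical. Real pipelines usually also resize each image first and apply a random affine transform to the mosaic afterwards, which is omitted here.

```python
import random


def mosaic(images, boxes, s, seed=None):
    """Tile four images around a random centre on a 2s x 2s canvas.

    images: four grayscale images, each a list of rows.
    boxes:  four lists of (x1, y1, x2, y2) boxes, one list per image.
    Returns (canvas, remapped_boxes, (xc, yc)).
    """
    rng = random.Random(seed)
    xc = rng.randint(s // 2, 3 * s // 2)  # mosaic centre, kept away from the borders
    yc = rng.randint(s // 2, 3 * s // 2)
    size = 2 * s
    canvas = [[0] * size for _ in range(size)]  # 0 stands in for the padding colour
    out_boxes = []
    for i, (img, img_boxes) in enumerate(zip(images, boxes)):
        h, w = len(img), len(img[0])
        # (x1a, y1a, x2a, y2a): region on the canvas; (x1b, y1b): matching corner in the image
        if i == 0:    # top-left: the image's bottom-right corner touches the centre
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b = w - (x2a - x1a), h - (y2a - y1a)
        elif i == 1:  # top-right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, size), yc
            x1b, y1b = 0, h - (y2a - y1a)
        elif i == 2:  # bottom-left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(yc + h, size)
            x1b, y1b = w - (x2a - x1a), 0
        else:         # bottom-right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, size), min(yc + h, size)
            x1b, y1b = 0, 0
        for row in range(y2a - y1a):
            canvas[y1a + row][x1a:x2a] = img[y1b + row][x1b:x1b + (x2a - x1a)]
        # shift boxes by the placement offset, clip to the canvas, drop boxes that vanish
        off_x, off_y = x1a - x1b, y1a - y1b
        for bx1, by1, bx2, by2 in img_boxes:
            nx1, nx2 = (min(max(v + off_x, 0), size) for v in (bx1, bx2))
            ny1, ny2 = (min(max(v + off_y, 0), size) for v in (by1, by2))
            if nx2 > nx1 and ny2 > ny1:
                out_boxes.append((nx1, ny1, nx2, ny2))
    return canvas, out_boxes, (xc, yc)


if __name__ == "__main__":
    s = 8
    imgs = [[[v] * s for _ in range(s)] for v in (1, 2, 3, 4)]  # four flat toy images
    canvas, out, centre = mosaic(imgs, [[(0, 0, s, s)], [], [], [(0, 0, 2, 2)]], s, seed=3)
    print("centre:", centre, "boxes:", out)
```

Each output image therefore contains objects from four source images at varied positions and partial crops, which is the source of the robustness gain the abstract attributes to this step.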


Similar content being viewed by others

Naïve Approach for Bounding Box Annotation and Object Detection Towards Smart Retail Systems

Retail Product Classification on Distinct Distribution of Training and Evaluation Data

Article 18 March 2022

The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition


Funding

No funding was received for conducting this study.

Author information

Authors and Affiliations

School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
Junjie Wang & Liye Zhao
Jiangsu Intever Energy Technology Co. Ltd., Nanjing, China
Chengwei Huang
School of Electrical Engineering, Southeast University, Nanjing, 210096, China
Zhi Li


Corresponding author

Correspondence to Liye Zhao.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Wang, J., Huang, C., Zhao, L. et al. Lightweight identification of retail products based on improved convolutional neural network. Multimed Tools Appl 81, 31313–31328 (2022). https://doi.org/10.1007/s11042-022-12872-6


Received: 01 July 2021
Revised: 07 February 2022
Accepted: 10 March 2022
Published: 09 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11042-022-12872-6

