Abstract
Multi-class retail product recognition is an important computer vision application for the retail industry. Track 4 of the AI City Challenge targets this setting and focuses on the accuracy and efficiency of the automatic checkout process. However, because real-world training data for retail items is scarce, a synthetic data set is usually generated from 3D-scanned items to train an automated checkout system. To overcome the difference in appearance between this training data and the real-world scenes in the test set provided by the AI City organizers, our research focuses on analyzing and recognizing retail items by combining a traditional method with a state-of-the-art Convolutional Neural Network (CNN) approach. This paper presents our proposed system for product counting and recognition for automated retail checkout. Our proposed method ranked in the top 8 in the experimental evaluation of the 2022 AI City Challenge Track 4, with an F1-score of 0.4082.
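For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score is conventionally computed from true-positive, false-positive, and false-negative counts. The function name `f1_score` and the counts in the usage line are hypothetical and chosen only for illustration; they are not the challenge's data, and the abstract does not state how the challenge matches predictions to ground truth.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the standard F1-score, the harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives, and
    false negatives. Returns 0.0 when there are no true positives,
    since precision and recall are then zero or undefined.
    """
    if tp < 0 or fp < 0 or fn < 0:
        raise ValueError("counts must be non-negative")
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of predictions that are correct
    recall = tp / (tp + fn)     # fraction of ground-truth items that are found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example = f1_score(tp=8, fp=2, fn=4)
```

Because F1 is a harmonic mean, a system that misses many products (low recall) or reports many spurious ones (low precision) scores low even if the other quantity is high.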
Acknowledgment
We would like to thank Ho Chi Minh City International University, Vietnam National University (HCMIU-VNU), for supporting our work. We also thank our colleagues for their contributions, which considerably aided the revision of the manuscript.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Thai, TT. et al. (2022). The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13757. Springer, Cham. https://doi.org/10.1007/978-3-031-21743-2_22
DOI: https://doi.org/10.1007/978-3-031-21743-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21742-5
Online ISBN: 978-3-031-21743-2
eBook Packages: Computer Science, Computer Science (R0)