The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition

  • Conference paper
  • Intelligent Information and Database Systems (ACIIDS 2022)

Abstract

Multi-class retail product recognition is an important computer vision application for the retail industry. Track 4 of the AI City Challenge addresses this domain, focusing on the accuracy and efficiency of the automatic checkout process. Because real-world training data for retail items is scarce, a synthetic dataset is typically generated from 3D-scanned items to train an automated checkout system. To overcome the gap in visual appearance between the synthetic training data and the real-world test set provided by the AI City Challenge organizers, our research analyzes and recognizes retail items by combining a traditional background subtraction method with a state-of-the-art Convolutional Neural Network (CNN). This paper presents our proposed system for product counting and recognition in automated retail checkout. Our proposed method ranked in the top eight of the experimental evaluation of the 2022 AI City Challenge Track 4 with an F1-score of 0.4082.



Acknowledgment

We would like to express our sincere thanks to Ho Chi Minh City International University - Vietnam National University (HCMIU-VNU) for supporting our work. We would also like to express our appreciation to all of our colleagues for their contributions, which considerably aided in the revision of this manuscript.

Author information


Corresponding author

Correspondence to Synh Viet-Uyen Ha.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Thai, T.T., et al. (2022). The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, T.P., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13757. Springer, Cham. https://doi.org/10.1007/978-3-031-21743-2_22

  • DOI: https://doi.org/10.1007/978-3-031-21743-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21742-5

  • Online ISBN: 978-3-031-21743-2

  • eBook Packages: Computer Science, Computer Science (R0)
