Context-guided feature enhancement network for automatic check-out

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Powered by deep learning, automatic check-out (ACO) has made great breakthroughs. Nevertheless, because of the complexity of real scenes, ACO remains an extremely challenging task in computer vision. Existing methods cannot fully exploit contextual information, which limits further improvements in checkout accuracy. In this study, a novel context-guided feature enhancement network (CGFENet) is proposed, in which products are detected on multi-scale features by exploring global and local context. Specifically, we design three customized modules: a global context learning module (GCLM), a local context learning module (LCLM), and an attention transfer module (ATM). GCLM enhances the representation of feature maps by fully exploring global context information, LCLM gradually strengthens the interaction between local and global features, and ATM directs the model's attention toward challenging products. To demonstrate the effectiveness of the proposed CGFENet, extensive experiments are conducted on the large-scale retail product checkout (RPC) dataset. Experimental results indicate that CGFENet achieves favorable performance and surpasses state-of-the-art methods, reaching 85.88% checkout accuracy in the averaged mode compared with 56.68% for the baseline.
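
The abstract only outlines the roles of the three modules, so the sketch below illustrates one plausible way such a context-guided neck could be wired over multi-scale features (e.g., FPN outputs). It is a minimal PyTorch sketch, not the authors' implementation: the internal designs shown here (GCLM as channel re-weighting from globally pooled statistics, LCLM as concatenate-and-convolve fusion with a global branch, ATM as a sigmoid spatial gate) and all class and parameter names are illustrative assumptions.

```python
# Minimal, illustrative PyTorch sketch (assumptions, not the authors' code):
# GCLM -> channel re-weighting from a globally pooled descriptor
# LCLM -> fuses each feature level with the (upsampled) global-context feature
# ATM  -> sigmoid spatial gate emphasizing harder regions
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCLM(nn.Module):
    """Hypothetical global context learning module."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        g = F.adaptive_avg_pool2d(x, 1).flatten(1)   # (N, C) global descriptor
        w = self.fc(g)[:, :, None, None]             # (N, C, 1, 1) channel weights
        return x * w


class LCLM(nn.Module):
    """Hypothetical local context learning module: local/global feature fusion."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, local_feat, global_feat):
        # Resize the global-context feature to the local level before fusing.
        global_feat = F.interpolate(global_feat, size=local_feat.shape[-2:], mode="nearest")
        return F.relu(self.conv(torch.cat([local_feat, global_feat], dim=1)))


class ATM(nn.Module):
    """Hypothetical attention transfer module: spatial gating."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.gate(x))


class CGFENetNeck(nn.Module):
    """Applies GCLM/LCLM/ATM across a list of multi-scale feature maps."""

    def __init__(self, channels: int = 256, num_levels: int = 4):
        super().__init__()
        self.gclm = GCLM(channels)
        self.lclms = nn.ModuleList(LCLM(channels) for _ in range(num_levels))
        self.atms = nn.ModuleList(ATM(channels) for _ in range(num_levels))

    def forward(self, feats):
        # Use the coarsest level as the global-context source.
        global_feat = self.gclm(feats[-1])
        return [atm(lclm(f, global_feat)) for f, lclm, atm in zip(feats, self.lclms, self.atms)]


if __name__ == "__main__":
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
    outs = CGFENetNeck()(feats)
    print([tuple(o.shape) for o in outs])  # enhanced features keep the input shapes
```

Running the script prints the output shapes for four feature levels; in the setting described by the abstract, such enhanced multi-scale features would then feed a standard detection head.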


Notes

  1. https://github.com/RPC-Dataset/RPC-Leaderboard.


Acknowledgments

This work was supported by the China Postdoctoral Science Foundation (grant number 2020M670152).

Author information


Corresponding author

Correspondence to Yihan Sun.

Ethics declarations

Conflicts of interest

We declare that we have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, Y., Luo, T. & Zuo, Z. Context-guided feature enhancement network for automatic check-out. Neural Comput & Applic 34, 593–606 (2022). https://doi.org/10.1007/s00521-021-06394-9


