Abstract
Powered by deep learning technology, automatic check-out (ACO) has made great breakthroughs. Nevertheless, because of the complex nature of real scenes, ACO is still an exceedingly testing task in the field of computer vision. Existing methods cannot fully exploit the contextual information, so that the improvement of checkout accuracy is inhibited. In this study, a novel context-guided feature enhancement network (CGFENet) is proposed, in which products are detected in multi-scale features by exploring the global and local context. Specifically, we design three customized modules: Global context learning module (GCLM), local context learning module (LCLM), and attention transfer module (ATM). GCLM is designed for enhancing the feature representation of feature maps by fully exploring global context information, the purpose of LCLM is that interactions between local and global features can be strengthened gradually, and ATM aims to make the model attach more attention to the challenging products. For the purpose of proving the effectiveness of the proposed CGFENet, extensive experiments are conducted on the large-scale retail product checkout dataset. Experimental results indicate that CGFENet accomplishes favorable performance and surpasses state-of-the-art methods. We achieve 85.88% checkout accuracy in the averaged mode, by comparison with 56.68% of the baseline methods.
Similar content being viewed by others
References
Wei X-S, Cui Q, Yang L, Wang P, Liu L (2019) Rpc: a large-scale retail product checkout dataset. arXiv:1901.07249
Li C, Du D, Zhang L, Luo T, Wu Y, Tian Q, Wen L, Lyu S (2019) Data priming network for automatic check-out. In: Proceedings of the 27th ACM international conference on multimedia, pp 2152–2160
Chen Z, Huang S, Tao D (2018) Context refinement for object detection. In: The European conference on computer vision (ECCV)
Chen X, Gupta A (2017) Spatial memory for context reasoning in object detection. arXiv:1704.04224
Carbonetto P, De Freitas N, Barnard K (2004) A statistical model for general contextual object recognition. In: European conference on computer vision. Springer, pp 350–362
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Galleguillos C, Belongie S (2010) Context based object categorization: a critical survey. Comput Vis Image Underst 114(6):712–722
Galleguillos C, Rabinovich A, Belongie S (2008) Object categorization using co-occurrence, location and appearance. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Bar M (2004) Visual objects in context. Nat Rev Neurosci 5(8):617–629
Oliva A, Torralba A (2007) The role of context in object recognition. Trends Cogn Sci 11(12):520–527
Palmer TE (1975) The effects of contextual scenes on the identification of objects. Memory Cognit 3:519–526
Alex Krizhevsky I, Hinton SG (2012) Imagenet classification with deep convolutional neural networks. In: NIPS
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: toward real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, pp 391–405
Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767
Bochkovskiy A, Wang C-Y, Mark Liao H-Y (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th international conference on pattern recognition (ICPR’06), vol 3. IEEE, pp 850–855
Fu C-Y, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: deconvolutional single shot detector. arXiv:1701.06659
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, pp 630–645
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Yi Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: 2017 IEEE international conference on computer vision (ICCV), pp 2999–3007
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 6569–6578
Divvala SK, Hoiem D, Hays JH, Efros AA, Hebert M (2009) An empirical study of context in object detection. In: IEEE conference on computer vision and pattern recognition, 2009. CVPR 2009. IEEE, pp 1271–1278
Mottaghi R, Chen X, Liu X, Cho N-G, Lee S-W, Fidler S, Urtasun R, Yuille A (2014) The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 891–898
Yu R, Chen X, Morariu VI, Davis LS (2016) The role of context selection in object detection. arXiv:1609.02948
Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware cnn model. In: Proceedings of the IEEE international conference on computer vision, pp 1134–1142
Ouyang W, Wang K, Zhu X, Wang X (2017) Learning chained deep features and classifiers for cascade in object detection. arXiv:1702.07054
Leng J, Liu Y (2019) An enhanced ssd with feature fusion and visual reasoning for object detection. Neural Comput Appl 31(10):6549–6558
Leng J, Liu Y, Dawei D, Zhang T, Quan P (2019) Robust obstacle detection and recognition for driver assistance systems. IEEE Trans Intell Transp Syst 21(4):1560–1571
Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
Li J, Wei Y, Liang X, Dong J, Tingfa X, Feng J, Yan S (2017) Attentive contexts for object detection. IEEE Trans Multimed 19(5):944–954
Chen X, Li L-J, Fei-Fei L, Gupta A (2018) Iterative visual reasoning beyond convolutions. arXiv:1803.11189
Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3588–3597
Gu J, Hu H, Wang L, Wei Y, Dai J (2018) Learning region features for object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 381–395
Dong L, Zhang H, Ji Y, Ding, (2020) Crowd counting by using multi-level density-based spatial information: a multi-scale cnn framework. Inf Sci 528:79–91
Koubaroulis D, Matas J, Kittler J, CMP CTU (2002) Evaluating colour-based object recognition algorithms using the soil-47 database. In: Asian conference on computer vision, vol 2
Merler M, Galleguillos C, Belongie S (2007) Recognizing groceries in situ using in vitro training data. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
Rocha A, Hauagge DC, Wainer J, Goldenstein S (2010) Automatic fruit and vegetable classification from images. Comput Electron Agric 70(1):96–104
George M, Floerkemeier C (2014) Recognizing products: a per-exemplar multi-label image classification approach. In: European conference on computer vision. Springer, pp 440–455
Jund P, Abdo N, Eitel A, Burgard W (2016) The freiburg groceries dataset. arXiv:1611.05799
Follmann P, Bottger T, Hartinger P, Konig R, Ulrich M (2018) Mvtec d2s: densely segmented supermarket dataset. In: Proceedings of the European conference on computer vision (ECCV), pp 569–585
Zhang H, Li D, Ji Y, Zhou H, Liu K (2019) Towards new retail: a benchmark dataset for smart unmanned vending machines. IEEE Trans Ind Inform 99:1
Liu A, Wang J, Liu X, Cao B, Zhang C, Yu H (2020) Bias-based universal adversarial patch attack for automatic check-out. In: European conference on computer vision
Zhang L, Du D, Li C, Wu Y, Luo T (2020) Iterative knowledge distillation for automatic check-out. In: IEEE Transactions on Multimedia. IEEE. https://doi.org/10.1109/TMM.2020.3037502
Wang W, Cui Y, Li G, Jiang C, Deng S (2020) A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Comput Appl 32(18):14613–14622
Yang Y, Sheng L, Jiang X, Wang H, Xu D, Cao X (2021) Increaco: incrementally learned automatic check-out with photorealistic exemplar augmentation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 626–634
Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Lawrence Zitnick C (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740–755
Acknowledgments
This work was supported by China Postdoctoral Science Foundation (grant number 2020M670152).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
We declare that we have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Sun, Y., Luo, T. & Zuo, Z. Context-guided feature enhancement network for automatic check-out. Neural Comput & Applic 34, 593–606 (2022). https://doi.org/10.1007/s00521-021-06394-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-021-06394-9