Context-guided feature enhancement network for automatic check-out

Sun, Yihan; Luo, Tiejian; Zuo, Zhen

doi:10.1007/s00521-021-06394-9

Context-guided feature enhancement network for automatic check-out

Original Article
Published: 14 August 2021

Volume 34, pages 593–606, (2022)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

336 Accesses
1 Citation
Explore all metrics

Abstract

Powered by deep learning technology, automatic check-out (ACO) has made great breakthroughs. Nevertheless, because of the complex nature of real scenes, ACO is still an exceedingly testing task in the field of computer vision. Existing methods cannot fully exploit the contextual information, so that the improvement of checkout accuracy is inhibited. In this study, a novel context-guided feature enhancement network (CGFENet) is proposed, in which products are detected in multi-scale features by exploring the global and local context. Specifically, we design three customized modules: Global context learning module (GCLM), local context learning module (LCLM), and attention transfer module (ATM). GCLM is designed for enhancing the feature representation of feature maps by fully exploring global context information, the purpose of LCLM is that interactions between local and global features can be strengthened gradually, and ATM aims to make the model attach more attention to the challenging products. For the purpose of proving the effectiveness of the proposed CGFENet, extensive experiments are conducted on the large-scale retail product checkout dataset. Experimental results indicate that CGFENet accomplishes favorable performance and surpasses state-of-the-art methods. We achieve 85.88% checkout accuracy in the averaged mode, by comparison with 56.68% of the baseline methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition

Article 10 July 2020

The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition

CA-CentripetalNet: a novel anchor-free deep learning framework for hardhat wearing detection

Article 10 July 2023

Notes

https://github.com/RPC-Dataset/RPC-Leaderboard.

References

Wei X-S, Cui Q, Yang L, Wang P, Liu L (2019) Rpc: a large-scale retail product checkout dataset. arXiv:1901.07249
Li C, Du D, Zhang L, Luo T, Wu Y, Tian Q, Wen L, Lyu S (2019) Data priming network for automatic check-out. In: Proceedings of the 27th ACM international conference on multimedia, pp 2152–2160
Chen Z, Huang S, Tao D (2018) Context refinement for object detection. In: The European conference on computer vision (ECCV)
Chen X, Gupta A (2017) Spatial memory for context reasoning in object detection. arXiv:1704.04224
Carbonetto P, De Freitas N, Barnard K (2004) A statistical model for general contextual object recognition. In: European conference on computer vision. Springer, pp 350–362
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Article Google Scholar
Galleguillos C, Belongie S (2010) Context based object categorization: a critical survey. Comput Vis Image Underst 114(6):712–722
Article Google Scholar
Galleguillos C, Rabinovich A, Belongie S (2008) Object categorization using co-occurrence, location and appearance. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Bar M (2004) Visual objects in context. Nat Rev Neurosci 5(8):617–629
Article Google Scholar
Oliva A, Torralba A (2007) The role of context in object recognition. Trends Cogn Sci 11(12):520–527
Article Google Scholar
Palmer TE (1975) The effects of contextual scenes on the identification of objects. Memory Cognit 3:519–526
Article Google Scholar
Alex Krizhevsky I, Hinton SG (2012) Imagenet classification with deep convolutional neural networks. In: NIPS
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: toward real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, pp 391–405
Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Article Google Scholar
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Article Google Scholar
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767
Bochkovskiy A, Wang C-Y, Mark Liao H-Y (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th international conference on pattern recognition (ICPR’06), vol 3. IEEE, pp 850–855
Fu C-Y, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: deconvolutional single shot detector. arXiv:1701.06659
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, pp 630–645
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Yi Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: 2017 IEEE international conference on computer vision (ICCV), pp 2999–3007
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 6569–6578
Divvala SK, Hoiem D, Hays JH, Efros AA, Hebert M (2009) An empirical study of context in object detection. In: IEEE conference on computer vision and pattern recognition, 2009. CVPR 2009. IEEE, pp 1271–1278
Mottaghi R, Chen X, Liu X, Cho N-G, Lee S-W, Fidler S, Urtasun R, Yuille A (2014) The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 891–898
Yu R, Chen X, Morariu VI, Davis LS (2016) The role of context selection in object detection. arXiv:1609.02948
Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware cnn model. In: Proceedings of the IEEE international conference on computer vision, pp 1134–1142
Ouyang W, Wang K, Zhu X, Wang X (2017) Learning chained deep features and classifiers for cascade in object detection. arXiv:1702.07054
Leng J, Liu Y (2019) An enhanced ssd with feature fusion and visual reasoning for object detection. Neural Comput Appl 31(10):6549–6558
Article Google Scholar
Leng J, Liu Y, Dawei D, Zhang T, Quan P (2019) Robust obstacle detection and recognition for driver assistance systems. IEEE Trans Intell Transp Syst 21(4):1560–1571
Article Google Scholar
Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
Li J, Wei Y, Liang X, Dong J, Tingfa X, Feng J, Yan S (2017) Attentive contexts for object detection. IEEE Trans Multimed 19(5):944–954
Article Google Scholar
Chen X, Li L-J, Fei-Fei L, Gupta A (2018) Iterative visual reasoning beyond convolutions. arXiv:1803.11189
Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3588–3597
Gu J, Hu H, Wang L, Wei Y, Dai J (2018) Learning region features for object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 381–395
Dong L, Zhang H, Ji Y, Ding, (2020) Crowd counting by using multi-level density-based spatial information: a multi-scale cnn framework. Inf Sci 528:79–91
Article MathSciNet Google Scholar
Koubaroulis D, Matas J, Kittler J, CMP CTU (2002) Evaluating colour-based object recognition algorithms using the soil-47 database. In: Asian conference on computer vision, vol 2
Merler M, Galleguillos C, Belongie S (2007) Recognizing groceries in situ using in vitro training data. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
Rocha A, Hauagge DC, Wainer J, Goldenstein S (2010) Automatic fruit and vegetable classification from images. Comput Electron Agric 70(1):96–104
Article Google Scholar
George M, Floerkemeier C (2014) Recognizing products: a per-exemplar multi-label image classification approach. In: European conference on computer vision. Springer, pp 440–455
Jund P, Abdo N, Eitel A, Burgard W (2016) The freiburg groceries dataset. arXiv:1611.05799
Follmann P, Bottger T, Hartinger P, Konig R, Ulrich M (2018) Mvtec d2s: densely segmented supermarket dataset. In: Proceedings of the European conference on computer vision (ECCV), pp 569–585
Zhang H, Li D, Ji Y, Zhou H, Liu K (2019) Towards new retail: a benchmark dataset for smart unmanned vending machines. IEEE Trans Ind Inform 99:1
Google Scholar
Liu A, Wang J, Liu X, Cao B, Zhang C, Yu H (2020) Bias-based universal adversarial patch attack for automatic check-out. In: European conference on computer vision
Zhang L, Du D, Li C, Wu Y, Luo T (2020) Iterative knowledge distillation for automatic check-out. In: IEEE Transactions on Multimedia. IEEE. https://doi.org/10.1109/TMM.2020.3037502
Wang W, Cui Y, Li G, Jiang C, Deng S (2020) A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Comput Appl 32(18):14613–14622
Article Google Scholar
Yang Y, Sheng L, Jiang X, Wang H, Xu D, Cao X (2021) Increaco: incrementally learned automatic check-out with photorealistic exemplar augmentation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 626–634
Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Lawrence Zitnick C (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740–755

Download references

Acknowledgments

This work was supported by China Postdoctoral Science Foundation (grant number 2020M670152).

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Yihan Sun & Tiejian Luo
School of Mechanical Engineering, Beijing Institute of Technology, Beijing, 100081, China
Zhen Zuo

Authors

Yihan Sun
View author publications
You can also search for this author in PubMed Google Scholar
Tiejian Luo
View author publications
You can also search for this author in PubMed Google Scholar
Zhen Zuo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yihan Sun.

Ethics declarations

Conflicts of interest

We declare that we have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Sun, Y., Luo, T. & Zuo, Z. Context-guided feature enhancement network for automatic check-out. Neural Comput & Applic 34, 593–606 (2022). https://doi.org/10.1007/s00521-021-06394-9

Download citation

Received: 14 October 2020
Accepted: 26 July 2021
Published: 14 August 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s00521-021-06394-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Context-guided feature enhancement network for automatic check-out

Abstract

Access this article

Similar content being viewed by others

A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition

The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition

CA-CentripetalNet: a novel anchor-free deep learning framework for hardhat wearing detection

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Context-guided feature enhancement network for automatic check-out

Abstract

Access this article

Similar content being viewed by others

A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition

The Combination of Background Subtraction and Convolutional Neural Network for Product Recognition

CA-CentripetalNet: a novel anchor-free deep learning framework for hardhat wearing detection

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation