Integral customer pose estimation using body orientation and visibility mask

Abstract

The analysis of customer pose is an important topic for marketing: from pose information, retailers can evaluate how interested a customer is in the merchandise. However, pose estimation is difficult because of occlusion and left-right similarity. To address these two problems, we propose an Integral Pose Network (IntePoseNet) that incorporates body orientation and a visibility mask. Firstly, benefiting from the simple gaits observed in retail stores, the body orientation provides global information about the pose configuration. For example, if a person is facing right, the body orientation indicates that his or her left side is occluded. Similarly, if a person is facing the camera, his or her right shoulder probably appears on the left side of the image. The body orientation is fused with local joint connections by a set of novel Orientational Message Passing (OMP) layers in the deep neural network. Secondly, the visibility mask models the occlusion state of each joint. It is tightly related to the body orientation, since body orientation is the main cause of self-occlusion. In addition, detecting occluding objects (e.g., shopping baskets in the retail environment) provides further cues for predicting the visibility mask. Furthermore, the system models temporal consistency by introducing optical flow and a bidirectional recurrent neural network. The global body orientation, local joint connections, customer motion, occluding objects and temporal consistency are therefore considered integrally in our system. Finally, we conduct a series of comparison experiments to demonstrate the effectiveness of our system.
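
To make the described pipeline concrete, the following is a minimal, hypothetical sketch in PyTorch of how a predicted body-orientation distribution could gate per-frame joint features (an "orientational message passing"-style fusion of global and local cues), with a bidirectional recurrent layer enforcing temporal consistency and a separate head predicting the per-joint visibility mask. All module and variable names (e.g. OrientationGatedPoseRefiner, orient_gate) are our own illustration under these assumptions, not the authors' implementation.

```python
# Hypothetical sketch: orientation-gated feature fusion + bidirectional
# recurrence for temporal consistency + a visibility-mask head.
import torch
import torch.nn as nn

class OrientationGatedPoseRefiner(nn.Module):
    def __init__(self, num_joints=14, num_orient_bins=8, feat_dim=128):
        super().__init__()
        # Map the discretised body orientation to a per-frame gating vector
        # (global cue modulating local joint features).
        self.orient_gate = nn.Sequential(
            nn.Linear(num_orient_bins, feat_dim), nn.Sigmoid())
        # Bidirectional recurrence smooths predictions across frames.
        self.brnn = nn.GRU(feat_dim, feat_dim // 2,
                           bidirectional=True, batch_first=True)
        self.joint_head = nn.Linear(feat_dim, num_joints * 2)  # (x, y) per joint
        self.vis_head = nn.Linear(feat_dim, num_joints)        # visibility logits

    def forward(self, frame_feats, orient_probs):
        # frame_feats:  (batch, time, feat_dim)        per-frame pose features
        # orient_probs: (batch, time, num_orient_bins) body-orientation softmax
        gate = self.orient_gate(orient_probs)     # global orientation cue
        fused = frame_feats * gate                # fuse global and local info
        refined, _ = self.brnn(fused)             # temporal smoothing
        joints = self.joint_head(refined)                     # joint coordinates
        visibility = torch.sigmoid(self.vis_head(refined))    # visibility mask
        return joints, visibility

if __name__ == "__main__":
    refiner = OrientationGatedPoseRefiner()
    feats = torch.randn(2, 30, 128)                       # 2 clips, 30 frames
    orient = torch.softmax(torch.randn(2, 30, 8), dim=-1)
    joints, visibility = refiner(feats, orient)           # (2, 30, 28), (2, 30, 14)
```

In this sketch the orientation acts as a multiplicative gate, which is only one plausible way to inject a global pose prior; the paper's OMP layers and optical-flow warping are not reproduced here.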

Acknowledgments

The authors thank Haitao Wang, Yongjie Liu, and Qianlong Wang for their help with labeling the data. Customers' faces are blurred in this paper to protect their privacy. This research was approved by the Compliance Committee of The University of Tokyo.

Corresponding author

Correspondence to Jingwen Liu.

Cite this article

Liu, J., Gu, Y. & Kamijo, S. Integral customer pose estimation using body orientation and visibility mask. Multimed Tools Appl 77, 26107–26134 (2018). https://doi.org/10.1007/s11042-018-5839-2
