Integral customer pose estimation using body orientation and visibility mask

Abstract

The analysis of customer pose is an important topic for marketing: from pose information, retailers can evaluate how interested a customer is in the merchandise. However, pose estimation is difficult because of occlusion and left-right similarity. To address these two problems, we propose an Integral Pose Network (IntePoseNet) that incorporates body orientation and a visibility mask. Firstly, benefiting from the simple gaits observed in retail stores, the body orientation provides global information about the pose configuration. For example, if a person is facing right, the body orientation indicates that his or her left side is occluded. Similarly, if a person is facing the camera, his or her right shoulder probably appears on the left side of the image. The body orientation is fused with local joint connections by a set of novel Orientational Message Passing (OMP) layers in the deep neural network. Secondly, the visibility mask models the occlusion state of each joint. It is tightly related to the body orientation, since body orientation is the main cause of self-occlusion. In addition, detecting occluding objects (e.g., shopping baskets in the retail environment) provides further cues for predicting the visibility mask. Furthermore, the system models temporal consistency by introducing optical flow and a bidirectional recurrent neural network. The global body orientation, local joint connections, customer motion, occluding objects and temporal consistency are therefore considered integrally in our system. Finally, we conduct a series of comparison experiments to demonstrate the effectiveness of our system.
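
To make the described pipeline concrete, the following is a minimal, hypothetical sketch in PyTorch of how a predicted body-orientation distribution could gate per-frame joint features (an "orientational message passing"-style fusion of global and local cues), with a bidirectional recurrent layer enforcing temporal consistency and a separate head predicting the per-joint visibility mask. All module and variable names (e.g. OrientationGatedPoseRefiner, orient_gate) are our own illustration under these assumptions, not the authors' implementation.

```python
# Hypothetical sketch: orientation-gated feature fusion + bidirectional
# recurrence for temporal consistency + a visibility-mask head.
import torch
import torch.nn as nn

class OrientationGatedPoseRefiner(nn.Module):
    def __init__(self, num_joints=14, num_orient_bins=8, feat_dim=128):
        super().__init__()
        # Map the discretised body orientation to a per-frame gating vector
        # (global cue modulating local joint features).
        self.orient_gate = nn.Sequential(
            nn.Linear(num_orient_bins, feat_dim), nn.Sigmoid())
        # Bidirectional recurrence smooths predictions across frames.
        self.brnn = nn.GRU(feat_dim, feat_dim // 2,
                           bidirectional=True, batch_first=True)
        self.joint_head = nn.Linear(feat_dim, num_joints * 2)  # (x, y) per joint
        self.vis_head = nn.Linear(feat_dim, num_joints)        # visibility logits

    def forward(self, frame_feats, orient_probs):
        # frame_feats:  (batch, time, feat_dim)        per-frame pose features
        # orient_probs: (batch, time, num_orient_bins) body-orientation softmax
        gate = self.orient_gate(orient_probs)     # global orientation cue
        fused = frame_feats * gate                # fuse global and local info
        refined, _ = self.brnn(fused)             # temporal smoothing
        joints = self.joint_head(refined)                     # joint coordinates
        visibility = torch.sigmoid(self.vis_head(refined))    # visibility mask
        return joints, visibility

if __name__ == "__main__":
    refiner = OrientationGatedPoseRefiner()
    feats = torch.randn(2, 30, 128)                       # 2 clips, 30 frames
    orient = torch.softmax(torch.randn(2, 30, 8), dim=-1)
    joints, visibility = refiner(feats, orient)           # (2, 30, 28), (2, 30, 14)
```

In this sketch the orientation acts as a multiplicative gate, which is only one plausible way to inject a global pose prior; the paper's OMP layers and optical-flow warping are not reproduced here.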

Acknowledgments

The authors thank Haitao Wang, Yongjie Liu, and Qianlong Wang for their help with labeling the data. Customers' faces are blurred in this paper to protect their privacy. This research was approved by the Compliance Committee of The University of Tokyo.

Corresponding author

Correspondence to Jingwen Liu.

Cite this article

Liu, J., Gu, Y. & Kamijo, S. Integral customer pose estimation using body orientation and visibility mask. Multimed Tools Appl 77, 26107–26134 (2018). https://doi.org/10.1007/s11042-018-5839-2
