Abstract
This work targets human pose recognition from RGB-D videos. Recent RGB-D based methods are typically either map-based or skeleton-based approaches. This paper proposes a semi-supervised learning method for evaluating human posture using RGB-D data and a light-model. The light-model is generated with the dynamic-fusion strategy to represent a depth sequence, so it carries richer information than a single depth image. A CNN classifier is then trained on labeled light-model data to recognize human pose, and soft correlation and hard correlation are used to adjust the CNN output for unlabeled data. We construct a posture dataset consisting of RGB images and light-models. Experimental results show that our method is more accurate than the state of the art, with competitive efficiency. This study suggests that features extracted from 3D models are reliable for human pose recognition, especially for sitting posture.
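The abstract does not define soft correlation or hard correlation, so the following is only a minimal sketch of a generic semi-supervised idea, pseudo-labeling, in which a classifier's outputs on unlabeled samples are turned into training targets. It is not the authors' method. The function names, the confidence threshold, the temperature, and the example probabilities are all hypothetical and chosen for illustration.

```python
from typing import List, Optional


def hard_pseudo_labels(probs: List[List[float]],
                       threshold: float = 0.9) -> List[Optional[int]]:
    """Pick the most probable class per sample.

    Samples whose top probability is below `threshold` (an assumed
    value) get None, meaning they are left unlabeled.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        labels.append(best if p[best] >= threshold else None)
    return labels


def soft_targets(probs: List[List[float]],
                 temperature: float = 0.5) -> List[List[float]]:
    """Sharpen each distribution by raising it to 1/temperature
    and renormalising so it sums to 1 again."""
    sharpened: List[List[float]] = []
    for p in probs:
        powered = [x ** (1.0 / temperature) for x in p]
        total = sum(powered)
        sharpened.append([x / total for x in powered])
    return sharpened


# Hypothetical classifier outputs for two unlabeled samples, three classes.
cnn_outputs = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous prediction
]

hard = hard_pseudo_labels(cnn_outputs)
soft = soft_targets(cnn_outputs)
```

In this sketch the confident sample receives a discrete label while the ambiguous one is skipped by the hard rule and only contributes through its sharpened soft distribution.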
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, X., Zhang, G., Yu, D., Liu, D. (2016). Semi-supervised Learning for Human Pose Recognition with RGB-D Light-Model. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_72
DOI: https://doi.org/10.1007/978-3-319-48896-7_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science (R0)