Human arm pose modeling with learned features using joint convolutional neural network

Li, Chongguo; Yung, Nelson H. C.; Sun, Xing; Lam, Edmund Y.

doi:10.1007/s00138-016-0796-0

Original Paper
Published: 20 July 2016

Volume 28, pages 1–14, (2017)
Machine Vision and Applications

Chongguo Li ORCID: orcid.org/0000-0002-3839-0206¹,
Nelson H. C. Yung¹,
Xing Sun¹ &
Edmund Y. Lam¹

Abstract

This paper proposes a new approach to modeling human arm pose configuration from still images, based on learned features and arm part structure constraints. No assumptions are made about the subjects' clothing style, action category, or background, so the model has to accommodate these uncertainties. The proposed approach uses an energy model that incorporates the dependence relationships among arm joints and arm parts, where the potentials represent their occurrence probabilities. Positive and negative instances are computed from the input image, using multi-scale image patches to capture the details of arm joints and arm parts. A joint convolutional neural network is then developed for feature extraction. The local rigidity of arm parts is used to constrain the occurrence of arm joints and arm parts, and these constraints can be efficiently incorporated into dynamic programming for human arm pose inference. Experimental results on various still images show better performance than alternative approaches that use hand-crafted features.
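
The abstract does not give the form of the energy model, so the following is only a minimal sketch of how dynamic programming can pick one candidate per joint along a shoulder, elbow, wrist chain. Everything in it is an assumption made for illustration: the names `infer_chain` and `rigidity_cost`, the candidate coordinates, the unary energies (standing in for negative log occurrence probabilities that a trained network would supply), and the quadratic segment-length penalty used as a stand-in for the local rigidity constraint. It is not the authors' implementation.

```python
import math


def infer_chain(candidates, unary, pair_cost):
    """Min-sum dynamic programming over a chain of joints.

    candidates[j]      : list of (x, y) candidate locations for joint j
    unary[j][k]        : energy of placing joint j at candidates[j][k]
    pair_cost(j, p, q) : energy of linking joint j at p to joint j + 1 at q
    Returns (total_energy, chosen candidate index for each joint).
    """
    cost = [list(unary[0])]  # best energy of a partial chain ending at each candidate
    back = []                # back[j - 1][k]: best predecessor of candidate k of joint j
    for j in range(1, len(candidates)):
        row, ptr = [], []
        for k, q in enumerate(candidates[j]):
            # Energy of reaching q from every candidate of the previous joint.
            options = [cost[j - 1][i] + pair_cost(j - 1, p, q)
                       for i, p in enumerate(candidates[j - 1])]
            best = min(range(len(options)), key=options.__getitem__)
            row.append(options[best] + unary[j][k])
            ptr.append(best)
        cost.append(row)
        back.append(ptr)

    # Pick the cheapest end point, then follow the back-pointers to the first joint.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total, path = cost[-1][k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


# Hypothetical rest lengths (in pixels) for the upper arm and the forearm.
REST_LENGTH = [10.0, 10.0]
WEIGHT = 0.1  # hypothetical strength of the rigidity penalty


def rigidity_cost(j, p, q):
    """Penalize arm segments whose length deviates from the assumed rest length."""
    length = math.hypot(q[0] - p[0], q[1] - p[1])
    return WEIGHT * (length - REST_LENGTH[j]) ** 2


# Made-up candidates and unary energies for shoulder, elbow, and wrist.
candidates = [[(0, 0)], [(10, 0), (3, 0)], [(20, 0), (10, 10)]]
unary = [[0.0], [0.5, 0.1], [0.2, 0.3]]

total, path = infer_chain(candidates, unary, rigidity_cost)
print(path, round(total, 3))
```

In this toy input the elbow candidate at (3, 0) has the lower unary energy, but the rigidity penalty makes it expensive, so the selected path uses the elbow at (10, 0). The cost of the search grows linearly with the number of joints and quadratically with the number of candidates per joint, which is why chain-structured constraints of this kind are cheap to evaluate.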

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40c GPU used for this research.

Author information

Authors and Affiliations

The University of Hong Kong, Hong Kong, China
Chongguo Li, Nelson H. C. Yung, Xing Sun & Edmund Y. Lam

Corresponding author

Correspondence to Chongguo Li.

About this article

Cite this article

Li, C., Yung, N.H.C., Sun, X. et al. Human arm pose modeling with learned features using joint convolutional neural network. Machine Vision and Applications 28, 1–14 (2017). https://doi.org/10.1007/s00138-016-0796-0

Received: 10 July 2015
Revised: 06 June 2016
Accepted: 10 July 2016
Published: 20 July 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00138-016-0796-0
