Hand Segmentation with Structured Convolutional Learning

Neverova, Natalia; Wolf, Christian; Taylor, Graham W.; Nebout, Florian

doi:10.1007/978-3-319-16811-1_45

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9005)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

The availability of cheap and effective depth sensors has driven recent advances in human pose estimation and tracking. Detailed estimation of hand pose, however, remains a challenge, since fingers are often occluded and may occupy only a few pixels. Moreover, labelled data is difficult to obtain. We propose a deep learning-based approach for hand pose estimation, targeting gesture recognition, that requires very little labelled data. It leverages both unlabelled data and synthetic data from renderings. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabelled real-world samples significantly improves results compared to a purely supervised setting.
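The idea of putting structure into the training objective rather than the architecture can be illustrated with a toy loss. The sketch below is not the loss used in the paper, which the abstract does not specify. It assumes a per-pixel classifier that outputs class logits, a supervised cross-entropy term on labelled (for example synthetic) images, and a hypothetical label-free structural term that penalises disagreement between neighbouring pixels on unlabelled images. The function names and the `weight` parameter are illustrative assumptions, and plain Python lists stand in for the arrays a real implementation would use.

```python
import math

def softmax(logits):
    """Turn one pixel's class logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_loss(logit_map, label_map):
    """Mean per-pixel cross-entropy on a labelled image."""
    losses = [
        -math.log(softmax(pixel)[label])
        for row, label_row in zip(logit_map, label_map)
        for pixel, label in zip(row, label_row)
    ]
    return sum(losses) / len(losses)

def structural_loss(logit_map):
    """Hypothetical structural term (an assumption, not the paper's):
    mean squared difference between the class distributions of
    vertically and horizontally adjacent pixels. Needs no labels."""
    probs = [[softmax(pixel) for pixel in row] for row in logit_map]
    h, w = len(probs), len(probs[0])
    diffs = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    diffs.append(sum(
                        (a - b) ** 2
                        for a, b in zip(probs[y][x], probs[ny][nx])
                    ))
    return sum(diffs) / len(diffs) if diffs else 0.0

def combined_loss(labelled_logits, labels, unlabelled_logits, weight=0.5):
    """Supervised term on labelled data plus a weighted structural term
    on unlabelled data. The network itself is unchanged, so inference
    cost does not grow with the extra term."""
    return (supervised_loss(labelled_logits, labels)
            + weight * structural_loss(unlabelled_logits))

# Usage: a 2x2 labelled image and a 2x2 unlabelled image, 3 classes each.
labelled = [[[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
            [[0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]]
labels = [[0, 1], [2, 0]]
unlabelled = [[[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
              [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
print(combined_loss(labelled, labels, unlabelled))
```

Because the structural term only appears in the loss, it affects the weights learned during training and adds nothing to the forward pass at test time, which is the trade-off the abstract describes.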

Acknowledgements

This work was partially funded by French grants Interabot, call Investissements d’Avenir, and SoLStiCe (ANR-13-BS02-0002-01), call ANR blanc.

Author information

Authors and Affiliations

CNRS, Université de Lyon, Lyon, France
Natalia Neverova & Christian Wolf
INSA-Lyon, LIRIS, UMR5205, 69621, Lyon, France
Natalia Neverova & Christian Wolf
University of Guelph, Guelph, Canada
Graham W. Taylor
Awabot, Lyon, France
Florian Nebout

Corresponding author

Correspondence to Natalia Neverova.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Neverova, N., Wolf, C., Taylor, G.W., Nebout, F. (2015). Hand Segmentation with Structured Convolutional Learning. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_45

DOI: https://doi.org/10.1007/978-3-319-16811-1_45
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer Science, Computer Science (R0)