Hand Segmentation with Structured Convolutional Learning

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9005)

Abstract

The availability of cheap and effective depth sensors has led to recent advances in human pose estimation and tracking. Detailed estimation of hand pose, however, remains a challenge, since fingers are often occluded and may occupy only a few pixels. Moreover, labelled data is difficult to obtain. We propose a deep learning-based approach to hand pose estimation, targeting gesture recognition, that requires very little labelled data. It leverages both unlabelled data and synthetic data from renderings. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabelled real-world samples significantly improves results compared to a purely supervised setting.
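
The idea of placing structural information in the training objective rather than in the architecture can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not specify the structural term, so the neighbour-consistency penalty, the function names, and the `weight` parameter below are all hypothetical. The sketch combines a supervised cross-entropy on labelled (for example, synthetic) pixels with a penalty computed on unlabelled predictions only, so the extra term affects training and adds no cost at inference.

```python
import math


def supervised_loss(pred, labels):
    """Mean per-pixel cross-entropy over a labelled map.

    pred   : H x W grid, each cell a list of class probabilities
    labels : H x W grid of integer class indices
    """
    total, count = 0.0, 0
    for row_p, row_l in zip(pred, labels):
        for probs, label in zip(row_p, row_l):
            # Clamp to avoid log(0) for confidently wrong predictions.
            total += -math.log(max(probs[label], 1e-12))
            count += 1
    return total / count


def structural_penalty(pred):
    """Hypothetical structural term (an assumption, not from the paper):
    mean squared difference between the class distributions of
    horizontally and vertically adjacent pixels. Needs no labels."""
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    total += sum(
                        (a - b) ** 2 for a, b in zip(pred[y][x], pred[ny][nx])
                    )
                    count += 1
    return total / count if count else 0.0


def combined_objective(pred_labelled, labels, pred_unlabelled, weight=0.5):
    """Supervised loss plus a weighted structural term on unlabelled data."""
    return supervised_loss(pred_labelled, labels) + weight * structural_penalty(
        pred_unlabelled
    )
```

Because the penalty appears only in `combined_objective`, a network trained this way would be evaluated with a plain forward pass; the structure is absorbed into the learned weights rather than enforced by an extra inference step.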

Acknowledgements

This work was partially funded by French grants Interabot, call Investissements d’Avenir, and SoLStiCe (ANR-13-BS02-0002-01), call ANR blanc.

Author information

Corresponding author

Correspondence to Natalia Neverova.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Neverova, N., Wolf, C., Taylor, G.W., Nebout, F. (2015). Hand Segmentation with Structured Convolutional Learning. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_45

  • DOI: https://doi.org/10.1007/978-3-319-16811-1_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16810-4

  • Online ISBN: 978-3-319-16811-1

  • eBook Packages: Computer Science, Computer Science (R0)
