Neural Networks with Hidden Markov Models in Skeleton-Based Gesture Recognition

  • Conference paper
Knowledge and Systems Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

In Gesture Recognition (GR) tasks, a system built on a traditional use of Hidden Markov Models (HMMs) usually serves as a baseline. The performance of such systems is often modest, so they tend to be overlooked. In recent years, however, advanced methods have been proposed for this type of model, especially in Automatic Speech Recognition (ASR), and have been shown to improve recognition results significantly. Among them, the use of Neural Networks (NNs) instead of Gaussian Mixture Models (GMMs) to estimate the emission probabilities of HMMs is considered one of the biggest advances [1,2,3]. This implies that the performance of HMM-based models on GR needs to be revisited. In this study, we therefore show that carefully tailoring NNs to a traditional HMM-based GR system significantly improves performance, achieving very competitive results on a skeleton-based GR task defined on the Microsoft Research Cambridge 12 (MSRC-12) data [4]. It should be pointed out that our proposed techniques are straightforward to apply to more complicated GR tasks such as Sign Language Recognition [5], where a sequence of sign gestures needs to be transcribed.
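To make the idea of replacing GMM emission models with an NN concrete, the following is a minimal sketch of the standard hybrid NN-HMM recipe (cf. [20]), not the system described in this paper. It assumes the NN outputs per-frame state posteriors p(s | x_t), divides them by state priors p(s) to obtain scaled likelihoods, and decodes with the Viterbi algorithm [16,17]. All function names, the two-state left-to-right topology, and the toy numbers are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def scaled_log_likelihoods(posteriors, priors):
    """Turn NN state posteriors p(s | x_t) into scaled log-likelihoods.

    By Bayes' rule p(x_t | s) is proportional to p(s | x_t) / p(s),
    so log p(s | x_t) - log p(s) can stand in for the log emission score.
    """
    return [[safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_emis, log_trans, log_init):
    """Return the best state path and its log score (log-domain Viterbi)."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emis[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + frame[s])
        score = new_score
        backptrs.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Toy example (made-up numbers): 4 frames, 2 states, left-to-right HMM.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]  # hypothetical NN outputs
priors = [0.5, 0.5]                                            # hypothetical state priors
log_trans = [[safe_log(0.7), safe_log(0.3)],
             [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]

log_emis = scaled_log_likelihoods(posteriors, priors)
path, log_score = viterbi(log_emis, log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

In a GMM-based system the emission scores would come from per-state mixture densities instead; in this sketch only `scaled_log_likelihoods` changes, while the transition model and the decoder stay the same.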

References

  1. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: INTERSPEECH 2011, pp. 437–440 (2011)

  2. Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 14–22 (2012)

  3. Hinton, G.E., Deng, L., Yu, D., Dahl, G.E., Rahman Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

  4. Fothergill, S., Mentis, H.M., Kohli, P., Nowozin, S.: Instructing people for training gestural interactive systems. In: Konstan, C.J.A., Chi, E.H., Höök, K. (eds.) CHI, pp. 1737–1746. ACM (2012)

  5. Forster, J., Koller, O., Oberdörfer, C., Gweth, Y., Ney, H.: Improving continuous sign language recognition: Speech recognition techniques and system design. In: Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, pp. 41–46. Association for Computational Linguistics (2013)

  6. Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1290–1297 (June 2012)

  7. Hussein, M.E., Torki, M., Gowayyed, M.A., El-Saban, M.: Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations. In: Proceedings of the Twenty-Third IJCAI, pp. 2466–2472. AAAI Press (2013)

  8. Malgireddy, M., Corso, J., Setlur, S., Govindaraju, V., Mandalapu, D.: A framework for hand gesture recognition and spotting using sub-gesture modeling. In: 20th International Conference on Pattern Recognition, pp. 3780–3783 (2010)

  9. Yang, H.-D., Park, A.-Y., Lee, S.-W.: Gesture spotting and recognition for human–robot interaction. IEEE Transactions on Robotics 23(2), 256–270 (2007)

  10. Elmezain, M., Al-Hamadi, A., Sadek, S., Michaelis, B.: Robust methods for hand gesture spotting and recognition using hidden markov models and conditional random fields. In: 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 131–136 (December 2010)

  11. Wilson, A., Bobick, A.: Parametric hidden markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9), 884–900 (1999)

  12. Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional models for contextual human motion recognition. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, pp. 1808–1815 (October 2005)

  13. Wang, S.B., Quattoni, A., Morency, L., Demirdjian, D., Darrell, T.: Hidden conditional random fields for gesture recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1521–1527 (2006)

  14. Vinh, L., Lee, S., Le, H., Ngo, H., Kim, H., Han, M., Lee, Y.-K.: Semi-markov conditional random fields for accelerometer-based activity recognition. Applied Intelligence 35(2), 226–241 (2011)

  15. Juang, B.-H., Levinson, S., Sondhi, M.: Maximum likelihood estimation for multivariate mixture observations of markov chains (corresp.). IEEE Transactions on Information Theory 32(2), 307–309 (1986)

  16. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269 (1967)

  17. Forney Jr., G.D.: The viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)

  18. Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains. The Annals of Mathematical Statistics 41(1), 164–171 (1970)

  19. Rabiner, L.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)

  20. Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell (1993)

  21. LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)

  22. Rath, P.S., Povey, D., Veselý, K., Černocký, J.: Improved feature processing for deep neural networks. In: Proceedings of Interspeech 2013. International Speech Communication Association, vol. 8, pp. 109–113 (2013)

  23. Haeb-Umbach, R., Ney, H.: Linear discriminant analysis for improved large vocabulary continuous speech recognition. In: IEEE ICASSP, vol. 1, pp. 13–16 (1992)

  24. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press Professional, Inc., San Diego (1990)

  25. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (December 2011)


Corresponding author

Correspondence to Hai-Son Le.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Le, HS., Pham, NQ., Nguyen, DD. (2015). Neural Networks with Hidden Markov Models in Skeleton-Based Gesture Recognition. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_24

  • DOI: https://doi.org/10.1007/978-3-319-11680-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11679-2

  • Online ISBN: 978-3-319-11680-8

  • eBook Packages: Engineering, Engineering (R0)
