Neural Networks with Hidden Markov Models in Skeleton-Based Gesture Recognition

Le, Hai-Son; Pham, Ngoc-Quan; Nguyen, Duc-Dung

doi:10.1007/978-3-319-11680-8_24

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

In Gesture Recognition (GR) tasks, a system based on a traditional use of Hidden Markov Models (HMMs) usually serves as a baseline. Such systems often perform relatively poorly and have therefore been somewhat overlooked. In recent years, however, especially in Automatic Speech Recognition (ASR), advanced methods have been proposed for this type of model and have been shown to improve recognition results significantly. Among them, using Neural Networks (NNs) instead of Gaussian Mixture Models (GMMs) to estimate the emission probabilities of HMMs has been considered one of the biggest advances [1,2,3]. This suggests that the performance of HMM-based models on GR should be revisited. In this study, we therefore show that by carefully tailoring NNs to a traditional HMM-based GR system, we can significantly improve its performance, achieving very competitive results on a skeleton-based GR task defined on the Microsoft Research Cambridge 12 (MSRC-12) data [4]. Our proposed techniques can also be applied straightforwardly to more complicated GR tasks such as Sign Language Recognition [5], where a sequence of sign gestures must be transcribed.
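Using an NN rather than a GMM to supply HMM emission scores is commonly done in the hybrid style of Bourlard and Morgan: the network outputs state posteriors P(s | x_t), these are divided by state priors P(s) to give likelihoods p(x_t | s) scaled by the state-independent factor p(x_t), and the result feeds standard Viterbi decoding. The Python/NumPy sketch below illustrates only that generic recipe under those assumptions; it is not the authors' system, and the function names, the 3-state left-to-right topology, and all posterior, prior, and transition values are hypothetical toy choices.

```python
import numpy as np


def scaled_log_likelihoods(posteriors, priors, eps=1e-10):
    """Turn NN state posteriors P(s|x_t) into scaled log-likelihoods.

    By Bayes' rule p(x_t|s) = P(s|x_t) p(x_t) / P(s). The factor p(x_t) is the
    same for every state at time t, so dropping it leaves the best path unchanged.
    """
    return np.log(posteriors + eps) - np.log(priors + eps)


def viterbi(log_emis, log_trans, log_init):
    """Most likely HMM state sequence given per-frame log emission scores.

    log_emis: (T, S); log_trans[i, j] = log P(next=j | prev=i); log_init: (S,).
    Returns (best_path, best_log_score).
    """
    T, S = log_emis.shape
    delta = log_init + log_emis[0]            # best log score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # rows: previous state, cols: next state
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_emis[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):             # follow back-pointers to frame 0
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(delta))


# Hypothetical toy example: 5 frames, 3-state left-to-right gesture HMM.
posteriors = np.array([[0.8, 0.1, 0.1],
                       [0.7, 0.2, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.1, 0.1, 0.8]])      # stand-in for NN softmax outputs
priors = np.array([0.4, 0.3, 0.3])            # e.g. state frequencies in training alignments
trans = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.6, 0.4],
                  [0.0, 0.0, 1.0]])
with np.errstate(divide="ignore"):            # log(0) -> -inf marks forbidden moves
    log_trans = np.log(trans)
    log_init = np.log(np.array([1.0, 0.0, 0.0]))

path, score = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # [0, 0, 1, 2, 2]
```

For isolated gestures, one would typically build one such HMM per gesture class and choose the class with the highest Viterbi score; that decision rule is an assumption of this sketch rather than a detail stated in the abstract.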

References

[1] Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: INTERSPEECH 2011, pp. 437–440 (2011)
[2] Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 14–22 (2012)
[3] Hinton, G.E., Deng, L., Yu, D., Dahl, G.E., Rahman Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
[4] Fothergill, S., Mentis, H.M., Kohli, P., Nowozin, S.: Instructing people for training gestural interactive systems. In: Konstan, C.J.A., Chi, E.H., Höök, K. (eds.) CHI, pp. 1737–1746. ACM (2012)
[5] Forster, J., Koller, O., Oberdörfer, C., Gweth, Y., Ney, H.: Improving continuous sign language recognition: Speech recognition techniques and system design. In: Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, pp. 41–46. Association for Computational Linguistics (2013)
[6] Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1290–1297 (June 2012)
[7] Hussein, M.E., Torki, M., Gowayyed, M.A., El-Saban, M.: Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: Proceedings of the Twenty-Third IJCAI, pp. 2466–2472. AAAI Press (2013)
[8] Malgireddy, M., Corso, J., Setlur, S., Govindaraju, V., Mandalapu, D.: A framework for hand gesture recognition and spotting using sub-gesture modeling. In: 20th International Conference on Pattern Recognition, pp. 3780–3783 (2010)
[9] Yang, H.-D., Park, A.-Y., Lee, S.-W.: Gesture spotting and recognition for human–robot interaction. IEEE Transactions on Robotics 23(2), 256–270 (2007)
[10] Elmezain, M., Al-Hamadi, A., Sadek, S., Michaelis, B.: Robust methods for hand gesture spotting and recognition using hidden Markov models and conditional random fields. In: 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 131–136 (December 2010)
[11] Wilson, A., Bobick, A.: Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9), 884–900 (1999)
[12] Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional models for contextual human motion recognition. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, pp. 1808–1815 (October 2005)
[13] Wang, S.B., Quattoni, A., Morency, L., Demirdjian, D., Darrell, T.: Hidden conditional random fields for gesture recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1521–1527 (2006)
[14] Vinh, L., Lee, S., Le, H., Ngo, H., Kim, H., Han, M., Lee, Y.-K.: Semi-Markov conditional random fields for accelerometer-based activity recognition. Applied Intelligence 35(2), 226–241 (2011)
[15] Juang, B.-H., Levinson, S., Sondhi, M.: Maximum likelihood estimation for multivariate mixture observations of Markov chains (corresp.). IEEE Transactions on Information Theory 32(2), 307–309 (1986)
[16] Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269 (1967)
[17] Forney Jr., G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)
[18] Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41(1), 164–171 (1970)
[19] Rabiner, L.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
[20] Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell (1993)
[21] LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)
[22] Rath, P.S., Povey, D., Veselý, K., Černocký, J.: Improved feature processing for deep neural networks. In: Proceedings of Interspeech 2013. International Speech Communication Association, vol. 8, pp. 109–113 (2013)
[23] Haeb-Umbach, R., Ney, H.: Linear discriminant analysis for improved large vocabulary continuous speech recognition. In: IEEE ICASSP, vol. 1, pp. 13–16 (1992)
[24] Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press Professional, Inc., San Diego (1990)
[25] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (December 2011)

Author information

Authors and Affiliations

Institute of Information Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
Hai-Son Le, Ngoc-Quan Pham & Duc-Dung Nguyen

Corresponding author

Correspondence to Hai-Son Le.

Editor information

Editors and Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Viet-Ha Nguyen
Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Anh-Cuong Le
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Le, HS., Pham, NQ., Nguyen, DD. (2015). Neural Networks with Hidden Markov Models in Skeleton-Based Gesture Recognition. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_24

DOI: https://doi.org/10.1007/978-3-319-11680-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)