Abstract
In Gesture Recognition (GR) tasks, a system built on a traditional use of Hidden Markov Models (HMMs) usually serves as a baseline. Because such baselines often perform poorly, HMMs tend to be overlooked. In recent years, however, advanced methods have been proposed for this type of model, especially in Automatic Speech Recognition (ASR), and have been shown to improve recognition results significantly. Among them, the use of Neural Networks (NNs) instead of Gaussian Mixture Models (GMMs) to estimate the emission probabilities of HMMs is considered one of the biggest advances [1,2,3]. This implies that the performance of HMM-based models on GR needs to be re-examined. In this study, we show that carefully tailoring NNs to a traditional HMM-based GR system significantly improves its performance, achieving very competitive results on a skeleton-based GR task defined on the Microsoft Research Cambridge 12 (MSRC-12) data [4]. The proposed techniques can also be applied straightforwardly to more complicated GR tasks such as Sign Language Recognition [5], where a sequence of sign gestures needs to be transcribed.
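To illustrate the hybrid idea in general terms, the following minimal Python sketch shows how NN state posteriors can stand in for GMM emission scores. It is not the system evaluated in this chapter. Posteriors are divided by state priors to give scaled likelihoods, as in the hybrid approach of Bourlard and Morgan, and the result is decoded with the Viterbi algorithm. The function names, the two-state left-to-right topology, and all numbers are hypothetical and chosen only for the example; in practice the posteriors would come from a trained network.

```python
import math

def safe_log(x):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")

def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame posteriors p(s | x_t) into scaled log-likelihoods
    log p(s | x_t) - log p(s), which equal log p(x_t | s) up to a
    per-frame constant. Priors are assumed to be strictly positive."""
    return [[safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_emis, log_init, log_trans):
    """Return the best state path and its log score for one sequence."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emis[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this frame.
            best = max(range(n_states),
                       key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best] + log_trans[best][s] + frame[s])
            ptr.append(best)
        score = new_score
        backptrs.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical posteriors for 4 frames over 2 HMM states (stand-in for NN output).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]                    # assumed state priors
init = [1.0, 0.0]                      # always start in state 0
trans = [[0.6, 0.4], [0.0, 1.0]]       # left-to-right, no backward jumps

log_emis = scaled_log_likelihoods(posteriors, priors)
log_init = [safe_log(p) for p in init]
log_trans = [[safe_log(p) for p in row] for row in trans]

path, score = viterbi(log_emis, log_init, log_trans)
print(path)  # [0, 0, 1, 1]
```

Dividing by the priors matters when states are unevenly represented in the training data; with the uniform priors assumed here it only shifts every frame's scores by a constant and leaves the decoded path unchanged.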
References
Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: INTERSPEECH 2011, pp. 437–440 (2011)
Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 14–22 (2012)
Hinton, G.E., Deng, L., Yu, D., Dahl, G.E., Rahman Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Fothergill, S., Mentis, H.M., Kohli, P., Nowozin, S.: Instructing people for training gestural interactive systems. In: Konstan, C.J.A., Chi, E.H., Höök, K. (eds.) CHI, pp. 1737–1746. ACM (2012)
Forster, J., Koller, O., Oberdörfer, C., Gweth, Y., Ney, H.: Improving continuous sign language recognition: Speech recognition techniques and system design. In: Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, pp. 41–46. Association for Computational Linguistics (2013)
Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1290–1297 (June 2012)
Hussein, M.E., Torki, M., Gowayyed, M.A., El-Saban, M.: Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations. In: Proceedings of the Twenty-Third IJCAI, pp. 2466–2472. AAAI Press (2013)
Malgireddy, M., Corso, J., Setlur, S., Govindaraju, V., Mandalapu, D.: A framework for hand gesture recognition and spotting using sub-gesture modeling. In: 20th International Conference on Pattern Recognition, pp. 3780–3783 (2010)
Yang, H.-D., Park, A.-Y., Lee, S.-W.: Gesture spotting and recognition for human–robot interaction. IEEE Transactions on Robotics 23(2), 256–270 (2007)
Elmezain, M., Al-Hamadi, A., Sadek, S., Michaelis, B.: Robust methods for hand gesture spotting and recognition using hidden markov models and conditional random fields. In: 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 131–136 (December 2010)
Wilson, A., Bobick, A.: Parametric hidden markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9), 884–900 (1999)
Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional models for contextual human motion recognition. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, pp. 1808–1815 (October 2005)
Wang, S.B., Quattoni, A., Morency, L., Demirdjian, D., Darrell, T.: Hidden conditional random fields for gesture recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1521–1527 (2006)
Vinh, L., Lee, S., Le, H., Ngo, H., Kim, H., Han, M., Lee, Y.-K.: Semi-markov conditional random fields for accelerometer-based activity recognition. Applied Intelligence 35(2), 226–241 (2011)
Juang, B.-H., Levinson, S., Sondhi, M.: Maximum likelihood estimation for multivariate mixture observations of markov chains (corresp.). IEEE Transactions on Information Theory 32(2), 307–309 (1986)
Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269 (1967)
Forney Jr., G.D.: The viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains. The Annals of Mathematical Statistics 41(1), 164–171 (1970)
Rabiner, L.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell (1993)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)
Rath, P.S., Povey, D., Veselý, K., Černocký, J.: Improved feature processing for deep neural networks. In: Proceedings of Interspeech 2013. International Speech Communication Association, vol. 8, pp. 109–113 (2013)
Haeb-Umbach, R., Ney, H.: Linear discriminant analysis for improved large vocabulary continuous speech recognition. In: IEEE ICASSP, vol. 1, pp. 13–16 (1992)
Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press Professional, Inc., San Diego (1990)
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (December 2011)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Le, HS., Pham, NQ., Nguyen, DD. (2015). Neural Networks with Hidden Markov Models in Skeleton-Based Gesture Recognition. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_24
DOI: https://doi.org/10.1007/978-3-319-11680-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)