Abstract
In video-based human gesture recognition, it is important to combine useful features and to analyze their dynamic structure as efficiently as possible. In this paper, we propose a dynamic Bayesian network (DBN) model that simplifies the dynamics at the level of hidden variables and employs observation windows over observation time slices for robust modeling and for handling noise and other variabilities. The proposed simplified DBN was tested on a gesture database and an American Sign Language (ASL) database. In the experiments, the proposed DBN outperformed the other methods compared: Conditional Random Fields (CRFs), conventional Bayesian Networks (BNs), DBNs, and Hidden Markov Models (HMMs). The proposed DBN achieved 98 % recognition accuracy in gesture recognition and 94.6 % in ASL recognition, whereas the HMM and the CRF achieved 80 % and 86 % in gesture recognition and 75.4 % and 85.4 % in ASL recognition, respectively.
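As a rough illustration of the observation-window idea only, the sketch below groups each observation time slice with its neighbouring slices into a single window vector. It is not the authors' implementation: the function name `observation_windows`, the symmetric window, the `half_width` parameter, the edge padding, and the concatenation of slices are all assumptions made for this example.

```python
import numpy as np


def observation_windows(features, half_width=1):
    """Stack each time slice with its neighbours into one window vector.

    features   : array of shape (T, D), one D-dimensional feature vector per frame.
    half_width : hypothetical parameter; each window covers 2 * half_width + 1 slices.

    Assumption: the first and last slices are repeated at the sequence edges
    so that every frame gets a full-width window.
    """
    features = np.asarray(features, dtype=float)
    # Pad along the time axis only, repeating the edge slices.
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    width = 2 * half_width + 1
    # Flatten the slices of each window into a single observation vector.
    return np.stack([padded[t:t + width].ravel() for t in range(num_frames)])


# Usage: 4 frames of 2-dimensional features, windows of 3 slices each.
windows = observation_windows(np.arange(8).reshape(4, 2), half_width=1)
print(windows.shape)  # (4, 6)
```

Each row of `windows` could then serve as the observation attached to one hidden-state time slice. How the proposed model actually weights or combines the slices inside a window is not stated in the abstract.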








Notes
Korea University Gesture Database, http://gesturedb.korea.ac.kr.
American Sign Language Database, http://www.bu.edu/asllrp/ncslgr.html.
We would like to thank H.-D. Yang, the first author of [22], for providing the feature data and the results.
References
1. Mitra, S., Acharya, T.: Gesture recognition: A survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 37(3), 311–324 (2007). doi:10.1109/TSMCC.2007.893280
2. Bian, W., Tao, D., Rui, Y.: Cross-domain human action recognition. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(2), 298–307 (2012). doi:10.1109/TSMCB.2011.2166761
3. Dielmann, A., Renals, S.: Automatic meeting segmentation using dynamic Bayesian networks. IEEE Trans. Multimed. 9(1), 25–36 (2007)
4. Du, Y., Chen, F., Xu, W., Li, Y.: Recognizing interaction activities using dynamic Bayesian network. In: Proceedings of the 18th International Conference on Pattern Recognition, vol. 1, pp. 618–621 (2006)
5. Robertson, N., Reid, I.: Behaviour understanding in video: a combined method. In: Proceedings of the Tenth IEEE International Conference on Computer Vision, vol. 1, pp. 808–815 (2005)
6. Suk, H.I., Shin, B.K., Lee, S.W.: Hand gesture recognition based on dynamic Bayesian network framework. Pattern Recognit. 43(9), 3059–3072 (2010)
7. Wang, T., Diao, Q., Zhang, Y., Song, G., Lai, C., Bradski, G.: A dynamic Bayesian network approach to multi-cue based visual tracking. In: Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, pp. 167–170 (2004)
8. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
9. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the International Conference on Machine Learning, pp. 282–289, USA (2001)
10. Fenton, N., Neil, M.: Making decisions: using Bayesian nets and MCDA. Knowl. Based Syst. 14, 307–325 (2001)
11. Heckerman, D.: A tutorial on learning with Bayesian networks. Technical report MSR-TR-95-06, Microsoft Research (1995)
12. Murphy, K.: Dynamic Bayesian networks: Representation, inference and learning. Ph.D. thesis, University of California, Berkeley (2002)
13. Bilmes, J., Bartels, C.: Graphical model architectures for speech recognition. IEEE Signal Process. Mag. 22(5), 89–100 (2005)
14. Ji, Q., Lan, P., Looney, C.: A probabilistic framework for modeling and real-time monitoring human fatigue. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 36(5), 862–875 (2006)
15. Nikolopoulos, S., Papadopoulos, G., Kompatsiaris, I., Patras, I.: Evidence-driven image interpretation by combining implicit and explicit knowledge in a Bayesian network. IEEE Trans. Syst. Man Cybern. Part B Cybern. 41(5), 1366–1381 (2011). doi:10.1109/TSMCB.2011.2147781
16. Park, S., Aggarwal, J.: A hierarchical Bayesian network for event recognition of human actions and interactions. Multimed. Syst. 10(2), 164–179 (2004)
17. Darrell, T., Pentland, A.: Space-time gestures. In: Proceedings of the 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’93) (1993)
18. Li, H., Greenspan, M.: Multi-scale gesture recognition from time-varying contours. In: Proceedings of the International Conference on Computer Vision, vol. 1, pp. 226–234 (2005)
19. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978)
20. Ahmad, M., Lee, S.W.: Human action recognition using shape and CLG-motion flow from multi-view image sequences. Pattern Recognit. 41(7), 2237–2252 (2008)
21. Starner, T., Weaver, J., Pentland, A.: Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20(12), 1371–1375 (1998)
22. Yang, H.D., Sclaroff, S., Lee, S.W.: Sign language spotting with a threshold model based on conditional random fields. IEEE Trans. Pattern Anal. Mach. Intell. 31(7), 1264–1277 (2009)
23. Moenne-Loccoz, N., Bremond, F., Thonnat, M.: Recurrent Bayesian network for the recognition of human behaviors from video. In: Proceedings of the 3rd International Conference on Computer Vision Systems, pp. 68–77 (2003)
24. Wang, S., Quattoni, A., Morency, L.P., Demirdjian, D., Darrell, T.: Hidden conditional random fields for gesture recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1521–1527 (2006)
25. Murphy, K.: Bayes net toolbox for Matlab (2014). http://code.google.com/p/bnt/, Sept. (2014)
26. Kudo, T.: CRF++: Yet another CRF toolkit (2005). http://code.google.com/p/crfpp/, Sept. (2014)
27. Lee, H.K., Kim, J.H.: An HMM-based threshold model approach for gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 21(10), 961–973 (1999)
28. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 511–519 (2001)
29. Yang, H.D., Lee, S.W., Lee, S.W.: Multiple human detection and tracking based on weighted temporal texture features. Int. J. Pattern Recognit. Artif. Intell. 20(3), 377–391 (2006)
Acknowledgments
This research was supported by the Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion Program through the Ministry of Trade, Industry and Energy (Grant No. 10041629) and the 2014 R&D Program for S/W Computing Industrial Core Technology through the Ministry of Science, ICT and Future Planning/Korea Evaluation Institute of Industrial Technology (Project No. 2014-044-023-001), Korea.
Additional information
Communicated by Q. Tian.
Cite this article
Roh, MC., Lee, SW. Human gesture recognition using a simplified dynamic Bayesian network. Multimedia Systems 21, 557–568 (2015). https://doi.org/10.1007/s00530-014-0414-9
DOI: https://doi.org/10.1007/s00530-014-0414-9