Abstract
A new multi-sensor fusion framework for Sign Language Recognition (SLR) is proposed, based on a Convolutional Neural Network (CNN) and a Dynamic Bayesian Network (DBN). In this framework, a Microsoft Kinect, a low-cost RGB-D sensor, serves as the Human-Computer Interaction (HCI) device. First, color and depth videos are collected with the Kinect; next, features are extracted from all image sequences using the CNN. The color and depth feature sequences are then input into the DBN as observation data. Based on graph model fusion, the maximum recognition rate for dynamic isolated sign language is calculated. The proposed DBN + CNN SLR framework is tested on our dataset, where the highest recognition rate reaches 99.40%. The test results show that our approach is effective.
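The fusion step can be illustrated with a minimal sketch. The abstract does not specify the DBN structure, so the code below assumes the simplest two-stream case: one hidden state chain per sign, with color and depth observations conditionally independent given the state, and diagonal Gaussian emissions. The CNN is not modeled; the feature sequences are taken as already extracted. All names (`make_model`, `fused_log_likelihood`, `classify`, `sign_A`, `sign_B`) and all parameter values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))

def make_model(color_means, depth_means, var=0.25):
    """Toy 2-state model (assumed parameters, not taken from the paper)."""
    return {
        "log_prior": [math.log(0.9), math.log(0.1)],
        "log_trans": [[math.log(0.7), math.log(0.3)],
                      [math.log(0.1), math.log(0.9)]],
        # Per state: (mean vector, variance vector) for each stream.
        "color": [(m, [var] * len(m)) for m in color_means],
        "depth": [(m, [var] * len(m)) for m in depth_means],
    }

def fused_log_likelihood(color_seq, depth_seq, model):
    """Forward algorithm over one hidden chain with two observation streams.

    Fusion happens in the emission term: the color and depth
    log-likelihoods are added because the streams are assumed
    conditionally independent given the hidden state.
    """
    n = len(model["log_prior"])

    def emit(t, s):
        return (log_gauss(color_seq[t], *model["color"][s]) +
                log_gauss(depth_seq[t], *model["depth"][s]))

    alpha = [model["log_prior"][s] + emit(0, s) for s in range(n)]
    for t in range(1, len(color_seq)):
        alpha = [logsumexp([alpha[r] + model["log_trans"][r][s]
                            for r in range(n)]) + emit(t, s)
                 for s in range(n)]
    return logsumexp(alpha)

def classify(color_seq, depth_seq, models):
    """Return the sign whose model gives the highest fused log-likelihood."""
    scores = {label: fused_log_likelihood(color_seq, depth_seq, m)
              for label, m in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical signs with 1-D features per frame and per stream.
models = {
    "sign_A": make_model([[0.0], [1.0]], [[0.0], [1.0]]),
    "sign_B": make_model([[1.0], [0.0]], [[1.0], [0.0]]),
}
color = [[0.1], [0.0], [0.9], [1.1]]   # stand-in for CNN color features
depth = [[-0.1], [0.2], [1.0], [0.8]]  # stand-in for CNN depth features
label, scores = classify(color, depth, models)
print(label, scores)
```

In this sketch each sign has its own model, and an isolated sign is recognized by scoring the observed sequence pair under every model and taking the maximum. A richer DBN, for example one with separate hidden chains per sensor, would change only `fused_log_likelihood`.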
Acknowledgements
This work is supported by the Natural Science Foundation of China (Nos. 60972095, 61271362, 61671362) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2017JM6041).
Ethics declarations
Conflicts of interest
Qinkun Xiao declares that he has no conflicts of interest.
Zhao Yidan declares that she has no conflicts of interest.
Wang Huan declares that she has no conflicts of interest.
About this article
Cite this article
Xiao, Q., Zhao, Y. & Huan, W. Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network. Multimed Tools Appl 78, 15335–15352 (2019). https://doi.org/10.1007/s11042-018-6939-8
DOI: https://doi.org/10.1007/s11042-018-6939-8