Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network

Multimedia Tools and Applications

Abstract

A new multi-sensor fusion framework for Sign Language Recognition (SLR) is proposed, based on a Convolutional Neural Network (CNN) and a Dynamic Bayesian Network (DBN). In this framework, a Microsoft Kinect, a low-cost RGB-D sensor, serves as the Human-Computer Interaction (HCI) device. First, color and depth videos are collected with the Kinect; next, features are extracted from all image sequences using the CNN. The color and depth feature sequences are then input to the DBN as observation data. Based on graph-model fusion, the maximum recognition rate for dynamic isolated sign language is calculated. The proposed DBN + CNN SLR framework is tested on our dataset, where the highest recognition rate reaches 99.40%. The test results show that our approach is effective.
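
The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the DBN structure or the CNN, so everything here is an assumption made for illustration: each sign class is modeled by a two-stream hidden Markov model (one of the simplest DBNs), the color and depth CNN features are assumed to have been quantized into discrete codebook symbols, and the two streams are assumed conditionally independent given the hidden state. The names `fused_log_likelihood`, `classify`, and `MODELS`, and all probabilities, are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def fused_log_likelihood(model, color_obs, depth_obs):
    """Forward algorithm for an HMM with two observation streams per frame.

    Assumption: color and depth symbols are conditionally independent given
    the hidden state, so their emission probabilities multiply.
    """
    pi, A = model["pi"], model["A"]
    b_color, b_depth = model["B_color"], model["B_depth"]
    n = len(pi)

    def emit(state, t):
        # Joint log-emission of the color and depth symbols at frame t.
        return (math.log(b_color[state][color_obs[t]])
                + math.log(b_depth[state][depth_obs[t]]))

    alpha = [math.log(pi[s]) + emit(s, 0) for s in range(n)]
    for t in range(1, len(color_obs)):
        alpha = [
            logsumexp([alpha[r] + math.log(A[r][s]) for r in range(n)]) + emit(s, t)
            for s in range(n)
        ]
    return logsumexp(alpha)

def classify(models, color_obs, depth_obs):
    """Return the label whose model gives the highest fused log-likelihood."""
    scores = {label: fused_log_likelihood(m, color_obs, depth_obs)
              for label, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy two-state models with made-up (hypothetical) parameters.
_SHARED = {"pi": [0.9, 0.1], "A": [[0.7, 0.3], [0.1, 0.9]]}
MODELS = {
    "sign_A": dict(_SHARED, B_color=[[0.8, 0.2], [0.2, 0.8]],
                   B_depth=[[0.8, 0.2], [0.2, 0.8]]),
    "sign_B": dict(_SHARED, B_color=[[0.2, 0.8], [0.8, 0.2]],
                   B_depth=[[0.2, 0.8], [0.8, 0.2]]),
}

if __name__ == "__main__":
    label, scores = classify(MODELS, [0, 0, 1, 1], [0, 0, 1, 1])
    print(label, scores)
```

In this sketch, recognition reduces to scoring the paired color and depth sequences under each class model and picking the maximum; the paper's own graph-model fusion may couple the streams differently.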

Acknowledgements

This work was supported by the Natural Science Foundation of China (Nos. 60972095, 61271362, 61671362) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2017JM6041).

Author information

Corresponding author

Correspondence to Qinkun Xiao.

Ethics declarations

Conflicts of interest

Qinkun Xiao declares that he has no conflicts of interest.

Zhao Yidan declares that she has no conflicts of interest.

Wang Huan declares that she has no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xiao, Q., Zhao, Y. & Huan, W. Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network. Multimed Tools Appl 78, 15335–15352 (2019). https://doi.org/10.1007/s11042-018-6939-8

  • DOI: https://doi.org/10.1007/s11042-018-6939-8
