Audio-visual speech recognition techniques in augmented reality environments

  • Original Article
  • The Visual Computer

Abstract

Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can help people with disabilities, but many of these studies have been carried out only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is an advance in ASR technology that combines audio, video, and facial expressions to capture a narrator’s voice. In this paper, we combine AR and AVSR technologies into a new system that helps deaf and hard-of-hearing people. The proposed system takes a narrator’s speech instantly, converts it into readable text, and shows the text directly on an AR display. Deaf people can therefore read the narrator’s speech easily, and other people do not need to learn sign language to communicate with them. The evaluation results show that the system has a lower word error rate than ASR and visual-only speech recognition (VSR) under different noisy conditions. Furthermore, the results show that using AVSR techniques improves the recognition accuracy of the system in noisy places. Finally, a survey of 100 deaf people shows that more than 80% of them are very interested in using our system as an assistant on portable devices to communicate with people.
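The abstract compares systems by word error rate. This page does not show how the authors computed it, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and exact string matching; a real
    evaluation would usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts, for illustration only.
    clean = word_error_rate("please open the door", "please open the door")
    noisy = word_error_rate("please open the door", "please open door")
    print(f"clean: {clean:.2f}, noisy: {noisy:.2f}")
```

Under this definition a lower value is better, and the value can exceed 1 when the hypothesis contains many inserted words.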

Author information

Corresponding author

Correspondence to Mohammad Reza Mirzaei.

About this article

Cite this article

Mirzaei, M.R., Ghorshi, S. & Mortazavi, M. Audio-visual speech recognition techniques in augmented reality environments. Vis Comput 30, 245–257 (2014). https://doi.org/10.1007/s00371-013-0841-1

  • DOI: https://doi.org/10.1007/s00371-013-0841-1
