Audio-visual speech recognition techniques in augmented reality environments

Mirzaei, Mohammad Reza; Ghorshi, Seyed; Mortazavi, Mohammad

doi:10.1007/s00371-013-0841-1

Original Article
Published: 18 May 2013

Volume 30, pages 245–257 (2014)

The Visual Computer

Abstract

Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can help people with disabilities. However, most of these studies have been carried out only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is an advance in ASR technology that combines audio, video, and facial expressions to recognize a narrator's speech. In this paper, we combine AR and AVSR technologies in a new system to help deaf and hard-of-hearing people. The proposed system captures a narrator's speech in real time, converts it into readable text, and shows the text directly on an AR display, so that deaf people can easily read what the narrator says. In addition, hearing people do not need to learn sign language to communicate with deaf people. The evaluation results show that the system achieves a lower word error rate (WER) than audio-only ASR and visual-only speech recognition (VSR) under different noise conditions. The results also show that using AVSR techniques improves the system's recognition accuracy in noisy places. Finally, a survey of 100 deaf people shows that more than 80 % of them are very interested in using the system as an assistant on portable devices to communicate with other people.
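The abstract reports its results in terms of word error rate. As a reference point only (this is not the authors' evaluation code, which the abstract does not describe), the following Python sketch computes WER in the standard way: the word-level Levenshtein distance between a reference transcript and a recognizer's hypothesis, that is, (substitutions + deletions + insertions) divided by the number of reference words. The function name and the example transcripts are hypothetical and are not taken from the paper's experiments.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "please turn on the light"  # hypothetical reference transcript
    print(word_error_rate(ref, "please turn the night"))     # 0.4 (1 deletion + 1 substitution)
    print(word_error_rate(ref, "please turn on the light"))  # 0.0
```

Because insertions are counted, WER can exceed 1 when a recognizer outputs many spurious words; a lower value means fewer recognition errors, which is the sense in which the abstract compares AVSR with ASR and VSR.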

Author information

Authors and Affiliations

School of Science and Engineering, Sharif University of Technology, International Campus, Kish Island, Iran
Mohammad Reza Mirzaei, Seyed Ghorshi & Mohammad Mortazavi

Corresponding author

Correspondence to Mohammad Reza Mirzaei.

About this article

Cite this article

Mirzaei, M.R., Ghorshi, S. & Mortazavi, M. Audio-visual speech recognition techniques in augmented reality environments. Vis Comput 30, 245–257 (2014). https://doi.org/10.1007/s00371-013-0841-1

Published: 18 May 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s00371-013-0841-1

Similar content being viewed by others

Augmented Reality: A Comprehensive Review

Assessing Facial Symmetry and Attractiveness using Augmented Reality

Affective auditory stimulus database: An expanded version of the International Affective Digitized Sounds (IADS-E)
