Abstract
Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can help people with disabilities, but most of these studies have been confined to their own specialized field. Audio-Visual Speech Recognition (AVSR) is an advance in ASR technology that combines audio, video, and facial expressions to capture a narrator’s voice. In this paper, we combine AR and AVSR technologies to build a new system that helps deaf and hard-of-hearing people. The proposed system captures a narrator’s speech instantly, converts it into readable text, and shows the text directly on an AR display. Deaf people can therefore read the narrator’s speech easily, and other people do not need to learn sign language to communicate with them. The evaluation results show that the system has a lower word error rate than ASR and Visual Speech Recognition (VSR) under different noisy conditions, and that using AVSR techniques improves the recognition accuracy of the system in noisy places. In addition, a survey conducted with 100 deaf people shows that more than 80% of them are very interested in using the system as an assistant on portable devices to communicate with people.
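For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation procedure used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Illustrative helper (hypothetical name); tokenization by whitespace
    is an assumption, real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the").
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Comparing recognizers then amounts to computing this value for each system's output against the same reference transcripts at each noise level; a lower value means fewer recognition errors.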
Cite this article
Mirzaei, M.R., Ghorshi, S. & Mortazavi, M. Audio-visual speech recognition techniques in augmented reality environments. Vis Comput 30, 245–257 (2014). https://doi.org/10.1007/s00371-013-0841-1
DOI: https://doi.org/10.1007/s00371-013-0841-1