Performance Evaluation of Bimodal Hindi Speech Recognition under Adverse Environment

Upadhyaya, Prashant; Farooq, Omar; Abidi, M. R.; Varshney, Priyanka

doi:10.1007/978-3-319-12012-6_38


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 328)


Abstract

Designing a robust Human-Computer Interaction (HCI) system is a challenging task, especially for automatic speech recognition (ASR) operating in adverse environments. This paper proposes an ASR system that uses bimodal information (i.e., speech along with visual input), resulting in improved robustness. In this research, static and dynamic (∆) audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCC). The visual features are extracted using the Two-Dimensional Discrete Wavelet Transform (2D-DWT). Audio-video recognition is performed over different combinations of visual features using Hidden Markov Models (HMM) under clean and noisy environmental conditions. The Aligarh Muslim University Audio Visual (AMUAV) Hindi database is used as the baseline data. In addition, performance on noisy speech is evaluated at different signal-to-noise ratios (SNR: 30 dB to -20 dB). Finally, the addition of visual information to ASR is reported to increase accuracy in smart assistive environments, i.e., in applications that may not have noise-free background conditions.
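Two of the steps named in the abstract can be illustrated briefly: corrupting clean speech at a chosen SNR, and deriving dynamic (∆) features from static ones. The sketch below is not the authors' implementation. The function names `mix_at_snr` and `delta`, the regression window `width=2`, and the use of a standard regression formula for ∆ are assumptions made for illustration; the abstract does not state the noise type or the ∆ window used.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the requested SNR (dB).

    Hypothetical helper: the abstract only states the SNR range, not the procedure.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)); solve for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def delta(features, width=2):
    """Regression-based delta features; `features` has shape (frames, coeffs).

    Edge frames are handled by repeating the first and last frame (an assumption).
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]    # frame t + k
        behind = padded[width - k : width - k + n]   # frame t - k
        out += k * (ahead - behind)
    return out / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
    noisy = mix_at_snr(clean, rng.normal(size=16000), snr_db=-20)
    static = rng.normal(size=(50, 13))               # stand-in for 13 MFCCs per frame
    audio_features = np.hstack([static, delta(static)])
    print(noisy.shape, audio_features.shape)
```

In a full system the static features would come from an MFCC front end and the concatenated static and ∆ vectors would be passed to the HMM recogniser; those stages are omitted here.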



Author information

Authors and Affiliations

Department of Electronics, AMU, Aligarh, India
Prashant Upadhyaya, Omar Farooq & M. R. Abidi
Department of Electronics, GLA University, Mathura, India
Priyanka Varshney

Corresponding author

Correspondence to Prashant Upadhyaya.

Editor information

Editors and Affiliations

Department of Computer Science Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
Bhubaneswar Engineering College, Bhubaneswar, Odisha, India
Bhabendra Narayan Biswal
University of Hyderabad, Hyderabad, Andhra Pradesh, India
Siba K. Udgata
Department of Computer Science and Engineering, Faculty of Engg., Tech. & Management, University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal


Cite this paper

Upadhyaya, P., Farooq, O., Abidi, M.R., Varshney, P. (2015). Performance Evaluation of Bimodal Hindi Speech Recognition under Adverse Environment. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 328. Springer, Cham. https://doi.org/10.1007/978-3-319-12012-6_38


DOI: https://doi.org/10.1007/978-3-319-12012-6_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12011-9
Online ISBN: 978-3-319-12012-6
eBook Packages: Engineering, Engineering (R0)
