Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 328))

Abstract

Designing a robust Human-Computer Interaction (HCI) system is a challenging task, especially for automatic speech recognition (ASR) operating in adverse environments. This paper proposes an ASR system that uses bimodal information (i.e., speech along with visual input), resulting in improved robustness. In this research, static and dynamic (∆) audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCC). The visual features are extracted using the Two-Dimensional Discrete Wavelet Transform (2D-DWT). Audio-visual recognition is performed over different combinations of visual features using Hidden Markov Models (HMM) under clean and noisy environmental conditions. The Aligarh Muslim University Audio Visual (AMUAV) Hindi database is used as the baseline data. In addition, performance on noisy speech is evaluated at different signal-to-noise ratios (SNR: 30 dB to -20 dB). Finally, adding visual information to ASR is reported to increase accuracy in smart assistive environments, i.e., in applications that may not have noise-free background conditions.
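Two of the steps named in the abstract can be illustrated with a minimal sketch: corrupting a speech signal at a chosen SNR, and taking a one-level 2D-DWT of an image patch. The abstract does not state the noise type, the wavelet family, or the decomposition depth, so the white Gaussian noise, the Haar wavelet, the single level, and all function names below (`mix_at_snr`, `measured_snr_db`, `haar_dwt2`) are assumptions made only for illustration. This is not the authors' implementation.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise power ratio equals `snr_db`, then add it."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is P_signal / 10^(SNR/10); gain is the amplitude factor.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


def haar_dwt2(image):
    """One-level 2D Haar DWT of a list-of-rows image with even height and width.

    Returns four subbands (ll, lh, hl, hh). The first letter refers to the
    row (horizontal) filter and the second to the column (vertical) filter;
    naming conventions differ between libraries, so this choice is an assumption.
    """
    r = 1 / math.sqrt(2)

    # Filter along each row: pairwise scaled sums (low-pass) and differences (high-pass).
    lo = [[(row[2 * j] + row[2 * j + 1]) * r for j in range(len(row) // 2)] for row in image]
    hi = [[(row[2 * j] - row[2 * j + 1]) * r for j in range(len(row) // 2)] for row in image]

    def filter_columns(m):
        # Apply the same pairwise sum/difference down each column.
        rows, cols = len(m) // 2, len(m[0])
        approx = [[(m[2 * i][j] + m[2 * i + 1][j]) * r for j in range(cols)] for i in range(rows)]
        detail = [[(m[2 * i][j] - m[2 * i + 1][j]) * r for j in range(cols)] for i in range(rows)]
        return approx, detail

    ll, lh = filter_columns(lo)
    hl, hh = filter_columns(hi)
    return ll, lh, hl, hh


if __name__ == "__main__":
    random.seed(0)
    clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]
    noisy = mix_at_snr(clean, noise, snr_db=-5)
    print(round(measured_snr_db(clean, noisy), 3))

    ll, lh, hl, hh = haar_dwt2([[2.0] * 4 for _ in range(4)])
    print(len(ll), len(ll[0]))
```

In a pipeline like the one described, the noisy signal would feed the MFCC front end, and coefficients from a subband such as `ll` could serve as a compact visual feature vector. Which subbands the paper actually combines is not stated in the abstract.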

Corresponding author

Correspondence to Prashant Upadhyaya.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Upadhyaya, P., Farooq, O., Abidi, M.R., Varshney, P. (2015). Performance Evaluation of Bimodal Hindi Speech Recognition under Adverse Environment. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 328. Springer, Cham. https://doi.org/10.1007/978-3-319-12012-6_38

  • DOI: https://doi.org/10.1007/978-3-319-12012-6_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12011-9

  • Online ISBN: 978-3-319-12012-6

  • eBook Packages: Engineering, Engineering (R0)
