DOI: 10.1145/1667780.1667832
research-article

Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments

Published: 3 December 2009

ABSTRACT

In this study, we propose a feature extraction method for reverberant speech recognition based on subband temporal envelopes (STEs) and their normalization. The STEs are extracted with a bank of constant-bandwidth band-pass filters, each followed by a Hilbert transform and low-pass filtering. In the normalization step, the modulation spectra (MS) of the STEs of both clean and reverberant speech are normalized to a reference MS computed from a clean speech data set. The inverse Fourier transform of the normalized subband MS then restores the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room at different speaker-to-microphone distances (SMDs). The baseline for comparison was traditional Mel-cepstral coefficients with mean and variance normalization. Averaged over SMDs from 50 cm to 400 cm, subband temporal envelope processing alone gave a 44.96% relative improvement, and normalization of the subband modulation spectrum gave a further 15.68% relative improvement. In total, the relative improvement was about 53.59% (consistent with compounding the two gains: 1 − (1 − 0.4496)(1 − 0.1568) ≈ 0.5359), which was better than that obtained with other temporal filtering and normalization methods.
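
The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the number of bands, the filter design, the low-pass cutoff, or the exact normalization rule, so the FFT-mask filters, the defaults `n_bands=8`, `f_max=4000.0`, `lp_cutoff=20.0`, and the gain rule `ref_mag / cond_mag` (with the phase left unchanged) are all assumptions. The function names are hypothetical.

```python
import numpy as np


def subband_envelopes(x, fs, n_bands=8, f_max=4000.0, lp_cutoff=20.0):
    """Constant-bandwidth band-pass -> Hilbert envelope -> low-pass, per band.

    All filters are ideal FFT masks (an assumption made for brevity).
    Returns an array of shape (n_bands, len(x)).
    """
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x)

    # Analytic-signal weights: double the positive frequencies,
    # leave DC (and the Nyquist bin when n is even) unscaled.
    weights = np.full(freqs.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0

    bandwidth = f_max / n_bands  # same width for every band
    envelopes = np.empty((n_bands, n))
    for b in range(n_bands):
        band = (freqs >= b * bandwidth) & (freqs < (b + 1) * bandwidth)
        # Band-pass and Hilbert transform in one step: keep only the
        # positive-frequency bins of this band, then inverse FFT.
        one_sided = np.zeros(n, dtype=complex)
        one_sided[: len(freqs)] = spectrum * band * weights
        envelope = np.abs(np.fft.ifft(one_sided))
        # Low-pass the envelope to keep only slow modulations.
        env_spec = np.fft.rfft(envelope)
        env_spec[freqs > lp_cutoff] = 0.0
        envelopes[b] = np.fft.irfft(env_spec, n=n)
    return envelopes


def normalize_modulation_spectrum(envelope, ref_mag, cond_mag, eps=1e-12):
    """Rescale the modulation spectrum of one subband envelope.

    ref_mag:  reference MS magnitude from clean data (assumed precomputed).
    cond_mag: average MS magnitude of the data being normalized (assumed).
    Both must have the same length as np.fft.rfft(envelope).
    """
    ms = np.fft.rfft(envelope)
    gain = ref_mag / (cond_mag + eps)  # real gain, so the phase is kept
    # Inverse Fourier transform restores the normalized temporal envelope.
    return np.fft.irfft(ms * gain, n=len(envelope))
```

With an amplitude-modulated tone as input, `subband_envelopes` should return the modulating waveform in the band containing the carrier, which is a convenient sanity check before applying the normalization to real speech.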

Published in

IUCS '09: Proceedings of the 3rd International Universal Communication Symposium
December 2009
404 pages
ISBN: 9781605586410
DOI: 10.1145/1667780
General Chair: Kazumasa Enami

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
