Real-time pre-processing for improved feature extraction of noisy speech

Raj, P. P.

doi:10.1007/s10772-021-09835-x

Published: 26 March 2021

Volume 24, pages 715–728, (2021)

International Journal of Speech Technology

P. P. Raj ORCID: orcid.org/0000-0002-9084-2513¹

Abstract

Several improvements to front-end feature-extraction algorithms for real-time speech decoding in noisy environments are proposed and demonstrated on the TIMIT speech corpus. Real-Time Voice Activity Detection (RT-VAD) separates the voiced and unvoiced parts of the streaming speech input from silence. Novel techniques for RT-Zero Crossing Detection and RT-Pitch Detection are presented as part of RT-VAD. A real-time approximate Kalman filter is then applied to de-noise the incoming signal. All of these steps operate on a collection of speech frames called a context. Frame-based Linear Discriminant Analysis (LDA) feature extraction is performed using RT-Cepstral Mean and Variance Normalization (RT-CMVN) and RT-Splicing. The algorithms are tested on the TIMIT database at various noise levels. The word-error rate (WER) improves by 5% at 30 dB SNR and by 7% at 10 dB SNR, which validates the proposed algorithms. A comparison with other works also shows a superior Speech Hit Rate (SHR) of 90.6% and Noise Hit Rate (NHR) of 86.2%.
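
To make the frame-level steps named in the abstract more concrete, the sketch below shows a generic sliding-context mean and variance normalization followed by frame splicing. It is not the author's RT-CMVN or RT-Splicing implementation, which the abstract does not specify. The names ContextCMVN and splice, the context size, the eps constant, and the edge-padding rule are all assumptions made purely for illustration.

```python
from collections import deque
from math import sqrt


class ContextCMVN:
    """Normalize each incoming frame with statistics of the most recent frames.

    Hypothetical helper: the "context" is modelled here as a fixed-length
    buffer of past frames, which is an assumption and not the paper's design.
    """

    def __init__(self, context_size, eps=1e-8):
        self.frames = deque(maxlen=context_size)  # oldest frame drops out automatically
        self.eps = eps  # guards against division by zero when variance is 0

    def push(self, frame):
        """Add one feature frame and return its normalized version."""
        self.frames.append(list(frame))
        n = len(self.frames)
        dim = len(frame)
        # Per-dimension mean and (biased) variance over the buffered context.
        mean = [sum(f[d] for f in self.frames) / n for d in range(dim)]
        var = [sum((f[d] - mean[d]) ** 2 for f in self.frames) / n
               for d in range(dim)]
        return [(frame[d] - mean[d]) / sqrt(var[d] + self.eps)
                for d in range(dim)]


def splice(frames, left, right):
    """Concatenate each frame with `left` past and `right` future neighbours.

    Frames beyond either end are replaced by the nearest edge frame
    (an assumed padding rule).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        stacked = []
        for k in range(t - left, t + right + 1):
            stacked.extend(frames[min(max(k, 0), last)])
        spliced.append(stacked)
    return spliced
```

In a streaming setting, each cepstral frame would be passed through push as it arrives, and splice would be applied once enough neighbouring frames are buffered. The spliced vectors would then be the input to a separately trained LDA projection, which is not shown here.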

Data availability

The study mainly uses the open-source TIMIT data, which is cited in the references.

Code availability

Custom code was used. The code will be made available on request.

References

Abbasian, H., Nasersharif, B., Akbari, A., Rahmani, M., & Moin, M. (2008). Optimized linear discriminant analysis for extracting robust speech features. In 2008 3rd International symposium on communications, control and signal processing (pp. 819–824). IEEE.
ANSI. (1994). American National Standard Acoustical Terminology. In ANSI S1, 1994 (Vol. 1).
Bachu, R., Kopparthi, S., Adapa, B., & Barkana, B. (2008). Separation of voiced and unvoiced using zero crossing rate and energy of the speech signal. In American Society for Engineering Education (ASEE) zone conference proceedings (pp. 1–7).
Bocklet, T., & Marek, A. (2018). Cepstral variance normalization for audio feature extraction. US Patent App. 15/528,068.
Das, O. (2016). Kalman filter in speech enhancement. Thesis, Jadavpur University.
Das, O., Goswami, B., & Ghosh, R. (2016). Application of the tuned Kalman filter in speech enhancement. In 2016 IEEE first international conference on control, measurement and instrumentation (CMI) (pp. 62–66). IEEE.
Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366.
Delaney, B., Jayant, N., Hans, M., Simunic, T., & Acquaviva, A. (2002). A low-power, fixed-point, front-end feature extraction for a distributed speech recognition system. In 2002 IEEE international conference on acoustics, speech, and signal processing (Vol. 1, pp. I-793). IEEE.
Dionelis, N., & Brookes, M. (2018). Speech enhancement using Kalman filtering in the logarithmic bark power spectral domain. In 2018 26th European signal processing conference (EUSIPCO) (pp. 1642–1646). IEEE.
Erdogan, H. (2005). Regularizing linear discriminant analysis for speech recognition. In Ninth European conference on speech communication and technology.
Estevez, P., Becerra-Yoma, N., Boric, N., & Ramírez, J. (2005). Genetic programming-based voice activity detection. Electronics Letters, 41(20), 1141–1143.
Fujimoto, M., & Ariki, Y. (2000). Noisy speech recognition using noise reduction method based on Kalman filter. In 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 00CH37100) (Vol. 3, pp. 1727–1730). IEEE.
Gabrea, M. (2003). Kalman filter based single microphone noise canceller. In International workshop on acoustic echo and noise control.
Garofolo, J., Lamel, L., Fisher, W., Fiscus, D. J., Dahlgren, N., & Zue, V. (1993). TIMIT acoustic–phonetic continuous speech corpus LDC93S1. Web. Download.
Ghahabi, O., Zhou, W., & Fischer, V. (2018). A robust voice activity detection for real-time automatic speech recognition. In Proceedings of ESSV.
Goh, Y. H., Raveendran, P., & Goh, Y. L. (2015). Robust speech recognition system using bidirectional Kalman filter. IET Signal Processing, 9(6), 491–497.
Guo, J., Sainath, T. N., & Weiss, R. J. (2019). A spelling correction model for end-to-end speech recognition. In ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5651–5655). IEEE.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.
Lacey, T. (1997). The Kalman filter. Tutorial. From the notes of the CS7322 course at Georgia Tech. Notes taken from the TINA Algorithms’ Guide by N Thacker, Electronic Systems Group, University of Sheffield.
Lin, Z. Q., Chung, A. G., & Wong, A. (2018). Edgespeechnets: Highly efficient deep neural networks for speech recognition on the edge. arXiv preprint arXiv:1810.08559.
MATLAB. (2019). Version 9.7.0.1471314 (R2019b) Update 7. The MathWorks, Inc.
Mathe, M., Nandyala, S. P., & Kumar, T. K. (2012). Speech enhancement using Kalman filter for white, random and color noise. In 2012 International conference on devices, circuits and systems (ICDCS) (pp. 195–198). IEEE.
Meoni, G., Pilato, L., & Fanucci, L. (2018). A low power voice activity detector for portable applications. In 2018 14th conference on Ph.D. research in microelectronics and electronics (PRIME) (pp. 41–44). IEEE.
Moattar, M. H., & Homayounpour, M. M. (2009). A simple but efficient real-time voice activity detection algorithm. In 2009 17th European signal processing conference (pp. 2549–2553). IEEE.
Mohan, M. S., Naik, N., Gemson, R., & Ananthasayanam, M. (2015). Introduction to the Kalman filter and tuning its statistics for near optimal estimates and Cramer Rao bound. arXiv preprint arXiv:1503.04313.
Nguyen, T. S., Sperber, M., Stüker, S., & Waibel, A. (2018). Building real-time speech recognition without CMVN. In International conference on speech and computer (pp. 451–460). Springer.
Paliwal, K., & Basu, A. (1987). A speech enhancement method based on Kalman filtering. In ICASSP’87. IEEE international conference on acoustics, speech, and signal processing (Vol. 12, pp. 177–180). IEEE.
Perkins, K., & Meeker, M. (2017). Internet trends 2017. https://www.slideshare.net/kleinerperkins/internet-trends-2017-report.
Price, M., Chandrakasan, A., & Glass, J. R. (2016). Memory-efficient modeling and search techniques for hardware ASR decoders. In Interspeech (pp. 1893–1897).
Price, M., Glass, J., & Chandrakasan, A. P. (2017). A low-power speech recognizer and voice activity detector using deep neural networks. IEEE Journal of Solid-State Circuits, 53(1), 66–75.
Pujol, P., Macho, D., & Nadeu, C. (2006). On real-time mean-and-variance normalization of speech recognition features. In 2006 IEEE international conference on acoustics, speech and signal processing Proceedings (Vol. 1, p. I-I). IEEE.
Rath, S. P., Povey, D., Veselý, K., & Černocký, J. (2013). Improved feature processing for deep neural networks. In Interspeech (pp. 109–113).
Rosen, S. M., Fourcin, A., & Moore, B. C. (1981). Voice pitch as an aid to lipreading. Nature, 291(5811), 150.
Sárosi, G., Mozsáry, M., Mihajlik, P., & Fegyó, T. (2011). Comparison of feature extraction methods for speech recognition in noise-free and in traffic noise environment. In 2011 6th Conference on speech technology and human–computer dialogue (SpeD) (pp. 1–8). IEEE.
Sehgal, A., Saki, F., & Kehtarnavaz, N. (2017). Real-time implementation of voice activity detector on arm embedded processor of smartphones. In 2017 IEEE 26th international symposium on industrial electronics (ISIE) (pp. 1285–1290). IEEE.
Sharma, N., & Sardana, S. (2016). A real time speech to text conversion system using bidirectional Kalman filter in MATLAB. In 2016 International conference on advances in computing, communications and informatics (ICACCI) (pp. 2353–2357). IEEE.
So, S., & Paliwal, K. K. (2011). Suppressing the influence of additive noise on the Kalman gain for low residual noise speech enhancement. Journal of Speech Communication, 53(3), 355–378.
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Verteletskaya, E., & Sakhnov, K. (2010). Voice activity detection for speech enhancement applications. Acta Polytechnica. https://doi.org/10.14311/1251.
Yang, X., Tan, B., Ding, J., Zhang, J., & Gong, J. (2010, June). Comparative study on voice activity detection algorithm. In International conference on electrical and control engineering (pp. 599–602). IEEE.
Ying, D., Yan, Y., Dang, J., & Soong, F. K. (2011). Voice activity detection based on an unsupervised learning framework. IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2624–2633.

Funding

No funding was received for conducting this study.

Author information

Authors and Affiliations

Department of Electronics & Communication Engineering, National Institute of Technology, Tiruchchirappalli, Tamil Nadu, India
P. P. Raj

Corresponding author

Correspondence to P. P. Raj.

Ethics declarations

Conflict of interest

The author has no conflicts of interest to declare that are relevant to the content of this article.

About this article

Cite this article

Raj, P.P. Real-time pre-processing for improved feature extraction of noisy speech. Int J Speech Technol 24, 715–728 (2021). https://doi.org/10.1007/s10772-021-09835-x

Received: 08 August 2020
Accepted: 30 January 2021
Published: 26 March 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10772-021-09835-x
