Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music

Khonglah, Banriskhem K.; Dey, Abhishek; Prasanna, S. R. Mahadeva

doi:10.1007/s00034-018-0873-x

Published: 22 June 2018

Volume 38, pages 643–663, (2019)

Circuits, Systems, and Signal Processing

Banriskhem K. Khonglah¹,
Abhishek Dey² &
S. R. Mahadeva Prasanna³

Abstract

This work explores the significance of source information for speech enhancement, with the aim of improving phoneme recognition of speech with background music segments. The standard procedure for speech enhancement in noisy conditions involves sequential processing using temporal, spectral and perceptual methods. This work follows the same sequential processing, but additionally studies the effect of source information, particularly in the temporal and perceptual enhancement techniques, for enhancing speech with background music segments. The source information is studied in terms of epoch locations and epoch strength. These are obtained by computing the component envelopes across frequencies using the single frequency filter (SFF), summing the mean and standard deviation of these envelopes, and passing the result through a zero frequency filter (ZFF). This method of obtaining epoch locations and epoch strength is termed SFF-ZFF in this work. The enhanced segments are passed through phoneme recognizers built using Gaussian mixture model-hidden Markov model (GMM-HMM), subspace Gaussian mixture model-hidden Markov model (SGMM-HMM) and deep neural network-hidden Markov model (DNN-HMM) systems, with the models trained on clean speech. The enhanced audio files show a lower phone error rate than the degraded audio files, which indicates that enhancement based on source information is significant for speech with background music regions.
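
The SFF-ZFF chain described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses the commonly described single-pole form of SFF (each frequency is shifted to half the sampling rate and filtered with a pole at z = -r) and the commonly described ZFF (a cascade of two zero-frequency resonators followed by repeated local-mean subtraction). The function names, the pole radius `r = 0.99`, the frequency grid, the assumed pitch period used to size the trend-removal window, and the choice of negative-to-positive zero crossings as epoch candidates are all hypothetical choices made for the example.

```python
import numpy as np

def sff_envelopes(signal, fs, freqs, r=0.99):
    """Envelope of the signal at each frequency in `freqs` (assumed SFF form)."""
    x = np.diff(signal, prepend=signal[0])           # difference the signal
    n = np.arange(len(x))
    # Shift each frequency of interest to fs/2, where the single pole sits.
    w_shift = np.pi - 2.0 * np.pi * np.asarray(freqs, dtype=float) / fs
    shifted = x[None, :] * np.exp(1j * w_shift[:, None] * n[None, :])
    y = np.zeros_like(shifted)
    prev = np.zeros(len(w_shift), dtype=complex)
    for i in range(len(x)):                          # y[n] = -r * y[n-1] + x_k[n]
        prev = -r * prev + shifted[:, i]
        y[:, i] = prev
    return np.abs(y)                                 # shape: (num_freqs, num_samples)

def zff(x, half_win, passes=3):
    """Zero frequency filtering with repeated local-mean (trend) removal."""
    y = np.diff(x, prepend=x[0])
    for _ in range(4):                               # two 0-Hz resonators = four integrations
        y = np.cumsum(y)
    kernel = np.ones(2 * half_win + 1) / (2 * half_win + 1)
    for _ in range(passes):                          # subtract the local mean each pass
        y = y - np.convolve(y, kernel, mode="same")
    return y

def sff_zff_epochs(signal, fs, freqs, pitch_period_s=0.01):
    """Return (epoch sample indices, epoch strengths) from the SFF-ZFF evidence."""
    env = sff_envelopes(signal, fs, freqs)
    evidence = env.mean(axis=0) + env.std(axis=0)    # sum of mean and std across frequencies
    half_win = int(round(0.75 * pitch_period_s * fs))  # window of about 1.5 pitch periods
    z = zff(evidence, half_win)
    idx = np.where((z[:-1] < 0) & (z[1:] >= 0))[0]   # negative-to-positive zero crossings
    strength = z[idx + 1] - z[idx]                   # slope at the crossing
    return idx, strength
```

Calling `sff_zff_epochs(x, fs, np.arange(100, fs // 2, 100))` on a voiced segment `x` returns candidate epoch indices and a slope-based strength for each. Values within a few window lengths of either end of the signal are affected by the zero-padded local-mean computation and are best ignored.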

Acknowledgements

This work is part of the project titled "Multi-modal Broadcast Analytics: Structured Evidence Visualization for Events of Security Concern", funded by the e-Security division of the Department of Electronics and Information Technology (DeitY), Govt. of India.

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Banriskhem K. Khonglah
Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology (GUIST), Guwahati, 781014, India
Abhishek Dey
Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, 580011, India
S. R. Mahadeva Prasanna

Corresponding author

Correspondence to Banriskhem K. Khonglah.

About this article

Cite this article

Khonglah, B.K., Dey, A. & Prasanna, S.R.M. Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music. Circuits Syst Signal Process 38, 643–663 (2019). https://doi.org/10.1007/s00034-018-0873-x

Received: 19 July 2017
Revised: 02 June 2018
Accepted: 05 June 2018
Published: 22 June 2018
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s00034-018-0873-x
