An Adaptive Method for Robust Detection of Vowels in Noisy Environment

Kumar, Avinash; Pradhan, Gayadhar; Shahnawazuddin, S.

doi:10.1007/s00034-019-01052-x

An Adaptive Method for Robust Detection of Vowels in Noisy Environment

Published: 08 February 2019

Volume 38, pages 4180–4201, (2019)
Cite this article

Circuits, Systems, and Signal Processing Aims and scope Submit manuscript

Avinash Kumar¹,
Gayadhar Pradhan¹ &
S. Shahnawazuddin¹

351 Accesses
2 Citations
Explore all metrics

Abstract

Automatic detection of vowels plays a significant role in the analysis and synthesis of speech signal. Detecting vowels within a speech utterance in noisy environment and varied contexts is a very challenging task. In this work, a robust technique based on non-local means (NLM) estimation is proposed for the detection of vowels in noisy speech signals. In the NLM algorithm, the signal value at each sample point is estimated as the weighted sum of signal values at other sample points within a search neighborhood. The weight value is computed by finding square of the difference between the signal values belonging to two different segments. During the estimation, one segment is kept as fixed, while other segment is slid over the search neighborhood. For any particular sample point, the sum of those weight values is significantly less when the segments under consideration are higher in magnitude. In a given speech signal, the vowels are regions of high energy. This will be true even under noisy conditions. In this work, the sum of weight values (SWV), computed at each time instant is used as a discriminating feature for detecting the vowels in a given speech signal. In the proposed approach, the regions where the SWV exhibits significant transitions and attain lower values for a considerable duration of time compared to the preceding and succeeding regions are hypothesized as the vowels. This hypothesis is statistically validated for detecting vowels under clean as well as noisy test conditions. For proper comparison, a three-class statistical classifier (vowel, non-vowel and silence) is developed for detecting the vowels in a given speech signal. For developing the said classifier, the mel-frequency cepstral coefficients are used as the acoustic feature vectors, while deep neural network (DNN)-hidden Markov model (HMM) is employed for acoustic modeling. The proposed vowel detection method is observed to outperform the DNN-HMM-based statistical classifiers as well as existing signal processing approaches under both clean and noisy test conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

A Study on Vowel Region Detection from a Continuous Speech

Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Article 01 August 2017

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

Article 08 September 2016

References

N. Adiga, S.M. Prasanna, A hybrid text-to-speech synthesis using vowel and non vowel like regions, in Proceedings of the INDICON, pp. 1–5 (2014)
T. Anastasakos, J. Mcdonough, R. Schwartz, J. Makhoul, A compact model for speaker-adaptive training. Proc. Int. Conf. Spok. Lang. Process 2, 1137–1140 (1996)
Google Scholar
A. Buades, B. Coll, J.M. Morel, A non-local algorithm for image denoising, in Computer Vision and Pattern Recognition, pp. 60–65 (2005)
A. Buades, B. Coll, J.M. Morel, A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)
Article MathSciNet MATH Google Scholar
G. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process 20(1), 30–42 (2012)
Article Google Scholar
S.B. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process ASSP–28(4), 357–366 (1980)
Article Google Scholar
K.T. Deepak, S.R.M. Prasanna, Foreground speech segmentation and enhancement using glottal closure instants and mel cepstral coefficients. IEEE/ACM Trans. Audio Speech Lang. Process 24(7), 1205–1219 (2016)
Article Google Scholar
C.A. Deledalle, V. Duval, J. Salmon, Non-local methods with shape-adaptive patches (nlm-sap). J. Math. Imaging Vis. 43(2), 103–120 (2012)
Article MathSciNet MATH Google Scholar
V. Digalakis, D. Rtischev, L. Neumeyer, Speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Trans. Audio Speech Lang. Process 3(5), 357–366 (1995)
Article Google Scholar
N. Fakotakis, A. Tsopanoglou, G. Kokkinakis, A text-independent speaker recognition system based on vowel spotting. Speech Commun. 12(1), 57–68 (1993)
Article Google Scholar
M.J.F. Gales, Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech Audio Process 7(3), 272–281 (1999)
Article Google Scholar
J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, V. Zue, TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1, vol. 33 (Linguistic Data Consortium, Philadelphia, 1993)
Google Scholar
D.J. Hermes, Vowel onset detection. J. Acoust. Soc. Am. 87(2), 866–873 (1990)
Article Google Scholar
Q. Jin, A. Waibel, Application of LDA to speaker recognition, in Proceedings of the Interspeech, pp. 250–253 (2000)
A. Kumar, S. Shahnawazuddin, G. Pradhan, Exploring different acoustic modeling techniques for the detection of vowels in speech signal, in Proceedings of the National Conference on Communication (NCC), pp. 1–5 (2016)
A. Kumar, S. Shahnawazuddin, G. Pradhan, Improvements in the detection of vowel onset and offset points in a speech sequence. Circuits Syst. Signal Process 36, 1–26 (2016)
MathSciNet Google Scholar
A. Kumar, S. Shahnawazuddin, G. Pradhan, Non-local estimation of speech signal for vowel onset point detection in varied environments, in Proceedings of the Interspeech, Stockholm, Sweden, pp. 429–433 (2017)
T. Lotter, P. Vary, Speech enhancement by map spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Appl. Signal Process 2005, 1110–1126 (2005)
MATH Google Scholar
Y. Lu, P.C. Loizou, Estimators of the magnitude-squared spectrum and methods for incorporating snr uncertainty. IEEE Trans. Audio Speech Lang. Process 19(5), 1123–1137 (2011)
Article Google Scholar
V.K. Mittal, B. Yegnanarayana, Effect of glottal dynamics in the production of shouted speech. J. Acoust. Soc. Am. 133(5), 3050–3061 (2013)
Article Google Scholar
V.K. Mittal, B. Yegnanarayana, Study of changes in glottal vibration characteristics during laughter, in INTERSPEECH (2014)
V.K. Mittal, B. Yegnanarayana, P. Bhaskararao, Study of the effects of vocal tract constriction on glottal vibration. J. Acoust. Soc. Am. 136(4), 1932–1941 (2014)
Article Google Scholar
D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, A. Rastrow, R.C. Rose, P. Schwarz, S. Thomas, The subspace Gaussian mixture model—a structured model for speech recognition. Comput. Speech Lang. 25(2), 404–439 (2011)
Article Google Scholar
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, K. Vesely, The kaldi speech recognition toolkit, in Workshop on Automatic Speech Recognition and Understanding (2011)
G. Pradhan, B. Haris, S. Prasanna, R. Sinha, Speaker verification in sensor and acoustic environment mismatch conditions. Int. J. Speech Technol. 15(3), 381–392 (2012)
Article Google Scholar
G. Pradhan, A. Kumar, S. Shahnawazuddin, Excitation source features for improving the detection of vowel onset and offset points in a speech sequence, in Proceedings of the Interspeech 2017, pp. 1884–1888 (2017)
G. Pradhan, S.M. Prasanna, Speaker verification by vowel and nonvowel like segmentation. IEEE Trans. Audio Speech Lang. Process 21(4), 854–867 (2013)
Article Google Scholar
S.M. Prasanna, G. Pradhan, Significance of vowel-like regions for speaker verification under degraded conditions. IEEE Trans. Audio Speech Lang. Process 19(8), 2552–2565 (2011)
Article Google Scholar
S.R.M. Prasanna, B.V.S. Reddy, P. Krishnamoorthy, Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Trans. Audio Speech Lang. Process 17(4), 556–565 (2009)
Article Google Scholar
S.R.M. Prasanna, B. Yegnanarayana, Detection of vowel onset point events using excitation source information, in Proceedings of the Interspeech, pp. 1133–1136 (2005)
J. Rao, C.C. Sekhar, B. Yegnanarayana, Neural network based approach for detection of vowel onset points, in Proceedings of the International Conference on Advanced Pattern Recognition Digital Technologies, vol. 1, pp. 316–320 (1999)
K.S. Rao, A.K. Vuppala, Speaker Identification and Time Scale Modification Using VOPs (Springer, Berlin, 2014)
Book Google Scholar
K.S. Rao, B. Yegnanarayana, Duration modification using glottal closure instants and vowel onset points. Speech Commun. 51(12), 1263–1269 (2009)
Article Google Scholar
S.P. Rath, D. Povey, K. Veselý, J. Cernocký, Improved feature process for deep neural networks, in Proceedings of the Interspeech (2013)
B.S. Reddy, K.V. Rao, S.M. Prasanna, Keyword spotting using vowel onset point, vector quantization and hidden Markov modeling based techniques, in Proceedings of the TENCON, pp. 1–4 (2008)
B. Sarma, S. Prajwal, S.M. Prasanna, Improved vowel onset and offset points detection using bessel features, in Proceedings of the International Conference on Signal Processing and Communication, pp. 1–6 (2014)
P. Singh, G. Pradhan, Exploring the non-local similarity present in variational mode functions for effective ecg denoising, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 861–865. IEEE (2018)
P. Singh, G. Pradhan, Variational mode decomposition based ecg denoising using non-local means and wavelet domain filtering. Australas. Phys. Eng. Sci. Med. 41, 1–14 (2018)
Article Google Scholar
P. Singh, S. Shahnawazuddin, G. Pradhan, An efficient ecg denoising technique based on non-local means estimation and modified empirical mode decomposition. Circuits Syst. Signal Process. 37(10), 4527–4547 (2018)
Article Google Scholar
N. Srinivas, G. Pradhan, S. Shahnawazuddin, Enhancement of noisy speech signal by non-local means estimation of variational mode functions. Proc. Interspeech 2018, 1156–1160 (2018)
Article Google Scholar
K.N. Stevens, Acoustic Phonetics (The MIT Press Cambridge, London, 2000)
Google Scholar
B.H. Tracey, E.L. Miller, Nonlocal means denoising of ecg signals. IEEE Trans. Biomed. Eng. 59(9), 2383–2386 (2012)
Article Google Scholar
D. Van De Ville, M. Kocher, Sure-based non-local means. IEEE Signal Process. Lett. 16(11), 973–976 (2009)
Article Google Scholar
A. Varga, H.J.M. Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun. 12(3), 247–251 (1993)
Article Google Scholar
A. Vuppala, J. Yadav, S. Chakrabarti, K.S. Rao, Vowel onset point detection for low bit rate coded speech. IEEE Trans. Audio Speech Lang. Process. 20(6), 1894–1903 (2012)
Article Google Scholar
A.K. Vuppala, K.S. Rao, Vowel onset point detection for noisy speech using spectral energy at formant frequencies. Int. J. Speech Technol. 16(2), 229–235 (2013)
Article Google Scholar
A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved vowel onset point detection using epoch intervals. AEU Int. J. Electron. Commun. 66(8), 697–700 (2012)
Article Google Scholar
H.K. Vydana, S.R. Kadiri, A.K. Vuppala, Vowel-based non-uniform prosody modification for emotion conversion. Circuits Syst. Signal Process. 35(5), 1643–1663 (2016)
Article Google Scholar
J. Wang, C. Hu, S. Hung, J. Lee, A hierarchical neural network based C/V segmentation algorithm for Mandarin speech recognition. IEEE Trans. Signal Process. 39(9), 2141–2146 (1991)
Article Google Scholar
J.H. Wang, S.H. Chen, A C/V segmentation algorithm for Mandarin speech using wavelet transforms, in Proceedings of the International Conference on Acoustics, Speech, Signal Process., vol. 1, pp. 417–420 (1999)
P.J. Wolfe, S.J. Godsill, Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement, in Signal Processing Workshop on Statistical Signal Processing, pp. 496–499 (2001)
J. Yadav, K.S. Rao, Detection of vowel offset point from speech signal. IEEE Signal Process. Lett. 20(4), 299–302 (2013)
Article Google Scholar

Download references

Acknowledgements

We thank the anonymous reviewers for their careful reading of our manuscript and their insightful comments and suggestions that have greatly improved the quality of the manuscript.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Patna, Patna, India
Avinash Kumar, Gayadhar Pradhan & S. Shahnawazuddin

Authors

Avinash Kumar
View author publications
You can also search for this author in PubMed Google Scholar
Gayadhar Pradhan
View author publications
You can also search for this author in PubMed Google Scholar
S. Shahnawazuddin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gayadhar Pradhan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kumar, A., Pradhan, G. & Shahnawazuddin, S. An Adaptive Method for Robust Detection of Vowels in Noisy Environment. Circuits Syst Signal Process 38, 4180–4201 (2019). https://doi.org/10.1007/s00034-019-01052-x

Download citation

Received: 14 December 2017
Revised: 28 January 2019
Accepted: 30 January 2019
Published: 08 February 2019
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s00034-019-01052-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Adaptive Method for Robust Detection of Vowels in Noisy Environment

Abstract

Access this article

Similar content being viewed by others

A Study on Vowel Region Detection from a Continuous Speech

Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An Adaptive Method for Robust Detection of Vowels in Noisy Environment

Abstract

Access this article

Similar content being viewed by others

A Study on Vowel Region Detection from a Continuous Speech

Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation