Detection and Classification of Nasalized Vowels in Noise Based on Cepstra Derived from Differential Product Spectrum

Najnin, Shamima; Shahnaz, Celia

doi:10.1007/s00034-016-0298-3

Published: 23 March 2016

Volume 36, pages 181–201, (2017)

Circuits, Systems, and Signal Processing

Abstract

In this paper, a method based on cepstra derived from the differential product spectrum is developed for detecting and classifying nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum alone, ignoring the phase spectrum. Using the power spectrum and the group delay function of a band-limited vowel, the product spectrum is defined, thus incorporating information from both the magnitude and phase spectra. The product spectrum is then differentiated with respect to frequency to obtain the differential product spectrum (DPrS), which is argued to be more robust in the presence of noise. In place of conventional mel-frequency cepstral coefficients (MFCCs), which are computed from the power spectrum, MFCCs computed from the differential product spectrum, namely MFDPrSCCs, are fed to a linear discriminant analysis (LDA)-based classifier for the detection and classification of nasalized vowels. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the differential product spectrum can outperform cepstral features derived from the conventional power spectrum in detecting and classifying nasalized vowels, not only in clean conditions but also in different noisy conditions with varying signal-to-noise ratios.
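The feature pipeline outlined above (product spectrum, frequency differentiation, mel filtering, log, DCT) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It computes the product spectrum through the identity Q(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k), where Y is the DFT of n·x[n]; this equals the power spectrum multiplied by the group delay and avoids phase unwrapping. The frequency derivative is approximated by a first-order difference across FFT bins, and its magnitude is taken so that the logarithm is defined; the paper's exact differencing and normalization may differ. The frame length, hop, FFT size, filter count, and number of cepstra (chosen for 16 kHz TIMIT audio) are illustrative values, and the function names `product_spectrum`, `mel_filterbank`, `dct2_ortho`, and `mfdprscc` are hypothetical.

```python
import numpy as np


def product_spectrum(frame, n_fft):
    """Product spectrum Q(k) = |X(k)|^2 * tau(k), computed without phase unwrapping.

    With Y the DFT of n * x[n], the group delay is
    tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, so Q(k) = X_R Y_R + X_I Y_I.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return X.real * Y.real + X.imag * Y.imag


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def dct2_ortho(x):
    """Orthonormal DCT-II, as used in standard MFCC computation."""
    N = len(x)
    n = np.arange(N)
    basis = np.cos(np.pi * n[:, None] * (2 * n[None, :] + 1) / (2 * N))
    out = np.sqrt(2.0 / N) * (basis @ x)
    out[0] /= np.sqrt(2.0)
    return out


def mfdprscc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
             n_filters=26, n_ceps=13):
    """Mel cepstra of the differential product spectrum, one row per frame."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        q = product_spectrum(frame, n_fft)
        # Frequency derivative approximated by a first-order difference
        # across bins; the magnitude keeps the log below well defined.
        dprs = np.abs(np.diff(q, append=q[-1]))
        energies = fb @ dprs
        log_e = np.log(np.maximum(energies, 1e-10))
        feats.append(dct2_ortho(log_e)[:n_ceps])
    return np.array(feats)
```

Each row of the returned matrix is one frame's feature vector; these vectors could then be passed to an LDA classifier such as scikit-learn's `LinearDiscriminantAnalysis`. As a quick sanity check of the product-spectrum identity, a unit impulse at sample d gives Q(k) = d at every bin, since its group delay is the constant d and its power spectrum is 1.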

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN, USA
Shamima Najnin
Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Celia Shahnaz

Corresponding author

Correspondence to Shamima Najnin.

About this article

Cite this article

Najnin, S., Shahnaz, C. Detection and Classification of Nasalized Vowels in Noise Based on Cepstra Derived from Differential Product Spectrum. Circuits Syst Signal Process 36, 181–201 (2017). https://doi.org/10.1007/s00034-016-0298-3

Received: 16 April 2015
Revised: 04 March 2016
Accepted: 07 March 2016
Published: 23 March 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s00034-016-0298-3
