Abstract
In this paper, a method based on cepstra derived from the differential product spectrum is developed for the detection and classification of nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum only, ignoring the phase spectrum. Using the power spectrum and the group delay function of a band-limited vowel, the product spectrum is defined so that it incorporates information from both the magnitude and phase spectra. The product spectrum is then differentiated with respect to frequency to obtain the differential product spectrum (DPrS), which is argued to be more robust to noise. Instead of conventional mel-frequency cepstral coefficients (MFCCs), MFCCs computed from the differential product spectrum, namely MFDPrSCCs, are fed to a linear discriminant analysis-based classifier for the detection and classification of nasalized vowels. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the differential product spectrum can outperform cepstral features derived from the conventional power spectrum in detecting and classifying nasalized vowels, not only in clean conditions but also in noisy conditions with varying signal-to-noise ratios.
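As a rough illustration of the feature chain outlined in the abstract, the following Python sketch computes a product spectrum, a differential product spectrum, and cepstral coefficients for a single frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the product spectrum is computed with the identity Q(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k), where Y is the DFT of n·x(n), which equals the power spectrum multiplied by the group delay; the differentiation is approximated by the magnitude of a first-order difference across frequency bins (an assumption, since the abstract does not give the exact form); the mel filterbank and the linear discriminant analysis classifier are omitted for brevity; and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def product_spectrum(frame):
    """Q(k) = X_R(k) Y_R(k) + X_I(k) Y_I(k), with Y the DFT of n * x(n).

    This equals |X(k)|^2 times the group delay, so it carries both
    magnitude and phase information.
    """
    X = dft(frame)
    Y = dft([t * v for t, v in enumerate(frame)])
    half = len(frame) // 2 + 1  # keep non-negative frequencies only
    return [X[k].real * Y[k].real + X[k].imag * Y[k].imag for k in range(half)]


def differential_product_spectrum(q):
    """ASSUMPTION: magnitude of the first-order difference across bins."""
    return [abs(q[k] - q[k + 1]) for k in range(len(q) - 1)]


def cepstra(spectrum, n_coeffs=13, floor=1e-10):
    """Log followed by an unnormalized DCT-II.

    ASSUMPTION: the mel filterbank that would precede the log in an
    MFCC-style front end is left out to keep the sketch short.
    """
    logs = [math.log(max(v, floor)) for v in spectrum]
    m = len(logs)
    return [
        sum(logs[j] * math.cos(math.pi * c * (j + 0.5) / m) for j in range(m))
        for c in range(n_coeffs)
    ]


# Synthetic 64-sample frame: two sinusoids under a Hamming window.
N = 64
frame = [
    (0.54 - 0.46 * math.cos(2 * math.pi * t / (N - 1)))
    * (math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N))
    for t in range(N)
]
features = cepstra(differential_product_spectrum(product_spectrum(frame)))
print(len(features))
```

A quick sanity check on the product spectrum: for a unit impulse delayed by d samples, the power spectrum is 1 and the group delay is d at every bin, so `product_spectrum` returns a flat spectrum equal to d and the differential product spectrum is zero.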
Cite this article
Najnin, S., Shahnaz, C. Detection and Classification of Nasalized Vowels in Noise Based on Cepstra Derived from Differential Product Spectrum. Circuits Syst Signal Process 36, 181–201 (2017). https://doi.org/10.1007/s00034-016-0298-3
DOI: https://doi.org/10.1007/s00034-016-0298-3