Abstract
In this paper, a method based on cepstra derived from the product spectrum is developed for the detection and classification of nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum only, ignoring the phase spectrum. Here, the product spectrum is defined from the power spectrum and the group delay function of a band-limited vowel, so that it incorporates information from both the magnitude and phase spectra. Instead of conventional mel frequency cepstral coefficients (MFCCs) derived from the power spectrum, MFCCs computed from the product spectrum, namely MFPSCCs, are fed to a linear discriminant analysis (LDA) based classifier for the detection and classification of nasalized vowels. The performance of nasalized vowel detection and classification based on state-of-the-art features, namely MFCCs and A1–P1, is compared with that of the proposed feature using not only an LDA based classifier but also a support vector machine based classifier. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the product spectrum outperform the state-of-the-art features in the task of detecting and classifying nasalized vowels in clean as well as different noisy conditions.
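As a rough illustration of the feature described in the abstract, the following Python sketch computes a product spectrum and cepstral coefficients from it for a single frame. It is not the authors' implementation. It relies on the standard identity that the power spectrum times the group delay equals X_R·Y_R + X_I·Y_I, where X is the DFT of x(n) and Y is the DFT of n·x(n). The function names, the FFT size, the number of mel filters and cepstra, the 16 kHz sampling rate, and the flooring of non-positive filterbank outputs before the logarithm are all assumptions made for illustration; the LDA or support vector machine classification stage is omitted.

```python
import numpy as np

def product_spectrum(frame, n_fft=512):
    """Power spectrum times group delay for one (already windowed) frame."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # DFT of x(n)
    Y = np.fft.rfft(n * frame, n_fft)   # DFT of n * x(n)
    # |X|^2 * tau = X_R*Y_R + X_I*Y_I, so no division by |X|^2 is needed.
    return X.real * Y.real + X.imag * Y.imag

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular filters spaced uniformly on the mel scale (assumed layout)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, ctr, hi = edges[i:i + 3]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfpscc(frame, n_ceps=13, n_filters=26, n_fft=512, sample_rate=16000, floor=1e-10):
    """Mel-frequency cepstra computed from the product spectrum of one frame."""
    q = product_spectrum(frame, n_fft)
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ q
    # The product spectrum can be negative; flooring before the log is an
    # assumption of this sketch, not a detail taken from the paper.
    log_e = np.log(np.maximum(energies, floor))
    # Unnormalized DCT-II of the log filterbank outputs.
    m = np.arange(n_filters) + 0.5
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), m) / n_filters)
    return basis @ log_e
```

A quick sanity check of the identity: a unit impulse delayed by 5 samples has unit power spectrum and a constant group delay of 5 samples, so `product_spectrum` returns 5 at every frequency bin, up to rounding.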
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-014-9225-9/MediaObjects/10772_2014_9225_Fig9_HTML.gif)
Cite this article
Najnin, S., Shahnaz, C. A detection and classification method for nasalized vowels in noise using product spectrum based cepstra. Int J Speech Technol 18, 97–111 (2015). https://doi.org/10.1007/s10772-014-9225-9
DOI: https://doi.org/10.1007/s10772-014-9225-9