Detection and Classification of Nasalized Vowels in Noise Based on Cepstra Derived from Differential Product Spectrum

Circuits, Systems, and Signal Processing

Abstract

In this paper, a method based on cepstra derived from the differential product spectrum is developed for detecting and classifying nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum alone, ignoring the phase spectrum. Here, the product spectrum of a band-limited vowel is defined from its power spectrum and group delay function, thereby incorporating information from both the magnitude and phase spectra. The product spectrum is then differentiated with respect to frequency to obtain the differential product spectrum (DPrS), which is argued to be more robust to noise. Instead of conventional mel-frequency cepstral coefficients (MFCCs), MFCCs computed from the differential product spectrum, namely MFDPrSCCs, are fed to a linear discriminant analysis-based classifier for the detection and classification of nasalized vowels. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the differential product spectrum outperform cepstral features derived from the conventional power spectrum in detecting and classifying nasalized vowels, not only in clean conditions but also in various noisy conditions with varying signal-to-noise ratios.
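The feature pipeline summarized above (power spectrum and group delay, then product spectrum, then a derivative across frequency, then mel cepstra) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It computes the product spectrum through the identity |X(k)|²τ(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k), where Y(k) is the DFT of n·x[n] and τ is the group delay −dθ/dω. It approximates the frequency derivative with a central difference (`np.gradient`) and takes its absolute value before the mel filterbank (an assumption, so that the log is defined). The Hamming window, 512-point FFT, 26 filters and 13 coefficients are illustrative defaults, and all function names (`product_spectrum`, `mfdprscc`, `mfcc`, etc.) are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Common 2595 * log10(1 + f / 700) mel-scale mapping."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters over the n_fft // 2 + 1 non-negative FFT bins."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, weight 1 at center
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_ceps):
    """Orthonormal DCT-II along the last axis, keeping the first n_ceps terms."""
    n = x.shape[-1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return np.sqrt(2.0 / n) * (x @ basis.T)


def product_spectrum(frame, n_fft):
    """Q(k) = |X(k)|^2 * tau(k) = X_R*Y_R + X_I*Y_I, where Y is the DFT of n*x[n]."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return X.real * Y.real + X.imag * Y.imag


def cepstra(spectrum, sr, n_fft, n_filters, n_ceps):
    """Mel filterbank -> log -> DCT, shared by both feature types."""
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    return dct_ii(np.log(energies + 1e-10), n_ceps)


def mfcc(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    """Baseline: cepstra from the power spectrum |X(k)|^2."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    return cepstra(power, sr, n_fft, n_filters, n_ceps)


def mfdprscc(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    """Cepstra from the differential product spectrum (hypothetical helper)."""
    windowed = frame * np.hamming(len(frame))
    q = product_spectrum(windowed, n_fft)
    dq = np.gradient(q)  # assumption: central difference across frequency bins
    # Assumption: |dQ/dk| keeps the filterbank input non-negative before the log.
    return cepstra(np.abs(dq), sr, n_fft, n_filters, n_ceps)


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(400) / sr  # one 25 ms frame at 16 kHz
    frame = np.sin(2 * np.pi * 700 * t) + 0.1 * rng.standard_normal(t.size)
    print(mfcc(frame, sr).shape, mfdprscc(frame, sr).shape)  # (13,) (13,)
```

Per-frame vectors from `mfdprscc` (or from `mfcc`, for the power-spectrum baseline) could then be stacked into a feature matrix and passed to an LDA classifier such as scikit-learn's `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`; any classifier settings chosen there would likewise be assumptions.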

Figs. 1–8

Author information

Correspondence to Shamima Najnin.

About this article

Cite this article

Najnin, S., Shahnaz, C. Detection and Classification of Nasalized Vowels in Noise Based on Cepstra Derived from Differential Product Spectrum. Circuits Syst Signal Process 36, 181–201 (2017). https://doi.org/10.1007/s00034-016-0298-3

  • DOI: https://doi.org/10.1007/s00034-016-0298-3
