
Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing

Abstract

This paper proposes hybrid classification models and preprocessing methods for enhancing consonant-vowel (CV) recognition in the presence of background noise. Background noise is one of the major sources of degradation in real-world environments and strongly affects the performance of speech recognition systems. In this work, combined temporal and spectral processing (TSP) methods are explored as a preprocessing stage to improve CV recognition performance. The proposed CV recognition method operates at two levels to reduce the similarity among the large number of CV classes. At the first level, the vowel category of the CV unit is recognized; at the second level, the consonant category is recognized. At each level, complementary evidence from hybrid models consisting of support vector machines (SVMs) and hidden Markov models (HMMs) is combined to enhance recognition performance. The performance of the proposed CV recognition system is evaluated on a Telugu broadcast database under white and vehicle noise. The proposed preprocessing methods and hybrid classification models improve recognition performance compared to existing methods.
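To make the two-level decision process concrete, the sketch below illustrates one plausible way to fuse SVM and HMM evidence hierarchically: the fused scores first select the vowel category, and a second set of classifiers then selects the consonant category conditioned on that vowel. This is a minimal illustration, not the authors' implementation; the class labels, fusion weight, and classifier interfaces are assumptions.

    # Illustrative sketch (not the authors' code): hierarchical two-level CV
    # recognition with score-level fusion of SVM and HMM evidence.
    # Class labels, the fusion weight, and the classifier callables are hypothetical.

    VOWEL_CLASSES = ["a", "i", "u", "e", "o"]        # assumed vowel categories
    CONSONANT_CLASSES = ["k", "ch", "T", "t", "p"]   # assumed consonant categories

    def combine_evidence(svm_scores, hmm_scores, w=0.5):
        """Weighted score-level fusion of SVM and HMM evidence at one level.

        svm_scores, hmm_scores: dicts mapping class label -> normalized score.
        w: fusion weight (hypothetical; would be tuned on held-out data).
        """
        return {c: w * svm_scores[c] + (1.0 - w) * hmm_scores[c] for c in svm_scores}

    def recognize_cv(features, svm_level1, hmm_level1, svm_level2, hmm_level2):
        """Two-level CV recognition.

        Level 1 picks the vowel category; level 2 picks the consonant category
        conditioned on that vowel. The *_level* arguments are placeholder
        callables returning per-class scores for the given (preprocessed) features.
        """
        # Level 1: vowel category of the CV unit
        fused_vowel = combine_evidence(svm_level1(features), hmm_level1(features))
        vowel = max(fused_vowel, key=fused_vowel.get)

        # Level 2: consonant category, using classifiers associated with that vowel group
        fused_cons = combine_evidence(svm_level2(features, vowel),
                                      hmm_level2(features, vowel))
        consonant = max(fused_cons, key=fused_cons.get)

        return consonant + vowel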



Author information

Corresponding author

Correspondence to Anil Kumar Vuppala.


About this article

Cite this article

Vuppala, A.K., Rao, K.S., Chakrabarti, S. et al. Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. Int J Speech Technol 14, 259–272 (2011). https://doi.org/10.1007/s10772-011-9101-9

