Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing

Vuppala, Anil Kumar; Rao, K. Sreenivasa; Chakrabarti, Saswat; Krishnamoorthy, P.; Prasanna, S. R. M.

doi:10.1007/s10772-011-9101-9

Published: 11 August 2011

Volume 14, pages 259–272 (2011)

International Journal of Speech Technology

Anil Kumar Vuppala¹, K. Sreenivasa Rao², Saswat Chakrabarti¹, P. Krishnamoorthy³ & S. R. M. Prasanna⁴

Abstract

This paper proposes hybrid classification models and preprocessing methods for enhancing consonant-vowel (CV) recognition in the presence of background noise. Background noise is one of the major sources of degradation in real-life environments and strongly affects the performance of speech recognition systems. In this work, combined temporal and spectral processing (TSP) methods are explored as a preprocessing stage to improve CV recognition performance. The proposed CV recognition method operates in two levels to reduce confusion among the large number of similar CV classes: at the first level, the vowel category of the CV unit is recognized, and at the second level, the consonant category is recognized. At each level, complementary evidence from hybrid models consisting of support vector machines (SVM) and hidden Markov models (HMM) is combined to enhance recognition performance. The performance of the proposed CV recognition system is evaluated on a Telugu broadcast database under white and vehicle noise. The proposed preprocessing methods and hybrid classification models improve recognition performance compared to existing methods.
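The two-level decision with evidence combination described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the authors' implementation. The score dictionaries, the min-max normalisation, the weighted-sum combination rule and the fusion weight `ALPHA` are all assumptions. The example also assumes that the second level scores only the consonants paired with the recognised vowel.

```python
"""Hypothetical sketch: two-level CV recognition with SVM/HMM score fusion.

Assumptions (not taken from the paper): each model returns a dict of raw
scores per class (higher = better), scores are min-max normalised, and the
two models are combined by a weighted sum with weight ALPHA.
"""
from typing import Dict

Scores = Dict[str, float]

ALPHA = 0.5  # hypothetical weight given to SVM evidence (HMM gets 1 - ALPHA)


def normalise(scores: Scores) -> Scores:
    """Min-max normalise raw scores to [0, 1] so SVM and HMM are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        # All classes scored equally: give each the same normalised score.
        return {c: 1.0 for c in scores}
    return {c: (v - lo) / span for c, v in scores.items()}


def fuse(svm: Scores, hmm: Scores, alpha: float = ALPHA) -> str:
    """Combine normalised SVM and HMM evidence and return the best class."""
    s, h = normalise(svm), normalise(hmm)
    combined = {c: alpha * s[c] + (1 - alpha) * h[c] for c in s}
    return max(combined, key=combined.get)


def recognise_cv(vowel_svm: Scores, vowel_hmm: Scores,
                 cons_svm: Dict[str, Scores], cons_hmm: Dict[str, Scores],
                 alpha: float = ALPHA) -> str:
    """Level 1 picks the vowel; level 2 picks the consonant for that vowel."""
    vowel = fuse(vowel_svm, vowel_hmm, alpha)
    # Hypothetical layout: consonant scores are stored per recognised vowel.
    consonant = fuse(cons_svm[vowel], cons_hmm[vowel], alpha)
    return consonant + vowel


if __name__ == "__main__":
    # Made-up scores for illustration only (SVM margins, HMM log-likelihoods).
    vowel_svm = {"a": 1.2, "i": -0.3, "u": 0.1}
    vowel_hmm = {"a": -310.0, "i": -305.0, "u": -330.0}
    cons_svm = {"a": {"k": 0.8, "g": 0.2, "t": 0.5}}
    cons_hmm = {"a": {"k": -118.5, "g": -120.0, "t": -118.0}}
    print(recognise_cv(vowel_svm, vowel_hmm, cons_svm, cons_hmm))  # → ka
```

In the made-up data, the SVM and HMM disagree at both levels: the SVM favours "a" and "k", while the HMM favours "i" and "t". The combined evidence resolves each disagreement. The abstract does not state which combination rule the authors use, so `ALPHA` and the normalisation should be read as tunable placeholders rather than the paper's settings.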

Similar content being viewed by others

A detection and classification method for nasalized vowels in noise using product spectrum based cepstra

Article 01 October 2014

Effect of Processing Combined MFCC and DSCC Features with QCN for Hindi Vowel Classification in Noisy Environments

Continuous Tamil Speech Recognition technique under non stationary noisy environments

Article 30 November 2018

Author information

Authors and Affiliations

G. S. Sanyal School of Telecommunications, Indian Institute of Technology, Kharagpur, India
Anil Kumar Vuppala & Saswat Chakrabarti
School of Information Technology, Indian Institute of Technology, Kharagpur, India
K. Sreenivasa Rao
Samsung India Software Center, Noida, India
P. Krishnamoorthy
Department of Electronics and Communication Engineering, Indian Institute of Technology, Guwahati, India
S. R. M. Prasanna

Corresponding author

Correspondence to Anil Kumar Vuppala.

About this article

Cite this article

Vuppala, A.K., Rao, K.S., Chakrabarti, S. et al. Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. Int J Speech Technol 14, 259–272 (2011). https://doi.org/10.1007/s10772-011-9101-9

Received: 24 May 2011
Accepted: 18 July 2011
Published: 11 August 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10772-011-9101-9