Abstract
State-of-the-art automatic speech recognition (ASR) systems follow a well-established statistical paradigm: parameterization of the speech signal (a.k.a. feature extraction) at the front end, and likelihood evaluation of the resulting feature vectors at the back end. For feature extraction, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are the two dominant signal processing methods used in ASR. Although the effects of the two techniques have been analyzed individually, it is not known whether any combination of them can improve recognition accuracy. This paper investigates the possibility of integrating different types of features, such as MFCC, PLP, and gravity centroids, to improve the performance of ASR for the Hindi language. Our experimental results show a significant improvement for a few such combinations when applied to medium-sized lexicons under typical field conditions.
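The "gravity centroid" features mentioned above are spectral subband centroids: each is the magnitude-weighted mean frequency within one subband of a frame's power spectrum, and they can be concatenated with MFCC or PLP vectors to form a combined feature stream. The sketch below is an illustrative assumption of that computation (the band edges, toy spectrum, and 13-dimensional MFCC placeholder are hypothetical, not taken from the paper), using only NumPy:

```python
import numpy as np

def subband_centroids(power_spectrum, freqs, band_edges):
    """One spectral 'gravity' centroid per subband: the
    magnitude-weighted mean frequency inside that band."""
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        weights = power_spectrum[mask]
        centroids.append(np.sum(freqs[mask] * weights) / np.sum(weights))
    return np.array(centroids)

# Toy frame: a single Gaussian spectral peak at 1000 Hz
freqs = np.linspace(0, 4000, 257)          # FFT bin centre frequencies (Hz)
spectrum = np.exp(-((freqs - 1000.0) ** 2) / (2 * 100.0 ** 2))
edges = [0, 2000, 4000]                    # two hypothetical subbands

cents = subband_centroids(spectrum, freqs, edges)
# The low band's centroid sits at the 1000 Hz peak.

# Sequential combination: append centroids to a (placeholder) MFCC vector.
mfcc_frame = np.zeros(13)                  # stands in for 13 real MFCCs
combined = np.concatenate([mfcc_frame, cents])
```

Concatenation along the feature axis, as in the last line, is the simplest form of the sequential feature-stream combination the paper evaluates; the resulting vectors would then be fed to the HMM back end as usual.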
Cite this article
Aggarwal, R.K., Dave, M. Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system. Telecommun Syst 52, 1457–1466 (2013). https://doi.org/10.1007/s11235-011-9623-0