Filterbank optimization for robust ASR using GA and PSO

Aggarwal, R. K.; Dave, M.

doi:10.1007/s10772-012-9133-9

Published: 09 February 2012

International Journal of Speech Technology, Volume 15, pages 191–201 (2012)

R. K. Aggarwal¹ & M. Dave¹

Abstract

Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs), the features widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC, there is no consensus on the spacing and number of filters to use across different noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition under typical field conditions as well as in noisy environments.
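
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds a conventional Mel-spaced set of filter centre frequencies and then lets a basic inertia-weight PSO move those centres to maximize a fitness function. Everything here is an assumption for illustration: the Mel formula (the common 2595·log10(1 + f/700) variant), the names mel_centers, triangular_weight, pso_maximize and toy_fitness, the settings F_LOW, F_HIGH and N_FILTERS, and the PSO coefficients. The abstract does not state the fitness used in the paper; a real system would presumably score recognition performance, so a cheap toy surrogate stands in for it.

```python
import math
import random

def hz_to_mel(f_hz):
    """Common Mel-scale formula (an assumption; the abstract does not give one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_filters, f_low, f_high):
    """Conventional baseline: centre frequencies equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + (i + 1) * step) for i in range(n_filters)]

def triangular_weight(f, left, center, right):
    """Gain of one triangular filter at frequency f (Hz).
    left and right are the side frequencies, center is the peak."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0

def pso_maximize(fitness, start, lower, upper, n_particles=12, n_iters=40,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic inertia-weight PSO. Particle 0 starts at `start` (the baseline),
    so the returned solution is never worse than the baseline."""
    rng = random.Random(seed)
    dim = len(start)
    pos = [list(start)] + [[rng.uniform(lower, upper) for _ in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clip each centre frequency to the allowed band.
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical settings, chosen only for this illustration.
F_LOW, F_HIGH, N_FILTERS = 0.0, 4000.0, 8
baseline = mel_centers(N_FILTERS, F_LOW, F_HIGH)

# Hypothetical "ideal" centres used by the toy surrogate below.
target = mel_centers(N_FILTERS, 100.0, 3600.0)

def toy_fitness(centers):
    """Placeholder fitness: closeness of the sorted centres to `target`.
    A real system would train and score a recognizer here instead."""
    return -sum((c - t) ** 2 for c, t in zip(sorted(centers), target))

best, best_fit = pso_maximize(toy_fitness, baseline, F_LOW, F_HIGH)

# Side frequencies of filter k are taken as the neighbouring centres.
edges = [F_LOW] + sorted(best) + [F_HIGH]
print("baseline fitness:", toy_fitness(baseline))
print("optimized fitness:", best_fit)
print("filter edges (Hz):", [round(e, 1) for e in edges])
```

In this sketch each particle encodes only the centre frequencies, and the side frequencies follow from the neighbouring centres. Encoding the side frequencies as separate dimensions, or swapping the PSO loop for a GA with selection, crossover and mutation over the same vector, would fit the same interface.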

Author information

Authors and Affiliations

Department of Computer Engineering, N.I.T., Kurukshetra, India
R. K. Aggarwal & M. Dave

Corresponding author

Correspondence to R. K. Aggarwal.

About this article

Cite this article

Aggarwal, R.K., Dave, M. Filterbank optimization for robust ASR using GA and PSO. Int J Speech Technol 15, 191–201 (2012). https://doi.org/10.1007/s10772-012-9133-9

Received: 18 October 2011
Accepted: 23 January 2012
Published: 09 February 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10772-012-9133-9
