
Efficient MLP constructive training algorithm using a neuron recruiting approach for isolated word recognition system

International Journal of Speech Technology

Abstract

This paper describes an efficient constructive training algorithm for a Multi Layer Perceptron (MLP) neural network dedicated to Isolated Word Recognition (IWR) systems. An incremental training procedure was employed, based on a novel approach that recruits hidden neurons into a single hidden layer. During the Neural Network (NN) training phase, the number of pronunciation samples extracted from the Training Data (TD) was sequentially increased. Using the proposed MLP constructive training algorithm, an optimal structure for the NN classifier was obtained together with an optimized TD size.
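The incremental procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not give the recruiting criterion, so here a hidden neuron is recruited whenever the network fails to reach a target training accuracy on the current TD subset, and the subset grows by a fixed step once it has been learned. The names `constructive_train`, `start`, `step`, `target_acc` and `max_hidden`, the tanh/softmax architecture and the gradient-descent settings are all hypothetical choices.

```python
import numpy as np

def train_epochs(W1, b1, W2, b2, X, Y, epochs=300, lr=0.5):
    """Full-batch gradient descent on mean softmax cross-entropy (updates arrays in place)."""
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # single hidden layer
        Z = H @ W2 + b2
        Z -= Z.max(axis=1, keepdims=True)        # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)        # softmax posteriors
        dZ = (P - Y) / n                         # gradient w.r.t. logits
        dH = (dZ @ W2.T) * (1.0 - H ** 2)        # backprop through tanh (uses old W2)
        W2 -= lr * H.T @ dZ
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)

def accuracy(W1, b1, W2, b2, X, labels):
    H = np.tanh(X @ W1 + b1)
    return float(np.mean((H @ W2 + b2).argmax(axis=1) == labels))

def constructive_train(X, labels, n_classes, start=20, step=20,
                       target_acc=0.95, max_hidden=30, seed=0):
    """Hypothetical constructive loop: grow the TD subset or recruit a hidden neuron."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Y = np.eye(n_classes)[labels]                # one-hot targets
    h = 1                                        # start with one hidden neuron
    W1 = rng.normal(0, 0.5, (d, h)); b1 = np.zeros(h)
    W2 = rng.normal(0, 0.5, (h, n_classes)); b2 = np.zeros(n_classes)
    n = min(start, len(X))                       # current TD subset size
    while True:
        train_epochs(W1, b1, W2, b2, X[:n], Y[:n])
        if accuracy(W1, b1, W2, b2, X[:n], labels[:n]) >= target_acc:
            if n == len(X):
                break                            # whole TD learned
            n = min(n + step, len(X))            # add more pronunciation samples
        elif h < max_hidden:
            # Recruit one neuron; existing weights are kept, new ones are random.
            W1 = np.hstack([W1, rng.normal(0, 0.5, (d, 1))])
            b1 = np.append(b1, 0.0)
            W2 = np.vstack([W2, rng.normal(0, 0.5, (1, n_classes))])
            h += 1
        else:
            break                                # hidden-layer budget exhausted
    return (W1, b1, W2, b2), h, n

# Usage on synthetic, standardized 2-D data (a stand-in for real feature vectors).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = rng.integers(0, 3, size=150)
X = centers[labels] + rng.normal(0, 0.5, (150, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)
params, h, n = constructive_train(X, labels, n_classes=3)
```

Keeping the existing weights when a neuron is recruited means earlier training is not thrown away, which is one reason constructive schemes tend to end with a small hidden layer.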

An isolated word recognition system based on the MLP neural network was then constructed and tested on the recognition of ten words extracted from the TIMIT database. Mel Frequency Cepstral Coefficient (MFCC) features were used, including energy and the first and second derivative coefficients.
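The first and second derivative coefficients mentioned above are commonly computed with a regression over neighbouring frames: d_t = sum over k = 1..K of k(c[t+k] - c[t-k]), divided by 2(1² + ... + K²). The sketch below assumes that formula with K = 2 and repeats the edge frames; the abstract specifies neither, nor the number of static coefficients (13 static values giving 39 per frame is only a typical choice). The MFCC computation itself is not shown, and the function names are hypothetical.

```python
def deltas(frames, K=2):
    """Regression-based delta coefficients; edges padded by repeating the first/last frame."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        row = []
        for j in range(len(frames[0])):
            acc = 0.0
            for k in range(1, K + 1):
                nxt = frames[min(t + k, T - 1)][j]   # clamp at the last frame
                prv = frames[max(t - k, 0)][j]       # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def add_dynamic_features(static):
    """Append first and second derivatives: [static | delta | delta-delta] per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Usage with a hypothetical 4-frame, 2-coefficient input.
ramp = [[float(t), 2.0 * t] for t in range(4)]
full = add_dynamic_features(ramp)   # each row: 2 static + 2 delta + 2 delta-delta values
```

On a linear ramp the interior delta equals the slope, which is a quick sanity check for the regression weights.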

The proposed Frame-by-Frame Neural Network (FFNN) classification method was explored and compared with the Conventional Neural Network (CNN) classification approach. The Principal Component Analysis (PCA) technique was also investigated as a way to reduce both the TD size and the complexity of the recognition system.
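The two ideas above can be sketched together under assumptions that the abstract does not state: PCA is fitted on training frames and used to project every frame to fewer dimensions, and the frame-by-frame decision sums per-frame log-posteriors over a word's frames and picks the best-scoring word. That combination rule, the 12-component choice and the names `fit_pca`, `project` and `classify_word` are hypothetical; `frame_posteriors` stands in for the per-frame outputs of a trained MLP, and the CNN baseline is not shown.

```python
import numpy as np

def fit_pca(frames, n_components):
    """Return (mean, components) keeping the directions of largest variance."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)      # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def project(frames, mean, components):
    """Map each frame onto the retained principal directions."""
    return (frames - mean) @ components

def classify_word(frame_posteriors):
    """Assumed FFNN-style decision: sum per-frame log-posteriors, return the best word index."""
    log_p = np.log(np.clip(frame_posteriors, 1e-12, 1.0))
    return int(log_p.sum(axis=0).argmax())

# Usage with hypothetical random 39-dimensional frames reduced to 12 dimensions.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 39))
mean, comps = fit_pca(frames, n_components=12)
reduced = project(frames, mean, comps)
```

Summing log-posteriors treats frames as independent evidence; a majority vote over per-frame decisions would be an equally plausible alternative.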

Experimental results showed that the proposed FFNN classifier outperforms its CNN counterpart, yielding a significant improvement in recognition rate.


Author information

Correspondence to Sabeur Masmoudi.

About this article

Cite this article

Masmoudi, S., Frikha, M., Chtourou, M. et al. Efficient MLP constructive training algorithm using a neuron recruiting approach for isolated word recognition system. Int J Speech Technol 14, 1–10 (2011). https://doi.org/10.1007/s10772-010-9082-0

  • DOI: https://doi.org/10.1007/s10772-010-9082-0
