
A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition

International Journal of Speech Technology

Abstract

Nowadays, spoken digit recognition is widely used in various applications. It is a task in automatic speech recognition that identifies and classifies the digits uttered by a speaker. Several techniques for spoken digit recognition have been proposed, most of them based on artificial neural networks, Gaussian mixture models, or neuro-fuzzy approaches. However, more efficient algorithms that give better results are still needed. Therefore, to improve both accuracy and running time, this paper proposes a hybrid adaptive neuro-fuzzy approach called KANFIS-SCG. This approach combines the Adaptive Network-Based Fuzzy Inference System (ANFIS) with a fast supervised learning algorithm, the Scaled Conjugate Gradient (SCG). In addition, the K-means clustering algorithm is used to reduce the number of fuzzy inference rules. Features are extracted from the speech samples using the Mel Frequency Cepstral Coefficient (MFCC) algorithm, and a covariance matrix is used to provide more precise information about the features. Experiments were carried out in two languages, Hindi and English, and a database of utterances of the digits zero to nine was prepared for each language. For the speaker-dependent system, the proposed KANFIS-SCG method gives average recognition rates of 99.12% for Hindi and 98.54% for English, which is better than the existing ANFIS method.
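The rule-reduction idea in the abstract can be illustrated with a small sketch. With grid partitioning, an ANFIS with n inputs and m membership functions per input has m^n rules (for a hypothetical 13-dimensional MFCC input with 2 membership functions each, that is 2^13 = 8192 rules). Deriving one rule per K-means cluster instead gives exactly k rules. The Python below is a minimal, hypothetical sketch under stated assumptions, not the authors' KANFIS-SCG implementation: the farthest-first initialization, the use of each cluster's per-dimension standard deviation as the Gaussian width, the helper names (`kmeans`, `rules_from_clusters`, `normalized_firing`, `sugeno_output`, `grid_rule_count`), and the fixed consequent parameters are all assumptions. The paper trains the network (the abstract says with SCG); that training step, MFCC extraction, and the covariance-matrix features are not shown here.

```python
import math


def grid_rule_count(n_inputs, mfs_per_input):
    """Rules from grid partitioning: one per combination of membership functions."""
    return mfs_per_input ** n_inputs


def kmeans(points, k, iters=100):
    """Lloyd's K-means with a deterministic farthest-first initialization (an assumption)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Start at points[0], then repeatedly add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))

    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are stable
        labels = new_labels
        # Update step: move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def rules_from_clusters(points, centers, labels, min_sigma=1e-3):
    """One fuzzy rule per cluster: Gaussian MF centers = cluster center,
    widths = per-dimension std of the members (floored to avoid division by zero)."""
    rules = []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j] or [c]
        sigmas = []
        for d in range(len(c)):
            var = sum((m[d] - c[d]) ** 2 for m in members) / len(members)
            sigmas.append(max(math.sqrt(var), min_sigma))
        rules.append((list(c), sigmas))
    return rules


def normalized_firing(x, rules):
    """Rule firing strengths (product of Gaussian grades), normalized to sum to 1."""
    w = [math.prod(math.exp(-((xi - ci) ** 2) / (2 * si ** 2))
                   for xi, ci, si in zip(x, c, s))
         for c, s in rules]
    total = sum(w)
    return [wi / total for wi in w] if total > 0 else [1.0 / len(w)] * len(w)


def sugeno_output(x, rules, consequents):
    """First-order Sugeno output: sum_j wbar_j * (p_j . x + r_j)."""
    wbar = normalized_firing(x, rules)
    return sum(wj * (sum(pi * xi for pi, xi in zip(p, x)) + r)
               for wj, (p, r) in zip(wbar, consequents))


if __name__ == "__main__":
    # Toy 2-D data with two well-separated groups (stand-ins for feature vectors).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, labels = kmeans(pts, k=2)
    rules = rules_from_clusters(pts, centers, labels)
    # Placeholder consequents (p_j, r_j); in ANFIS these would be learned.
    consequents = [([0.0, 0.0], 1.0), ([0.0, 0.0], 5.0)]
    print("grid rules:", grid_rule_count(13, 2), "cluster rules:", len(rules))
    for x in [(0.5, 0.5), (5.5, 5.5), (10.5, 10.5)]:
        print(x, round(sugeno_output(x, rules, consequents), 3))
```

On this toy data the two clusters become two rules, and the output moves smoothly from one rule's consequent to the other's as the input moves between the cluster centers. Training would adjust the Gaussian centers, widths, and consequent parameters; the sketch only shows how clustering fixes the rule count at k.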



Data availability

The dataset used in the experiment will be available upon reasonable request.


Author information

Correspondence to Irshed Hussain.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.



Cite this article

Hussain, I., Roy, P. A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition. Int J Speech Technol 26, 825–832 (2023). https://doi.org/10.1007/s10772-023-10057-6

DOI: https://doi.org/10.1007/s10772-023-10057-6
