A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition

Hussain, Irshed; Roy, Pinki

doi:10.1007/s10772-023-10057-6

Published: 31 October 2023

Volume 26, pages 825–832, (2023)

International Journal of Speech Technology

Abstract

Spoken digit recognition is now widely used in a variety of applications. It is a technique used in automatic speech recognition to identify and classify the language spoken by a particular speaker. Several techniques for spoken digit recognition have been proposed, most of which involve artificial neural networks, Gaussian mixture models, or neuro-fuzzy approaches. However, a more efficient algorithm that gives better results is still needed. Therefore, to improve both accuracy and running time, this paper proposes a hybrid adaptive neuro-fuzzy approach called KANFIS-SCG. The approach combines the Adaptive Network-Based Fuzzy Inference System (ANFIS) with a fast supervised learning algorithm, the Scaled Conjugate Gradient (SCG). In addition, the K-means clustering algorithm is used to reduce the number of fuzzy inference rules. Features are extracted from the speech samples using the Mel Frequency Cepstral Coefficient (MFCC) algorithm. Furthermore, a covariance matrix is used to provide more precise information about the features. The experiment was carried out in two languages, Hindi and English, and a database was prepared for each language, with utterances of the digits zero to nine. For the speaker-dependent system, the proposed KANFIS-SCG method gives average recognition rates of 99.12% for Hindi and 98.54% for English, which is better than the existing ANFIS method.
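
The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: summarising frame-level features with a covariance matrix, and using K-means so that the number of fuzzy rules equals the number of clusters. It is not the authors' KANFIS-SCG implementation. Random arrays stand in for real MFCC frames, the function names (`covariance_features`, `kmeans`, `rule_firing_strengths`), the Gaussian membership form, the 13-coefficient frame size, and all parameter values are assumptions made for illustration, and SCG training is not shown.

```python
import numpy as np


def covariance_features(frames):
    """Summarise a (n_frames, n_coeffs) matrix by the upper triangle of its covariance."""
    cov = np.cov(frames, rowvar=False)          # shape (n_coeffs, n_coeffs)
    rows, cols = np.triu_indices(cov.shape[0])  # covariance is symmetric, keep one half
    return cov[rows, cols]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a (n_points, n_dims) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are actually returned.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def rule_firing_strengths(x, centers, sigma=1.0):
    """Normalised Gaussian firing strengths, one fuzzy rule per cluster center."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    # Subtracting the minimum avoids underflow and leaves the normalised result unchanged.
    w = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * sigma ** 2))
    return w / w.sum()


# Synthetic stand-in for MFCC frames: 40 "utterances", 60 frames x 13 coefficients each.
rng = np.random.default_rng(42)
utterances = [rng.normal(scale=1.0 + (i % 4), size=(60, 13)) for i in range(40)]

features = np.stack([covariance_features(u) for u in utterances])
centers, labels = kmeans(features, k=4)
strengths = rule_firing_strengths(features[0], centers, sigma=5.0)

print("feature matrix:", features.shape)
print("rules after clustering:", len(centers))
print("firing strengths sum:", strengths.sum())
```

The motivation for clustering is the rule count. A grid-partitioned fuzzy system with m membership functions on each of n inputs has m^n rules, which is impractical for feature vectors of this size, whereas a cluster-based rule base has one rule per cluster center.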

Data availability

The dataset used in the experiment will be available upon reasonable request.

Author information

Irshed Hussain and Pinki Roy contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Information Technology, Siksha ’O’ Anusandhan (Deemed to be University), Jagamara, Bhubaneswar, Odisha, 751030, India
Irshed Hussain
Department of Computer Science and Engineering, National Institute of Technology Silchar, Silchar, Assam, 788010, India
Pinki Roy

Corresponding author

Correspondence to Irshed Hussain.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hussain, I., Roy, P. A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition. Int J Speech Technol 26, 825–832 (2023). https://doi.org/10.1007/s10772-023-10057-6

Received: 30 May 2023
Accepted: 03 October 2023
Published: 31 October 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10772-023-10057-6
