Abstract
Spoken digit recognition is now widely used in a variety of applications. It is a technique used in automatic speech recognition to identify and classify the digits spoken by a particular speaker. Several techniques and methods for spoken digit recognition have been proposed, most of which are based on artificial neural networks, Gaussian mixture models, or neuro-fuzzy approaches. However, a more efficient algorithm that gives more accurate results is still needed. Therefore, to improve both accuracy and running time, this paper proposes a hybrid adaptive neuro-fuzzy approach called KANFIS-SCG. The approach combines the Adaptive Network-Based Fuzzy Inference System (ANFIS) with a fast supervised learning algorithm, the Scaled Conjugate Gradient (SCG). In addition, the K-means clustering algorithm is used to reduce the number of fuzzy inference rules. Features are extracted from the speech samples using the Mel Frequency Cepstral Coefficient (MFCC) algorithm, and the covariance matrix is used to provide more precise information about the features. The experiment was carried out in two languages, Hindi and English, with a database prepared for each language in which the utterances were the digits zero to nine. For the speaker-dependent system, the proposed KANFIS-SCG method gives average recognition rates of 99.12% for Hindi and 98.54% for English, which is better than the existing ANFIS method.
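The pipeline outlined in the abstract (covariance of MFCC features, then K-means to limit the number of fuzzy rules) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the MFCC frames have already been computed elsewhere, uses one Gaussian rule per cluster centre with a single shared width `sigma`, and omits the ANFIS consequent parameters and the SCG training step entirely. All function names and parameters are hypothetical, and only the Python standard library is used.

```python
import math
import random


def covariance_features(frames):
    """Flatten the upper triangle of the sample covariance matrix.

    `frames` is a list of per-frame coefficient vectors (assumed to be
    MFCCs computed elsewhere); the result is one fixed-length vector.
    """
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    feats = []
    for i in range(d):
        for j in range(i, d):
            c = sum((f[i] - means[i]) * (f[j] - means[j]) for f in frames)
            feats.append(c / (n - 1))
    return feats


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns k centres (one per fuzzy rule)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre.
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def rule_firing_strengths(x, centers, sigma=1.0):
    """Normalized firing strength of one Gaussian rule per centre.

    The shared width `sigma` is an assumption made for illustration.
    """
    w = [math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2)) for c in centers]
    total = sum(w)
    if total == 0.0:  # x is far from every centre: fall back to uniform
        return [1.0 / len(centers)] * len(centers)
    return [wi / total for wi in w]
```

With k centres the rule base has k rules, rather than one rule for every combination of per-input membership functions as in a grid-partitioned ANFIS, which is the motivation for clustering before training.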
Data availability
The dataset used in the experiment will be available upon reasonable request.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hussain, I., Roy, P. A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition. Int J Speech Technol 26, 825–832 (2023). https://doi.org/10.1007/s10772-023-10057-6
DOI: https://doi.org/10.1007/s10772-023-10057-6