ABSTRACT
This paper discusses speech recognition for Uzbek spoken digits using spectrogram images and a deep convolutional neural network (CNN). Spectrogram images were generated from the speech signals and used to train the deep CNN. The presented CNN model contains three convolutional layers and two fully connected layers, which extract discriminative features from the spectrogram images and estimate the digit classes. In the current research period, a dataset of Uzbek spoken digits was created and used to train the presented CNN model. Testing results show that the proposed approach classifies Uzbek spoken digits with 100% accuracy.
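The first step of the pipeline, converting a speech signal into a spectrogram image, can be sketched as a windowed short-time Fourier transform. This is a minimal illustration only; the window length, hop size, and dB floor below are illustrative assumptions, not the paper's actual settings, and the synthetic two-tone signal merely stands in for a recorded Uzbek digit.

```python
import numpy as np

def spectrogram(signal, win_len=256, hop=128):
    """Log-magnitude spectrogram via a windowed short-time Fourier transform.
    win_len and hop are illustrative defaults, not the paper's settings."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # magnitude per frame
    return 20 * np.log10(magnitude + 1e-10).T         # (freq_bins, n_frames) in dB

# 1 s of a synthetic stand-in signal: two tones at 440 Hz and 880 Hz, fs = 8 kHz
fs = 8000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
img = spectrogram(audio)
print(img.shape)  # (129, 61): 129 frequency bins x 61 time frames
```

The resulting 2-D array can then be saved or rescaled as an image and fed to a CNN such as the three-convolutional-layer, two-fully-connected-layer model the abstract describes.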