Abstract
In this paper, a speech-based interaction system is proposed using two approaches: a Deep Neural Network (DNN) approach and an i-vector based DNN approach. In the DNN approach, Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal and fed directly to the DNN. In the i-vector based DNN approach, the DNN is trained on i-vectors derived from a Gaussian Mixture Model-Universal Background Model (GMM-UBM). For both approaches, system performance is reported as a confusion matrix and the two are compared; a GMM-UBM based approach is also compared with the proposed work. MFCCs represent the characteristics of the speech, and autoencoders are used for classification: the network stacks two autoencoder layers followed by one softmax layer. Performance improves as the number of hidden units and the dimension of the MFCC input features increase. The proposed work develops an automatic speech recognition (ASR) system for isolated words in the Tamil language, and the experiments are conducted in a speaker-independent setting. The results demonstrate that the i-vector based DNN approach achieves a 100% recognition rate for 17 classes with 20 hidden units in each of the two layers and an i-vector dimension of 100.
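The classifier described in the abstract can be illustrated with a small sketch. This is a minimal forward pass only, not the authors' implementation: it assumes sigmoid encoder activations and random, untrained weights, and the helper names (`dense`, `random_layer`, `classify`, `confusion_matrix`, `recognition_rate`) are hypothetical. The greedy layer-wise autoencoder pretraining and fine-tuning needed to obtain real weights are not shown. It wires a 100-dimensional i-vector through two 20-unit encoder layers into a 17-way softmax, matching the sizes reported in the abstract, and computes the confusion matrix and recognition rate used to report results.

```python
import math
import random

# Sizes taken from the abstract: 100-dim i-vector input, two encoder layers
# of 20 hidden units each, and 17 isolated-word classes.
IVEC_DIM, HIDDEN, N_CLASSES = 100, 20, 17


def sigmoid(z):
    # Numerically safe logistic function (assumed encoder activation).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x, weights, bias, act):
    """Fully connected layer: act(W x + b), one output per weight row."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def random_layer(n_out, n_in, rng):
    # Hypothetical random initialisation; trained weights are not available.
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def classify(ivec, layers):
    """Encoder 1 -> encoder 2 -> softmax layer; returns the most probable class."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = dense(ivec, w1, b1, sigmoid)
    h2 = dense(h1, w2, b2, sigmoid)
    probs = softmax(dense(h2, w3, b3, lambda z: z))
    return max(range(len(probs)), key=probs.__getitem__)


def confusion_matrix(true_labels, predicted, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1
    return cm


def recognition_rate(cm):
    """Percentage of test words on the diagonal (correctly recognised)."""
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [random_layer(HIDDEN, IVEC_DIM, rng),
              random_layer(HIDDEN, HIDDEN, rng),
              random_layer(N_CLASSES, HIDDEN, rng)]
    ivec = [rng.gauss(0.0, 1.0) for _ in range(IVEC_DIM)]
    print("predicted class:", classify(ivec, layers))
    cm = confusion_matrix([0, 1, 2, 2], [0, 1, 2, 1], 3)
    print(cm, recognition_rate(cm))
```

For example, true labels `[0, 1, 2, 2]` with predictions `[0, 1, 2, 1]` put three of four words on the diagonal, a 75% recognition rate. Standard-library Python is used only to keep the sketch self-contained; a numerical library would normally handle the matrix arithmetic.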
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shanmugapriya, P., Mohan, V., Yogapriya, S., Venkataramani, Y. (2019). Speech Based Interaction System Using DNN and i-vector. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1035. Springer, Singapore. https://doi.org/10.1007/978-981-13-9181-1_41
DOI: https://doi.org/10.1007/978-981-13-9181-1_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9180-4
Online ISBN: 978-981-13-9181-1
eBook Packages: Computer Science, Computer Science (R0)