Speech Based Interaction System Using DNN and i-vector

Shanmugapriya, P.; Mohan, V.; Yogapriya, S.; Venkataramani, Y.

doi:10.1007/978-981-13-9181-1_41

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1035)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

This paper proposes a speech-based interaction system built on two approaches: a Deep Neural Network (DNN) and an i-vector based DNN. In the DNN approach, Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal and fed directly to the DNN. In the i-vector based DNN approach, the DNN is trained on i-vectors derived from a Gaussian Mixture Model-Universal Background Model (GMM-UBM). The performance of both approaches is reported as confusion matrices and compared, and a GMM-UBM based approach is also compared with the proposed work. MFCCs represent the characteristics of the speech, and an autoencoder is used for classification; the network stacks two autoencoder layers and one softmax layer. Performance improves as the number of hidden units and the input dimension of the MFCC features increase. The aim of the work is to develop an automatic speech recognition (ASR) system for isolated words in the Tamil language, and the experiments are conducted for the speaker-independent case. The results demonstrate that the i-vector based DNN approach achieves a 100% recognition rate for 17 classes with 20 hidden units in each of the 2 layers and an i-vector dimension of 100.
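
To make the reported configuration concrete, the following is a minimal sketch of the network shape the abstract describes: a 100-dimensional i-vector input, two stacked encoder layers of 20 hidden units each, and a softmax layer over 17 word classes, plus a confusion matrix from which a recognition rate can be read. Only those sizes come from the abstract. The sigmoid activation, the random untrained weights, the random stand-in for an i-vector, and all function names are assumptions made for illustration and are not the authors' implementation.

```python
import math
import random

# Sizes taken from the abstract; everything else here is an illustrative assumption.
IVECTOR_DIM = 100   # dimension of the i-vector
HIDDEN_UNITS = 20   # hidden units in each of the 2 layers
NUM_CLASSES = 17    # isolated-word classes


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Two encoder layers (sigmoid assumed) followed by a softmax layer."""
    enc1, enc2, out = layers
    h1 = sigmoid(affine(enc1, x))
    h2 = sigmoid(affine(enc2, h1))
    return softmax(affine(out, h2))


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def recognition_rate(matrix):
    """Fraction of samples that fall on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


rng = random.Random(0)
layers = (
    init_layer(IVECTOR_DIM, HIDDEN_UNITS, rng),
    init_layer(HIDDEN_UNITS, HIDDEN_UNITS, rng),
    init_layer(HIDDEN_UNITS, NUM_CLASSES, rng),
)
x = [rng.gauss(0.0, 1.0) for _ in range(IVECTOR_DIM)]  # stand-in for a real i-vector
posteriors = forward(layers, x)
predicted = max(range(NUM_CLASSES), key=lambda k: posteriors[k])
```

With trained weights, `predicted` would be the recognized word class. A 100% recognition rate corresponds to a confusion matrix whose off-diagonal entries are all zero, so that `recognition_rate` returns 1.0.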

Author information

Authors and Affiliations

Department of ECE, Saranathan College of Engineering, Trichy, Tamilnadu, India
P. Shanmugapriya, V. Mohan, S. Yogapriya & Y. Venkataramani

Corresponding author

Correspondence to P. Shanmugapriya.

Editor information

Editors and Affiliations

Department of Computer Science, University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Solapur University, Solapur, India
Ravindra S. Hegadi

About this paper

Cite this paper

Shanmugapriya, P., Mohan, V., Yogapriya, S., Venkataramani, Y. (2019). Speech Based Interaction System Using DNN and i-vector. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1035. Springer, Singapore. https://doi.org/10.1007/978-981-13-9181-1_41

DOI: https://doi.org/10.1007/978-981-13-9181-1_41
Published: 20 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9180-4
Online ISBN: 978-981-13-9181-1
eBook Packages: Computer Science, Computer Science (R0)
