Speech Based Interaction System Using DNN and i-vector

  • Conference paper
Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1035)

Abstract

This paper proposes a speech-based interaction system built with two approaches: a Deep Neural Network (DNN) and an i-vector based DNN. In the DNN-based approach, Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal and fed directly to the DNN. In the i-vector based DNN approach, the DNN is trained on i-vectors derived from a Gaussian Mixture Model-Universal Background Model (GMM-UBM). For both approaches, system performance is reported as a confusion matrix and compared. A GMM-UBM based approach is also compared with the proposed work. MFCCs represent the characteristics of the speech, and an autoencoder is used for classification: the network stacks two autoencoder layers and one softmax layer. Performance improves as the number of hidden units and the input dimension of the MFCC features increase. The goal is an automatic speech recognition (ASR) system for isolated words in the Tamil language, and the experiments are conducted for the speaker-independent case. The results show that the i-vector based DNN approach achieves a 100% recognition rate for 17 classes with 20 hidden units in each of the 2 layers, using 100-dimensional i-vectors.
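To make the layer sizes in the abstract concrete, the following is a minimal sketch of the forward pass only, not the authors' implementation. It takes the 100-dimensional input, two hidden layers of 20 units, and 17 output classes from the abstract. Everything else is an assumption made for illustration: the sigmoid hidden activation, the random untrained weights, the random stand-in for an i-vector, and all function names. Autoencoder pretraining and i-vector extraction are not shown.

```python
import math
import random

I_VECTOR_DIM = 100  # i-vector dimension stated in the abstract
HIDDEN_UNITS = 20   # hidden units per layer stated in the abstract
NUM_CLASSES = 17    # number of word classes stated in the abstract


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (assumed init)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    # Assumed hidden activation; the abstract does not name one.
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    """Numerically stable softmax over one vector."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Two hidden (encoder-style) layers followed by a softmax layer."""
    h = x
    for layer in layers[:-1]:
        h = sigmoid(affine(layer, h))
    return softmax(affine(layers[-1], h))


rng = random.Random(0)
sizes = [I_VECTOR_DIM, HIDDEN_UNITS, HIDDEN_UNITS, NUM_CLASSES]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

# Random vector standing in for a real i-vector (illustration only).
i_vector = [rng.gauss(0.0, 1.0) for _ in range(I_VECTOR_DIM)]
probs = forward(layers, i_vector)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

With untrained weights the predicted class is meaningless; the sketch only shows how a 100-dimensional input maps through 20 and 20 hidden units to a 17-way probability vector.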

Author information

Corresponding author

Correspondence to P. Shanmugapriya.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shanmugapriya, P., Mohan, V., Yogapriya, S., Venkataramani, Y. (2019). Speech Based Interaction System Using DNN and i-vector. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1035. Springer, Singapore. https://doi.org/10.1007/978-981-13-9181-1_41

  • DOI: https://doi.org/10.1007/978-981-13-9181-1_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9180-4

  • Online ISBN: 978-981-13-9181-1

  • eBook Packages: Computer Science, Computer Science (R0)
