From Bottom to Top: A Coordinated Feature Representation Method for Speech Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12666)

Abstract

This article introduces a novel coordinated representation method, termed MFCC-aided sparse representation (MSR), for speech recognition. The proposed MSR combines a top-level sparse representation feature with the conventional MFCC, a bottom-level feature of speech, so that the complex information carried by various hidden attributes of the speech signal can be captured. A neural network architecture with an attention mechanism is also designed to validate the effectiveness of the proposed MSR for speech recognition. Experiments on the TIMIT database show that the proposed MSR yields significant improvements in recognition accuracy over systems that use either the MFCC or the sparse representation alone.
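The abstract does not say how the two feature levels are coordinated, so the following is only a minimal sketch of one plausible reading: each frame's MFCC vector is sparsely coded over a dictionary, and the code is appended to the MFCC vector. The function names `omp` and `combine_features`, the random dictionary, the sparsity level, and plain concatenation are all assumptions made for illustration and are not taken from the paper. Random numbers stand in for real MFCC frames.

```python
import numpy as np


def omp(dictionary, signal, n_nonzero):
    """Greedy orthogonal matching pursuit (hypothetical helper).

    Approximates `signal` with at most `n_nonzero` columns (atoms) of
    `dictionary` and returns the coefficient vector.
    """
    residual = signal.copy()
    support = []
    code = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit the coefficients of all selected atoms by least squares.
        coeffs, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ coeffs
    if support:
        code[support] = coeffs
    return code


def combine_features(mfcc_frames, dictionary, n_nonzero=5):
    """Append a sparse code to each MFCC frame (assumed fusion: concatenation)."""
    sparse_codes = np.stack([omp(dictionary, f, n_nonzero) for f in mfcc_frames])
    return np.concatenate([mfcc_frames, sparse_codes], axis=1)


rng = np.random.default_rng(0)
mfcc_frames = rng.standard_normal((4, 13))        # stand-in for 4 frames of 13 MFCCs
dictionary = rng.standard_normal((13, 40))        # assumed overcomplete dictionary
dictionary /= np.linalg.norm(dictionary, axis=0)  # unit-norm atoms

features = combine_features(mfcc_frames, dictionary)
print(features.shape)  # (4, 53): 13 MFCC values + 40 sparse coefficients per frame
```

In practice the dictionary would be learned from training data rather than drawn at random, and the combined features would feed the recognition network; both steps are outside this sketch.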

This work was supported in part by the NSFC under Grant 61973088, and in part by the NSF of Guangdong under Grant 2019A1515011371.

Author information

Correspondence to Jun Zhang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, L., Zhang, J. (2021). From Bottom to Top: A Coordinated Feature Representation Method for Speech Recognition. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_33

  • DOI: https://doi.org/10.1007/978-3-030-68780-9_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68779-3

  • Online ISBN: 978-3-030-68780-9

  • eBook Packages: Computer Science, Computer Science (R0)