Effects of Architecture Choices on Sparse Coding in Speech Recognition

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2012 (ICANN 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7552)

Abstract

A common technique in visual object recognition is to sparsely encode low-level input with a feature dictionary and then apply spatial pooling over local neighbourhoods. While some methods stack these stages in alternating layers within hierarchies, the two stages alone can also produce state-of-the-art results. Following its success in vision, this framework is moving into speech and audio processing tasks. We investigate the effect of architectural choices when the framework is applied to a spoken digit recognition task. We find that unsupervised learning of the features has a negligible effect on classification, with the number and size of the features being a greater determinant of recognition performance. Finally, we show that, given an optimised architecture, sparse coding performs comparably with Hidden Markov Models (HMMs) and outperforms K-means clustering.
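
To make the two-stage pipeline concrete, the sketch below shows sparse coding against a learned feature dictionary followed by pooling and a linear classifier, in Python with scikit-learn. This is an illustration under assumptions, not the authors' exact setup: the encoder (OMP), the pooling operator (max over time), the dictionary size, the patch dimensions, and all data here are placeholders.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in data: in the real task these would be spectro-temporal patches
# cut from mel spectrograms of spoken digits.
patch_dim = 64                                   # flattened patch size (assumed)
train_patches = rng.standard_normal((2000, patch_dim))

# Stage 1: learn a feature dictionary from unlabelled patches
# (online dictionary learning here; other learners or even random
# dictionaries slot in the same way).
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                   random_state=0)
D = dico.fit(train_patches).components_          # shape (128, patch_dim)

def encode_and_pool(patches, n_nonzero=5):
    """Sparse-code each patch against D, then max-pool over the utterance."""
    codes = sparse_encode(patches, D, algorithm='omp',
                          n_nonzero_coefs=n_nonzero)
    return np.abs(codes).max(axis=0)             # fixed-length utterance vector

# Stand-in utterances: variable-length patch sequences with digit labels.
utterances = [rng.standard_normal((rng.integers(20, 40), patch_dim))
              for _ in range(50)]
labels = rng.integers(0, 10, size=50)

features = np.vstack([encode_and_pool(u) for u in utterances])

# Stage 2: a linear classifier on the pooled sparse codes.
clf = LinearSVC().fit(features, labels)
print(clf.score(features, labels))
```

With real mel-spectrogram patches in place of the random arrays, the same skeleton supports the comparisons the abstract describes, for example swapping the learned dictionary for a random or K-means codebook, or varying the number and size of the features.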

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

O’Donnell, F., Triefenbach, F., Martens, J.-P., Schrauwen, B. (2012). Effects of Architecture Choices on Sparse Coding in Speech Recognition. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33269-2_79

  • DOI: https://doi.org/10.1007/978-3-642-33269-2_79

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33268-5

  • Online ISBN: 978-3-642-33269-2

  • eBook Packages: Computer Science, Computer Science (R0)
