Abstract
A common technique in visual object recognition is to sparsely encode low-level input with a feature dictionary and then spatially pool over local neighbourhoods. While some methods stack these stages in alternating layers within hierarchies, the two stages alone can also produce state-of-the-art results. Following its success in vision, this framework is moving into speech and audio processing tasks. We investigate the effect of architectural choices when the framework is applied to a spoken digit recognition task. We find that unsupervised learning of the features has a negligible effect on classification, with the number and size of the features being a greater determinant of recognition performance. Finally, we show that, given an optimised architecture, sparse coding performs comparably with Hidden Markov Models (HMMs) and outperforms K-means clustering.
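The two-stage pipeline the abstract describes can be sketched minimally as follows: frames of an utterance are sparsely encoded against a fixed dictionary (here via a greedy Orthogonal Matching Pursuit, one of the encoders the paper's framework covers), and the resulting activations are max-pooled over time into a fixed-length feature vector for a linear classifier. The dictionary is simply random unit-norm atoms, consistent with the finding that unsupervised dictionary learning has little effect; all sizes (16-dimensional frames, 64 atoms, sparsity 4, 100 frames) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def omp_encode(x, D, k):
    """Greedily select at most k atoms of D to approximate x (Orthogonal Matching Pursuit)."""
    residual = x.copy()
    selected = []
    code = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        selected.append(j)
        # re-fit coefficients of all selected atoms by least squares
        coef, *_ = np.linalg.lstsq(D[:, selected], x, rcond=None)
        residual = x - D[:, selected] @ coef
    code[selected] = coef
    return code

n_dim, n_atoms, sparsity = 16, 64, 4          # illustrative sizes
D = rng.standard_normal((n_dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                # unit-norm random atoms

frames = rng.standard_normal((n_dim, 100))    # stand-in for mel-spectral frames of one utterance
codes = np.stack([omp_encode(frames[:, t], D, sparsity)
                  for t in range(frames.shape[1])], axis=1)

# max-pool absolute activations over time: one fixed-length vector per utterance
feature = np.abs(codes).max(axis=1)
```

The pooled `feature` vector would then be fed to a linear SVM (e.g. LIBLINEAR) for digit classification; swapping the encoder, dictionary size, or pooling region corresponds to the architectural choices the paper varies.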
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
O’Donnell, F., Triefenbach, F., Martens, JP., Schrauwen, B. (2012). Effects of Architecture Choices on Sparse Coding in Speech Recognition. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33269-2_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33268-5
Online ISBN: 978-3-642-33269-2