Abstract
Large-scale deep neural models, such as deep neural networks (DNNs) and recurrent neural networks (RNNs), have demonstrated significant success in solving various challenging tasks in speech and language processing (SLP), including speech recognition, speech synthesis, document classification and question answering. This growing impact corroborates the neurobiological evidence for layer-wise deep processing in the human brain. Sparse coding representations, meanwhile, have achieved comparable success in SLP, particularly in signal processing, demonstrating that sparsity is another important neurobiological characteristic. Recently, research in these two directions has led to increasing cross-fertilisation of ideas, so a unified Sparse Deep or Deep Sparse learning framework warrants much attention. This paper provides an overview of the growing interest in this unified framework and outlines future research possibilities in this multi-disciplinary area.
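As a minimal, hypothetical illustration of the sparse-coding side of this framework (a sketch, not an algorithm from the paper), the following recovers a sparse code for a signal over a random overcomplete dictionary using ISTA, the classic iterative shrinkage-thresholding method for the L1-regularised reconstruction objective:

```python
# Sparse coding sketch: minimise ||x - D z||^2 / 2 + lam * ||z||_1 via ISTA.
# The dictionary D and signal x here are synthetic, for illustration only.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    """Iterative Shrinkage-Thresholding Algorithm for one signal x."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the quadratic term's gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))          # overcomplete dictionary: 20-dim signal, 50 atoms
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.5, -2.0, 1.0]     # ground-truth code with 3 active atoms
x = D @ z_true
z = ista_sparse_code(x, D, lam=0.05)
print("coefficients above 1e-3:", int(np.sum(np.abs(z) > 1e-3)), "of", z.size)
```

The soft-thresholding step is what produces exact zeros in the code, giving the sparse, parts-based representations that the overview connects to both V1-style neural coding and sparsity-regularised deep networks.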
Acknowledgments
This research was supported by the RSE-NSFC joint project (No. 61411130162), the National Natural Science Foundation of China (NSFC) under project No. 61371136, the UK Engineering and Physical Sciences Research Council (EPSRC) Grant No. EP/M026981/1, and the MESTDC PhD Foundation Project No. 20130002120011. It was also supported by Huilan Ltd., Tongfang Corp., and FreeNeb.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Wang, D., Zhou, Q., Hussain, A. (2016). Deep and Sparse Learning in Speech and Language Processing: An Overview. In: Liu, CL., Hussain, A., Luo, B., Tan, K., Zeng, Y., Zhang, Z. (eds) Advances in Brain Inspired Cognitive Systems. BICS 2016. Lecture Notes in Computer Science(), vol 10023. Springer, Cham. https://doi.org/10.1007/978-3-319-49685-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49684-9
Online ISBN: 978-3-319-49685-6