Abstract
Large-scale deep neural models, such as deep neural networks (DNNs) and recurrent neural networks (RNNs), have demonstrated significant success in solving various challenging tasks in speech and language processing (SLP), including speech recognition, speech synthesis, document classification and question answering. This growing impact corroborates the neurobiological evidence for layer-wise deep processing in the human brain. Sparse coding representations, meanwhile, have achieved comparable success in SLP, particularly in signal processing, demonstrating that sparsity is another important neurobiological characteristic. Recently, research in these two directions has led to increasing cross-fertilisation of ideas, so a unified Sparse Deep or Deep Sparse learning framework warrants much attention. This paper provides an overview of the growing interest in this unified framework and outlines future research possibilities in this multi-disciplinary area.
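As a minimal, hypothetical illustration of the sparse-coding side of this framework (a sketch, not an algorithm from the paper), the following recovers a sparse code for a signal over a random overcomplete dictionary using ISTA, the classic iterative shrinkage-thresholding method for the L1-regularised reconstruction objective:

```python
# Sparse coding sketch: minimise ||x - D z||^2 / 2 + lam * ||z||_1 via ISTA.
# The dictionary D and signal x here are synthetic, for illustration only.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    """Iterative Shrinkage-Thresholding Algorithm for one signal x."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the quadratic term's gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))          # overcomplete dictionary: 20-dim signal, 50 atoms
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.5, -2.0, 1.0]     # ground-truth code with 3 active atoms
x = D @ z_true
z = ista_sparse_code(x, D, lam=0.05)
print("coefficients above 1e-3:", int(np.sum(np.abs(z) > 1e-3)), "of", z.size)
```

The soft-thresholding step is what produces exact zeros in the code, giving the sparse, parts-based representations that the overview connects to both V1-style neural coding and sparsity-regularised deep networks.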
Acknowledgments
This research was supported by the RSE-NSFC joint project (No. 61411130162), the National Natural Science Foundation of China (NSFC) under project No. 61371136, the UK Engineering and Physical Sciences Research Council (EPSRC) Grant No. EP/M026981/1, and the MESTDC PhD Foundation Project No. 20130002120011. It was also supported by Huilan Ltd., Tongfang Corp., and FreeNeb.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Wang, D., Zhou, Q., Hussain, A. (2016). Deep and Sparse Learning in Speech and Language Processing: An Overview. In: Liu, CL., Hussain, A., Luo, B., Tan, K., Zeng, Y., Zhang, Z. (eds) Advances in Brain Inspired Cognitive Systems. BICS 2016. Lecture Notes in Computer Science(), vol 10023. Springer, Cham. https://doi.org/10.1007/978-3-319-49685-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49684-9
Online ISBN: 978-3-319-49685-6