
Deep and Sparse Learning in Speech and Language Processing: An Overview

  • Conference paper
Advances in Brain Inspired Cognitive Systems (BICS 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10023))


Abstract

Large-scale deep neural models, e.g., deep neural networks (DNNs) and recurrent neural networks (RNNs), have demonstrated significant success in solving various challenging tasks of speech and language processing (SLP), including speech recognition, speech synthesis, document classification and question answering. This growing impact corroborates the neurobiological evidence concerning the presence of layer-wise deep processing in the human brain. Sparse coding representation, on the other hand, has gained similar success in SLP, particularly in signal processing, demonstrating that sparsity is another important neurobiological characteristic. Recently, research in these two directions has led to increasing cross-fertilisation of ideas, so a unified sparse deep (or deep sparse) learning framework warrants much attention. This paper provides an overview of the growing interest in this unified framework and outlines future research possibilities in this multi-disciplinary area.
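As a concrete illustration of the sparse coding half of this framework, the sketch below (not taken from the paper) solves the standard l1-regularised sparse coding problem with ISTA (iterative shrinkage-thresholding); the dictionary, signal, and parameter values are invented for the demo.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_sparse_code(D, x, lam=0.1, n_iters=200):
    """Solve min_z 0.5*||x - D z||^2 + lam*||z||_1 via ISTA.

    D: (m, k) dictionary with unit-norm columns; x: (m,) signal.
    Returns a sparse code z of shape (k,).
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)           # gradient of the quadratic data term
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Toy demo: recover a 2-sparse code from a random overcomplete dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
z_true = np.zeros(50)
z_true[[3, 17]] = [1.5, -2.0]
x = D @ z_true
z_hat = ista_sparse_code(D, x, lam=0.05, n_iters=500)
print(np.count_nonzero(np.abs(z_hat) > 1e-2))  # only a few atoms stay active
```

Stacking such sparse encoders, or imposing an l1-style penalty on the hidden activations of a DNN, is one way the "deep" and "sparse" strands surveyed here can be combined.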



Acknowledgments

This research was supported by the RSE-NSFC joint project (No. 61411130162), the National Science Foundation of China (NSFC) under project No. 61371136, the UK Engineering and Physical Sciences Research Council (EPSRC) Grant No. EP/M026981/1, and the MESTDC PhD Foundation Project No. 20130002120011. It was also supported by Huilan Ltd., Tongfang Corp., and FreeNeb.

Author information


Corresponding author

Correspondence to Dong Wang.



Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Wang, D., Zhou, Q., Hussain, A. (2016). Deep and Sparse Learning in Speech and Language Processing: An Overview. In: Liu, CL., Hussain, A., Luo, B., Tan, K., Zeng, Y., Zhang, Z. (eds) Advances in Brain Inspired Cognitive Systems. BICS 2016. Lecture Notes in Computer Science, vol 10023. Springer, Cham. https://doi.org/10.1007/978-3-319-49685-6_16


  • DOI: https://doi.org/10.1007/978-3-319-49685-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49684-9

  • Online ISBN: 978-3-319-49685-6

  • eBook Packages: Computer Science, Computer Science (R0)
