
Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing


Abstract

We propose a novel dictionary learning technique for compressive sensing of speech signals based on recurrent neural networks. First, we employ a recurrent neural network to solve an \(\ell _{0}\)-norm optimization problem, built on a sequential linear prediction model, that estimates the linear prediction coefficients of voiced and unvoiced speech separately. The extracted linear prediction coefficient vectors are then clustered by an improved Linde–Buzo–Gray algorithm to generate a codebook for each speech type. A dictionary is constructed for each type by concatenating a union of structured matrices derived from the column vectors of the corresponding codebook. Next, a decision module selects the appropriate dictionary for the recovery algorithm in the compressive sensing system. Finally, based on the sequential linear prediction model and the proposed dictionary, a sequential recovery algorithm is developed to further improve the quality of the reconstructed speech. Experimental results show that, compared with selected state-of-the-art approaches, the proposed method achieves superior performance in terms of several objective measures, including segmental signal-to-noise ratio, perceptual evaluation of speech quality and short-time objective intelligibility, under both noise-free and noise-aware conditions.
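To make the pipeline described in the abstract concrete, the following Python sketch mimics its main stages under loudly stated simplifying assumptions: the linear prediction coefficient vectors are random placeholders rather than the output of the paper's RNN-based \(\ell _{0}\)-norm solver, KMeans stands in for the improved Linde–Buzo–Gray clustering, orthogonal matching pursuit stands in for the sequential recovery algorithm, and the frame length, LP order and codebook size are hypothetical. It illustrates how an LP-derived structured dictionary can be assembled and used for compressive recovery; it is not the authors' implementation.

```python
import numpy as np
from scipy.signal import lfilter
from sklearn.cluster import KMeans                      # stand-in for improved LBG
from sklearn.linear_model import OrthogonalMatchingPursuit  # stand-in for sequential recovery

# Hypothetical settings (not from the paper): frame length, LP order, codebook size.
N, P, K = 160, 10, 8
rng = np.random.default_rng(0)

def lp_synthesis_matrix(a, n):
    """n x n Toeplitz matrix whose columns are shifted, truncated impulse
    responses of the LP synthesis filter 1/A(z), A(z) = 1 - sum_k a_k z^{-k}."""
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = lfilter([1.0], np.concatenate(([1.0], -a)), impulse)
    D = np.zeros((n, n))
    for j in range(n):
        D[j:, j] = h[: n - j]
    return D

# Placeholder LP coefficient vectors, one per frame; the paper estimates these
# with an RNN-based l0-norm solver, which is not reproduced here.
lpc_vectors = 0.1 * rng.standard_normal((200, P))

# Cluster the LP vectors into a codebook (KMeans instead of improved LBG).
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(lpc_vectors).cluster_centers_

# Dictionary: a union of structured (Toeplitz) matrices, one per codebook entry.
Phi = np.hstack([lp_synthesis_matrix(a, N) for a in codebook])      # N x (K*N)
Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)                   # unit-norm atoms

# Synthetic test frame: sparse excitation through the first codebook filter,
# so the frame has an exactly sparse representation over the first block of Phi.
e = np.zeros(N)
e[[10, 50, 90, 130]] = 1.0
x = lp_synthesis_matrix(codebook[0], N) @ e

# Compressive measurements with a random Gaussian sensing matrix.
m = N // 4
M = rng.standard_normal((m, N)) / np.sqrt(m)
y = M @ x

# Recover the sparse code over the LP dictionary with OMP and resynthesize.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=8, fit_intercept=False)
omp.fit(M @ Phi, y)
x_hat = Phi @ omp.coef_
print("relative reconstruction error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

In the paper itself, this construction is carried out twice, once for voiced and once for unvoiced speech, and a decision module selects the appropriate dictionary before the sequential recovery step.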




Acknowledgements

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China (Grant Nos. 61601248, 61771263, 61871241) and the University Natural Science Research Foundation of Jiangsu Province, China (Grant No. 16KJB510037).

Author information


Corresponding author

Correspondence to Yunyun Ji.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ji, Y., Zhu, WP. & Champagne, B. Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing. Circuits Syst Signal Process 38, 3616–3643 (2019). https://doi.org/10.1007/s00034-019-01058-5


