Abstract
We propose a novel dictionary learning technique for compressive sensing of speech signals based on recurrent neural networks. First, we employ a recurrent neural network to solve an \(\ell _{0}\)-norm optimization problem, built on a sequential linear prediction model, for estimating the linear prediction coefficients of voiced and unvoiced speech, respectively. The extracted linear prediction coefficient vectors are then clustered by an improved Linde–Buzo–Gray algorithm to generate separate codebooks for voiced and unvoiced speech. For each speech type, a dictionary is constructed by concatenating a union of structured matrices derived from the column vectors of the corresponding codebook. Next, a decision module selects the appropriate dictionary for the recovery algorithm in the compressive sensing system. Finally, based on the sequential linear prediction model and the proposed dictionary, a sequential recovery algorithm is developed to further improve the quality of the reconstructed speech. Experimental results show that, compared to selected state-of-the-art approaches, the proposed method achieves superior performance in terms of several objective measures, including segmental signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility, under both noise-free and noise-aware conditions.
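The pipeline in the abstract can be illustrated with a minimal sketch. The codebook entries, frame length, and sparsity level below are illustrative placeholders: the paper obtains the linear prediction (LP) coefficients via an RNN-based \(\ell _{0}\)-norm solver and an improved Linde–Buzo–Gray clustering, which are replaced here with two fixed toy LP vectors, and plain orthogonal matching pursuit stands in for the proposed sequential recovery algorithm. Only the general structure — an overcomplete dictionary of all-pole synthesis matrices and sparse recovery from compressive measurements — follows the abstract.

```python
import numpy as np

def lp_synthesis_matrix(a, n):
    """Lower-triangular Toeplitz matrix whose columns are delayed, truncated
    impulse responses of the all-pole filter 1/A(z) with LP coefficients a."""
    p = len(a)
    h = np.zeros(n)
    h[0] = 1.0
    for i in range(1, n):  # AR recursion: h[i] = sum_j a[j] * h[i-1-j]
        h[i] = sum(a[j] * h[i - 1 - j] for j in range(min(p, i)))
    H = np.zeros((n, n))
    for k in range(n):
        H[k:, k] = h[: n - k]
    return H

def build_dictionary(codebook, n):
    """Concatenate the synthesis matrices of all codebook entries into one
    overcomplete dictionary with unit-norm columns (atoms)."""
    D = np.hstack([lp_synthesis_matrix(a, n) for a in codebook])
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def omp(y, A, k):
    """Orthogonal matching pursuit: greedily select k atoms of A to fit y,
    refitting the coefficients by least squares at each step."""
    r, idx = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(A.T @ r))))
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        r = y - A[:, idx] @ coef
    x[idx] = coef
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 32, 20
    # Toy "codebook": two stable LP coefficient vectors (placeholders for
    # the RNN/LBG-derived voiced and unvoiced codebooks).
    codebook = [np.array([1.4, -0.72]), np.array([0.9])]
    D = build_dictionary(codebook, n)          # 32 x 64 overcomplete dictionary
    x_true = np.zeros(D.shape[1])
    x_true[[3, 40]] = [1.0, -0.6]              # sparse excitation
    s = D @ x_true                             # synthetic "speech" frame
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # CS measurement matrix
    x_hat = omp(Phi @ s, Phi @ D, 4)           # recover from m < n measurements
    print("relative error:",
          np.linalg.norm(D @ x_hat - s) / np.linalg.norm(s))
```

In a full system, the decision module described in the abstract would first classify the frame as voiced or unvoiced and pass the matching dictionary to the recovery step; here a single combined dictionary is used for brevity.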
Acknowledgements
This work is supported in part by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China (Grant Nos. 61601248, 61771263, 61871241) and the University Natural Science Research Foundation of Jiangsu Province, China (Grant No. 16KJB510037).
Cite this article
Ji, Y., Zhu, WP. & Champagne, B. Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing. Circuits Syst Signal Process 38, 3616–3643 (2019). https://doi.org/10.1007/s00034-019-01058-5