Sparse representation and reproduction of speech signals in complex Fourier basis

Kwek, Lee-Chung; Tan, Alan Wee-Chiat; Lim, Heng-Siong; Tan, Cheah-Heng; Alaghbari, Khaled A.

doi:10.1007/s10772-021-09941-w


Published: 26 November 2021

International Journal of Speech Technology, Volume 25, pages 211–217 (2022)

Lee-Chung Kwek¹,
Alan Wee-Chiat Tan¹,
Heng-Siong Lim¹,
Cheah-Heng Tan² &
Khaled A. Alaghbari¹ (ORCID: orcid.org/0000-0002-1983-1694)


Abstract

Sparse representation concerns the task of determining the most compact representation of a signal as a linear combination of basis vectors drawn from an overcomplete dictionary. Because the problem is non-convex, it is common to settle for approximate, suboptimal solutions; one such method is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative greedy algorithm that, at each step, selects the basis vector most correlated with the current residual. For the most part, past attention has been directed towards real-valued dictionaries, since the signal of interest is itself real-valued. From the perspective of speech representation, complex dictionaries are intuitively appealing because audio signals are generally assumed to be a mixture of exponentials with time-varying amplitudes and phases. However, sparse representation of speech signals based on complex dictionaries has been less investigated, mainly because the measurements are normally real-valued. In this paper, we pursue this intuition by modelling the complex dictionary on the popular discrete Fourier transform, and then introduce a new orthogonalization mechanism in OMP for such cases. Customizing the conventional OMP algorithm to the complex setting enables high-quality compact representation of speech signals with low computational complexity. Experimental results demonstrate that the proposed approach retains high perceptual similarity between the reconstructed speech signals and the originals.
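To make the greedy selection step described in the abstract concrete, the following is a minimal sketch of conventional OMP over a complex Fourier dictionary. It is not the orthogonalization mechanism proposed in the paper, which the abstract does not specify; it uses ordinary Gram-Schmidt orthogonalization instead. The function names (`inner`, `fourier_dictionary`, `omp`), the frame length, the twofold overcomplete frequency grid, and the two-cosine test signal are all assumptions chosen for illustration.

```python
import cmath
import math


def inner(a, b):
    """Complex inner product <a, b> = sum(conj(a_i) * b_i)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def fourier_dictionary(n_samples, n_atoms):
    """Unit-norm complex exponentials on a frequency grid of n_atoms points."""
    scale = 1.0 / math.sqrt(n_samples)
    return [
        [scale * cmath.exp(2j * math.pi * m * n / n_atoms) for n in range(n_samples)]
        for m in range(n_atoms)
    ]


def omp(signal, dictionary, n_select):
    """Conventional OMP (illustrative): returns support, approximation, residual."""
    residual = [complex(s) for s in signal]
    support, ortho = [], []
    for _ in range(n_select):
        # Greedy step: pick the unused atom most correlated with the residual.
        candidates = [i for i in range(len(dictionary)) if i not in support]
        best = max(candidates, key=lambda i: abs(inner(dictionary[i], residual)))
        # Orthogonalize the chosen atom against the atoms already selected.
        q = list(dictionary[best])
        for u in ortho:
            c = inner(u, q)
            q = [qi - c * ui for qi, ui in zip(q, u)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in q))
        if norm < 1e-12:  # atom adds nothing new to the selected span
            break
        q = [x / norm for x in q]
        # Remove the component of the residual along the new direction.
        c = inner(q, residual)
        residual = [r - c * qi for r, qi in zip(residual, q)]
        support.append(best)
        ortho.append(q)
    approx = [s - r for s, r in zip(signal, residual)]
    return support, approx, residual


# Hypothetical test frame: two real cosines, 64 samples, 128-atom dictionary.
N, M = 64, 128
signal = [
    math.cos(2 * math.pi * 5 * n / N + 0.3)
    + 0.5 * math.cos(2 * math.pi * 12 * n / N + 1.0)
    for n in range(N)
]
dictionary = fourier_dictionary(N, M)
support, approx, residual = omp(signal, dictionary, n_select=4)
print(sorted(support), sum(abs(r) ** 2 for r in residual))
```

Because the test signal is real-valued, each cosine is captured by a conjugate pair of complex atoms, so four atoms are needed for two sinusoids. This illustrates the difficulty the abstract mentions when real measurements are paired with a complex dictionary.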


Availability of data and materials

The data that support the findings of this study are available from the TIMIT Speech corpus (Linguistic Data Consortium (LDC)) (Jankowski et al. 1990) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of LDC.

Notes

A basis vector is said to be active if the corresponding sparse coefficient is non-zero.

Abbreviations

OMP: Orthogonal matching pursuit
DFT: Discrete Fourier transform
PESQ: Perceptual evaluation of speech quality
DCR: Degradation category rating
MSE: Mean-square error
DCT: Discrete cosine transform
K-SVD: K-means singular value decomposition
dB: Decibel
DMOS: Degradation mean opinion score

References

Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
Barthélemy, Q., Larue, A., & Mars, J. I. (2015). Color sparse representations for image processing: Review, models, and prospects. IEEE Transactions on Image Processing, 24(11), 3978–3989.
Chen, J., Paliwal, K. K., & Nakamura, S. (2000). A block cosine transform and its application in speech recognition. In INTERSPEECH.
Cook, G. W., & Kalker, T. (2013). The sparse discrete cosine transform with application to image compression. In 2013 Picture coding symposium (PCS), pp. 9–12.
Day, D., & Heroux, M. A. (2001). Solving complex-valued linear systems via equivalent real formulations. SIAM Journal on Scientific Computing, 23(2), 480–498.
Deng, S., & Han, J. (2016). Sparse decomposition for signal periodic model over complex exponential dictionary. IEEE Signal Processing Letters, 23(12), 1858–1861.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306.
Fan, R., Wan, Q., Liu, Y., Chen, H. & Zhang, X. (2012). Complex orthogonal matching pursuit and its exact recovery conditions. arXiv:1206.2197.
Haneche, H., Boudraa, B., & Ouahabi, A. (2020). A new way to enhance speech signal based on compressed sensing. Measurement, 151, 107117.
Haneche, H., Ouahabi, A., & Boudraa, B. (2019). New mobile communication system design for Rayleigh environments based on compressed sensing-source coding. IET Communications, 13(15), 2375–2385.
ITU-T Recommendation P.800. (1996). Methods for subjective determination of transmission quality. Series P: Telephone Transmission Quality.
ITU-T Recommendation P.862. (2001). Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Series P: Telephone transmission quality. Local Line Networks: Telephone Installations.
Jafari, M. G., & Plumbley, M. D. (2011). Fast dictionary learning for sparse representations of speech signals. IEEE Journal of Selected Topics in Signal Processing, 5(5), 1025–1031.
Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 109–112).
Loizou, P. C. (2017). Speech Enhancement: Theory and Practice (2nd ed.). Boca Raton: CRC Press.
Mlynarski, W. (2013). Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors. arXiv:1312.4695.
Mohimani, G. H., Babaie-Zadeh, M., & Jutten, C. (2008). Complex-valued sparse representation based on smoothed ℓ0 norm. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3881–3884.
Moreno-Alvarado, R. G., Martinez-Garcia, M., Nakano, M., & Pérez, H. M. (2014). DCT-compressive sampling of multifrequency sparse audio signals. In 2014 IEEE Latin-America conference on communications (LATINCOM), pp. 1–5.
Orovic, I. (2016). Compressive sensing in signal processing: Algorithms and transform domain formulations. Mathematical Problems in Engineering, 2016, 16.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)—A new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No.01CH37221) (Vol. 2, pp. 749–752).
Sharma, P., Abrol, V., & Sao, A. K. (2017). Deep-sparse-representation-based features for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(11), 2162–2175.
Sigg, C. D., Dikk, T., & Buhmann, J. M. (2010). In 2010 IEEE international conference on acoustics, speech and signal processing, pp. 4758–4761.
Tabet, Y., Boughazi, M., & Afifi, S. (2018). Speech analysis and synthesis with a refined adaptive sinusoidal representation. International Journal of Speech Technology, 21, 581–588.
Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T. S., & Yan, S. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.


Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Faculty of Engineering and Technology, Multimedia University, 75450, Melaka, Malaysia
Lee-Chung Kwek, Alan Wee-Chiat Tan, Heng-Siong Lim & Khaled A. Alaghbari
Motorola Solutions Malaysia Sdn Bhd, 11900, Penang, Malaysia
Cheah-Heng Tan


Corresponding author

Correspondence to Khaled A. Alaghbari.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.


About this article

Cite this article

Kwek, LC., Tan, A.WC., Lim, HS. et al. Sparse representation and reproduction of speech signals in complex Fourier basis. Int J Speech Technol 25, 211–217 (2022). https://doi.org/10.1007/s10772-021-09941-w


Received: 23 November 2020
Accepted: 14 November 2021
Published: 26 November 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10772-021-09941-w
