Sparse representation and reproduction of speech signals in complex Fourier basis

International Journal of Speech Technology

Abstract

Sparse representation seeks the most compact representation of a signal as a linear combination of basis vectors drawn from an overcomplete dictionary. Because the problem is non-convex, approximate suboptimal solutions are commonly used; one such method is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative greedy algorithm that, at each step, selects the basis vector most correlated with the current residual. Most past work has used real-valued dictionaries, since the signals of interest are themselves real-valued. For speech representation, however, complex dictionaries are intuitively appealing, because audio signals are generally assumed to be a mixture of exponentials with time-varying amplitudes and phases. Sparse representation of speech signals with complex dictionaries has nevertheless received less attention, mainly because the measurements are normally real-valued. In this paper, we pursue this intuition by modelling the complex dictionary on the discrete Fourier transform (DFT) and introducing a new orthogonalization mechanism in OMP for such dictionaries. Customizing the conventional OMP algorithm to the complex setting enables high-quality, compact representation of speech signals at low computational complexity. Experimental results demonstrate that the reconstructed speech signals retain high perceptual similarity to the originals.
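The greedy selection step described in the abstract can be illustrated with a minimal sketch of textbook OMP over an overcomplete complex DFT dictionary. This is not the orthogonalization mechanism proposed in the paper, which the abstract does not specify; it uses a plain least-squares refit at each iteration instead. The function names `dft_dictionary` and `omp`, the frame length `N = 64`, the dictionary size `M = 128`, and the synthetic cosine test signal are all assumptions made for illustration.

```python
import numpy as np

def dft_dictionary(n_samples, n_atoms):
    """Overcomplete complex Fourier dictionary with unit-norm columns (hypothetical helper)."""
    n = np.arange(n_samples)[:, None]
    k = np.arange(n_atoms)[None, :]
    return np.exp(2j * np.pi * n * k / n_atoms) / np.sqrt(n_samples)

def omp(D, x, n_nonzero, tol=1e-10):
    """Textbook OMP: greedy atom selection followed by a least-squares refit."""
    x = np.asarray(x, dtype=complex)
    residual = x.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(n_nonzero):
        # Magnitude of the correlation between every atom and the residual.
        corr = np.abs(D.conj().T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients jointly (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
        if np.linalg.norm(residual) <= tol:
            break
    return support, coeffs, residual

# Assumed toy input: a real cosine, i.e. a sum of two conjugate complex exponentials.
N, M = 64, 128
n = np.arange(N)
x = np.cos(2 * np.pi * 20 * n / M + 0.3)

D = dft_dictionary(N, M)
support, coeffs, residual = omp(D, x, n_nonzero=2)
x_hat = (D[:, support] @ coeffs).real  # reconstruction from two active atoms
```

In this toy case the real-valued tone is captured by a conjugate pair of complex atoms, which is one way to see why a real measurement needs two active entries in a complex Fourier dictionary. Real speech frames are not exactly sparse, so the residual would shrink gradually rather than vanish.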

Availability of data and materials

The data that support the findings of this study are available from the TIMIT speech corpus, distributed by the Linguistic Data Consortium (LDC) (Jankowski et al. 1990). Restrictions apply to the availability of these data, which were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the LDC.

Notes

  1. A basis vector is said to be active if the corresponding sparse coefficient is non-zero.
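The definition in the note above can be written compactly as the support of the coefficient vector. The symbols here (signal frame $\mathbf{x}$, dictionary $\mathbf{D}$ with columns $\mathbf{d}_k$, coefficient vector $\mathbf{c}$, active set $\mathcal{S}$) are assumed notation for illustration and are not taken from the paper:

```latex
\mathbf{x} \approx \mathbf{D}\mathbf{c} = \sum_{k \in \mathcal{S}} c_k \, \mathbf{d}_k,
\qquad
\mathcal{S} = \{\, k : c_k \neq 0 \,\},
\qquad
\|\mathbf{c}\|_0 = |\mathcal{S}|.
```

Under this notation, the number of active basis vectors equals the $\ell_0$ pseudo-norm of $\mathbf{c}$, the quantity that a sparse representation tries to keep small.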

Abbreviations

OMP: Orthogonal matching pursuit

DFT: Discrete Fourier transform

PESQ: Perceptual evaluation of speech quality

DCR: Degradation category rating

MSE: Mean-square error

DCT: Discrete cosine transform

K-SVD: K-means singular value decomposition

dB: Decibel

DMOS: Degradation mean opinion score

References

  • Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.

  • Barthélemy, Q., Larue, A., & Mars, J. I. (2015). Color sparse representations for image processing: Review, models, and prospects. IEEE Transactions on Image Processing, 24(11), 3978–3989.

  • Chen, J., Paliwal, K. K., & Nakamura, S. (2000). A block cosine transform and its application in speech recognition. In INTERSPEECH.

  • Cook, G. W., & Kalker, T. (2013). The sparse discrete cosine transform with application to image compression. In 2013 Picture coding symposium (PCS), pp. 9–12.

  • Day, D., & Heroux, M. A. (2001). Solving complex-valued linear systems via equivalent real formulations. SIAM Journal on Scientific Computing, 23(2), 480–498.

  • Deng, S., & Han, J. (2016). Sparse decomposition for signal periodic model over complex exponential dictionary. IEEE Signal Processing Letters, 23(12), 1858–1861.

  • Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306.

  • Fan, R., Wan, Q., Liu, Y., Chen, H., & Zhang, X. (2012). Complex orthogonal matching pursuit and its exact recovery conditions. arXiv:1206.2197.

  • Haneche, H., Boudraa, B., & Ouahabi, A. (2020). A new way to enhance speech signal based on compressed sensing. Measurement, 151, 107117.

  • Haneche, H., Ouahabi, A., & Boudraa, B. (2019). New mobile communication system design for Rayleigh environments based on compressed sensing-source coding. IET Communications, 13(15), 2375–2385.

  • ITU-T Recommendation P.800. (1996). Methods for subjective determination of transmission quality. Series P: Telephone Transmission Quality.

  • ITU-T Recommendation P.862. (2001). Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Series P: Telephone transmission quality. Local Line Networks: Telephone Installations.

  • Jafari, M. G., & Plumbley, M. D. (2011). Fast dictionary learning for sparse representations of speech signals. IEEE Journal of Selected Topics in Signal Processing, 5(5), 1025–1031.

  • Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 109–112).

  • Loizou, P. C. (2017). Speech Enhancement: Theory and Practice (2nd ed.). Boca Raton: CRC Press.

  • Mlynarski, W. (2013). Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors. arXiv:1312.4695.

  • Mohimani, G. H., Babaie-Zadeh, M., & Jutten, C. (2008). Complex-valued sparse representation based on smoothed ℓ0 norm. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3881–3884.

  • Moreno-Alvarado, R. G., Martinez-Garcia, M., Nakano, M., & Pérez, H. M. (2014). DCT-compressive sampling of multifrequency sparse audio signals. In 2014 IEEE Latin-America conference on communications (LATINCOM), pp. 1–5.

  • Orovic, I. (2016). Compressive sensing in signal processing: Algorithms and transform domain formulations. Mathematical Problems in Engineering, 2016, 16.

  • Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)—A new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No.01CH37221) (Vol. 2, pp. 749–752).

  • Sharma, P., Abrol, V., & Sao, A. K. (2017). Deep-sparse-representation-based features for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(11), 2162–2175.

  • Sigg, C. D., Dikk, T., & Buhmann, J. M. (2010). In 2010 IEEE international conference on acoustics, speech and signal processing, pp. 4758–4761.

  • Tabet, Y., Boughazi, M., & Afifi, S. (2018). Speech analysis and synthesis with a refined adaptive sinusoidal representation. International Journal of Speech Technology, 21, 581–588.

  • Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.

  • Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T. S., & Yan, S. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Khaled A. Alaghbari.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

About this article

Cite this article

Kwek, LC., Tan, A.WC., Lim, HS. et al. Sparse representation and reproduction of speech signals in complex Fourier basis. Int J Speech Technol 25, 211–217 (2022). https://doi.org/10.1007/s10772-021-09941-w

  • DOI: https://doi.org/10.1007/s10772-021-09941-w
