Abstract
Sparse representation concerns the task of finding the most compact representation of a signal as a linear combination of basis vectors (atoms) drawn from an overcomplete dictionary. Because finding the sparsest representation is a non-convex, combinatorial problem, it is common to settle for approximate, suboptimal solutions; one such method is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative greedy algorithm in which, at each step, the basis vector most correlated with the current residual is selected. Past work has largely used real-valued dictionaries, since the signals of interest are themselves real-valued. From the perspective of speech representation, however, complex dictionaries are intuitively appealing, as audio signals are generally modelled as a mixture of complex exponentials with time-varying amplitudes and phases. Nevertheless, sparse representation of speech based on complex dictionaries has received less attention, mainly because the measurements are normally real-valued. In this paper, we pursue this intuition by modelling the complex dictionary on the popular discrete Fourier transform (DFT), and then introduce a new orthogonalization mechanism in OMP for this case. Customizing the conventional OMP algorithm to the complex setting enables high-quality, compact representation of speech signals with low computational complexity. Experimental results demonstrate that the proposed approach retains high perceptual similarity between the reconstructed speech signals and the original ones.
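The greedy selection rule described in the abstract can be illustrated with a minimal sketch. It assumes a textbook complex OMP that re-fits all active atoms by least squares at each step, over an overcomplete DFT dictionary whose columns are unit-norm complex exponentials. The helper names `dft_dictionary` and `complex_omp` are hypothetical, and the least-squares re-fit is a standard stand-in: it is not the new orthogonalization mechanism proposed in the paper, whose details the abstract does not give.

```python
import numpy as np


def dft_dictionary(n, m):
    """Hypothetical helper: n-sample complex exponentials at m >= n
    equally spaced frequencies, each column scaled to unit l2 norm."""
    t = np.arange(n)[:, None]
    k = np.arange(m)[None, :]
    return np.exp(2j * np.pi * t * k / m) / np.sqrt(n)


def complex_omp(D, y, n_atoms):
    """Hypothetical helper: textbook OMP with a complex least-squares
    re-fit at every step. Returns (coefficients, list of active atoms)."""
    y = np.asarray(y, dtype=complex)
    residual = y.copy()
    support = []
    x = np.zeros(0, dtype=complex)
    for _ in range(n_atoms):
        # Greedy step: atom with the largest |<d_k, r>| (Hermitian inner product).
        corr = np.abs(D.conj().T @ residual)
        if support:
            corr[support] = 0.0  # never re-select an active atom
        support.append(int(np.argmax(corr)))
        # Orthogonalization step (standard OMP, assumed here): least-squares
        # fit over all active atoms, so the residual is orthogonal to their span.
        A = D[:, support]
        x, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ x
    coeffs = np.zeros(D.shape[1], dtype=complex)
    coeffs[support] = x
    return coeffs, support


# Usage: a real cosine on the frequency grid excites two conjugate atoms.
n, m = 64, 128
D = dft_dictionary(n, m)
y = np.cos(2 * np.pi * 10 * np.arange(n) / m)
coeffs, support = complex_omp(D, y, n_atoms=2)
y_hat = (D @ coeffs).real  # real part of the complex reconstruction
print(sorted(support))  # [10, 118]
```

Because a real-valued frame excites conjugate-symmetric pairs of DFT atoms (here bins 10 and 128 - 10 = 118), a single real cosine needs two active atoms. This illustrates why real-valued measurements complicate the use of a complex dictionary.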




Availability of data and materials
The data that support the findings of this study are available from the TIMIT speech corpus of the Linguistic Data Consortium (LDC) (Jankowski et al. 1990). However, restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of the LDC.
Notes
A basis vector is said to be active if the corresponding sparse coefficient is non-zero.
Abbreviations
- OMP: Orthogonal matching pursuit
- DFT: Discrete Fourier transform
- PESQ: Perceptual evaluation of speech quality
- DCR: Degradation category rating
- MSE: Mean-square error
- DCT: Discrete cosine transform
- K-SVD: K-means singular value decomposition
- dB: Decibel
- DMOS: Degradation mean opinion score
References
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
Barthélemy, Q., Larue, A., & Mars, J. I. (2015). Color sparse representations for image processing: Review, models, and prospects. IEEE Transactions on Image Processing, 24(11), 3978–3989.
Chen, J., Paliwal, K. K., & Nakamura, S. (2000). A block cosine transform and its application in speech recognition. In INTERSPEECH.
Cook, G. W., & Kalker, T. (2013). The sparse discrete cosine transform with application to image compression. In 2013 Picture coding symposium (PCS), pp. 9–12.
Day, D., & Heroux, M. A. (2001). Solving complex-valued linear systems via equivalent real formulations. SIAM Journal on Scientific Computing, 23(2), 480–498.
Deng, S., & Han, J. (2016). Sparse decomposition for signal periodic model over complex exponential dictionary. IEEE Signal Processing Letters, 23(12), 1858–1861.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306.
Fan, R., Wan, Q., Liu, Y., Chen, H., & Zhang, X. (2012). Complex orthogonal matching pursuit and its exact recovery conditions. arXiv:1206.2197.
Haneche, H., Boudraa, B., & Ouahabi, A. (2020). A new way to enhance speech signal based on compressed sensing. Measurement, 151, 107117.
Haneche, H., Ouahabi, A., & Boudraa, B. (2019). New mobile communication system design for Rayleigh environments based on compressed sensing-source coding. IET Communications, 13(15), 2375–2385.
ITU-T Recommendation P.800. (1996). Methods for subjective determination of transmission quality. Series P: Telephone Transmission Quality.
ITU-T Recommendation P.862. (2001). Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Series P: Telephone transmission quality. Local Line Networks: Telephone Installations.
Jafari, M. G., & Plumbley, M. D. (2011). Fast dictionary learning for sparse representations of speech signals. IEEE Journal of Selected Topics in Signal Processing, 5(5), 1025–1031.
Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 109–112).
Loizou, P. C. (2017). Speech Enhancement: Theory and Practice (2nd ed.). Boca Raton: CRC Press.
Mlynarski, W. (2013). Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors. arXiv:1312.4695.
Mohimani, G. H., Babaie-Zadeh, M., & Jutten, C. (2008). Complex-valued sparse representation based on smoothed ℓ0 norm. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3881–3884.
Moreno-Alvarado, R. G., Martinez-Garcia, M., Nakano, M., & Pérez, H. M. (2014). DCT-compressive sampling of multifrequency sparse audio signals. In 2014 IEEE Latin-America conference on communications (LATINCOM), pp. 1–5.
Orovic, I. (2016). Compressive sensing in signal processing: Algorithms and transform domain formulations. Mathematical Problems in Engineering, 2016, 16.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)—A new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No.01CH37221) (Vol. 2, pp. 749–752).
Sharma, P., Abrol, V., & Sao, A. K. (2017). Deep-sparse-representation-based features for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(11), 2162–2175.
Sigg, C. D., Dikk, T., & Buhmann, J. M. (2010). In 2010 IEEE international conference on acoustics, speech and signal processing, pp. 4758–4761.
Tabet, Y., Boughazi, M., & Afifi, S. (2018). Speech analysis and synthesis with a refined adaptive sinusoidal representation. International Journal of Speech Technology, 21, 581–588.
Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T. S., & Yan, S. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.
Acknowledgements
Not applicable.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Cite this article
Kwek, LC., Tan, A.WC., Lim, HS. et al. Sparse representation and reproduction of speech signals in complex Fourier basis. Int J Speech Technol 25, 211–217 (2022). https://doi.org/10.1007/s10772-021-09941-w
DOI: https://doi.org/10.1007/s10772-021-09941-w