Abstract
This paper is devoted to sparse modelling of audio and speech signals using the matching pursuit (MP) algorithm. A redundant dictionary of time-frequency functions is constructed through a frame-based, psychoacoustically optimized wavelet packet (WP) transform. Anthropomorphic adaptation of the time-frequency plane minimizes the perceptual redundancy of the signal model. Psychoacoustic information is also used at the MP stage to select the best atom from the dictionary, which improves the algorithm's performance in terms of both the human hearing system and computational complexity. The described signal model can be applied to many audio and speech processing tasks, such as source separation, watermarking, and classification. The presented research focuses on signal encoding: a universal audio/speech coding algorithm suitable for input signals with different sound content is proposed.
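The greedy atom selection at the core of MP can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the function names, and the optional per-atom `weights` argument (a hypothetical stand-in for the psychoacoustic information mentioned above) are all assumptions made for illustration, and a plain unit-norm dictionary replaces the WP-based one.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def energy(v):
    """Squared Euclidean norm of a sequence."""
    return dot(v, v)


def matching_pursuit(signal, dictionary, n_atoms, weights=None):
    """Greedy matching pursuit over unit-norm atoms.

    weights is a hypothetical per-atom perceptual weighting; with
    weights=None every atom is treated equally (classic MP).
    Returns the list of (atom_index, coefficient) pairs and the residual.
    """
    if weights is None:
        weights = [1.0] * len(dictionary)
    residual = list(signal)
    decomposition = []
    for _ in range(n_atoms):
        # Correlate the current residual with every atom.
        corrs = [dot(residual, atom) for atom in dictionary]
        # Pick the atom with the largest (weighted) correlation magnitude.
        best = max(range(len(dictionary)),
                   key=lambda k: weights[k] * abs(corrs[k]))
        coef = corrs[best]
        if coef == 0.0:
            break  # nothing left that this dictionary can explain
        decomposition.append((best, coef))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
    return decomposition, residual


# Toy redundant dictionary in R^4: the standard basis plus a constant atom.
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
signal = [3.5, 1.5, 1.5, 1.5]
decomposition, residual = matching_pursuit(signal, atoms, n_atoms=2)
```

Because every atom has unit norm, each iteration removes the squared coefficient from the residual energy, so the residual energy never increases. Setting a weight to zero in this sketch keeps the corresponding atom from being chosen while any other atom still correlates with the residual, which is one simple way a perceptual criterion could steer the selection.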
Acknowledgement
This work was supported by ITForYou company.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Petrovsky, A., Herasimovich, V., Petrovsky, A. (2016). Bio-Inspired Sparse Representation of Speech and Audio Using Psychoacoustic Adaptive Matching Pursuit. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_18
DOI: https://doi.org/10.1007/978-3-319-43958-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)