Abstract
Object-based audio is becoming the dominant form of audio content because it is more interactive and flexible than traditional channel-based audio. Spatial Audio Object Coding (SAOC) was proposed to encode multiple audio objects at a low bitrate. However, SAOC extracts only a few parameters from each signal frame, so the frequency resolution of its parameters is low. As a result, the decoded signals suffer from serious aliasing distortion, which degrades sound quality. In this paper, we present a novel audio object coding method. We are the first to analyze how signal distortion varies with parameter frequency resolution, and we determine the optimal resolution for reducing aliasing distortion. We also keep the coding bitrate low by applying a dimensionality reduction algorithm. Both objective and subjective experiments confirm that the proposed method delivers higher output sound quality than state-of-the-art methods at an equivalent bitrate.
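The link between parameter frequency resolution and distortion can be illustrated with a toy sketch. It is not the method proposed in the paper and not the SAOC reference algorithm. It assumes that object power spectra add up in the downmix, that bands are uniform, and that the only side information is one power ratio per object per band. The names `band_edges`, `reconstruct`, and `distortion_db` are hypothetical, and the spectra are random placeholders rather than real audio.

```python
import numpy as np

def band_edges(n_bins, n_bands):
    # Uniform band edges computed in integer arithmetic (an assumption;
    # real codecs typically use perceptually motivated, non-uniform bands).
    return (np.arange(n_bands + 1) * n_bins) // n_bands

def reconstruct(obj_power, n_bands):
    """Estimate each object's power spectrum from the downmix power
    and one power ratio per object per band (toy parametric decoder)."""
    n_obj, n_bins = obj_power.shape
    mix = obj_power.sum(axis=0)  # assumed additive downmix power
    edges = band_edges(n_bins, n_bands)
    est = np.empty_like(obj_power)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Side information: share of the band's power owned by each object.
        ratio = obj_power[:, lo:hi].sum(axis=1) / mix[lo:hi].sum()
        # Every bin in the band reuses the same ratio.
        est[:, lo:hi] = ratio[:, None] * mix[lo:hi]
    return est

def distortion_db(obj_power, n_bands):
    # Relative squared error of the estimate, in dB (floored at -120 dB).
    est = reconstruct(obj_power, n_bands)
    err = np.sum((obj_power - est) ** 2)
    ref = np.sum(obj_power ** 2)
    return 10 * np.log10(err / ref + 1e-12)

rng = np.random.default_rng(0)
# Three hypothetical objects, 1024 frequency bins, strictly positive power.
obj_power = rng.random((3, 1024)) ** 2 + 1e-6

for n_bands in (8, 32, 128, 1024):
    print(n_bands, "bands:", round(distortion_db(obj_power, n_bands), 2), "dB")
```

In this toy setup the error vanishes when every bin has its own parameter, while the amount of side information grows in proportion to the number of bands. That trade-off is what makes the choice of resolution, and a subsequent reduction of the parameter dimension, matter for the bitrate.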
Acknowledgements
This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206), and the Hubei Province Technological Innovation Major Project (No. 2016AAA015).
Cite this article
Wu, T., Hu, R., Wang, X. et al. Audio object coding based on optimal parameter frequency resolution. Multimed Tools Appl 78, 20723–20738 (2019). https://doi.org/10.1007/s11042-019-7409-7
DOI: https://doi.org/10.1007/s11042-019-7409-7