Audio object coding based on optimal parameter frequency resolution

Wu, Tingzhao; Hu, Ruimin; Wang, Xiaochen; Ke, Shanfa

doi:10.1007/s11042-019-7409-7

Published: 05 March 2019

Volume 78, pages 20723–20738, (2019)

Multimedia Tools and Applications

Tingzhao Wu^1,2,3, Ruimin Hu^1,3, Xiaochen Wang^1,2 & Shanfa Ke^1,4

Abstract

Object-based audio is becoming the main form of audio content because it is more interactive and flexible than traditional channel-based audio. The Spatial Audio Object Coding (SAOC) method was proposed to encode multiple audio objects at a low bitrate. However, SAOC extracts only a few parameters for each signal frame, which leads to low parameter frequency resolution. As a result, the decoded signals suffer from serious aliasing distortion, which degrades sound quality. In this paper, we present a novel audio object coding method. We are the first to analyze how signal distortion varies with parameter frequency resolution, and we determine the optimal resolution for reducing aliasing distortion. We also achieve a low coding bitrate by applying a dimensionality reduction algorithm. Both objective and subjective experiments confirm that the proposed method provides higher output sound quality than state-of-the-art methods at an equivalent bitrate.
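The relationship between parameter frequency resolution and distortion described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm and not the SAOC standard: the synthetic power spectra, the uniform band layout, the one-energy-ratio-per-band parameter, and all function names are assumptions chosen only to show that fewer parameter bands generally means a coarser, more distorted reconstruction of an object from the downmix.

```python
import math

def band_edges(num_bins, num_bands):
    """Split num_bins frequency bins into num_bands contiguous, near-uniform bands."""
    return [round(b * num_bins / num_bands) for b in range(num_bands + 1)]

def reconstruct(obj_power, mix_power, num_bands):
    """Estimate one object's power spectrum from the downmix,
    using a single energy-ratio parameter per band (hypothetical scheme)."""
    edges = band_edges(len(mix_power), num_bands)
    estimate = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mix_energy = sum(mix_power[lo:hi])
        # One parameter per band: the object's share of the downmix energy.
        ratio = sum(obj_power[lo:hi]) / mix_energy if mix_energy > 0 else 0.0
        estimate.extend(ratio * m for m in mix_power[lo:hi])
    return estimate

def distortion(obj_power, mix_power, num_bands):
    """Squared error between the true and the reconstructed power spectrum."""
    est = reconstruct(obj_power, mix_power, num_bands)
    return sum((p - e) ** 2 for p, e in zip(obj_power, est))

# Synthetic, deterministic spectra (assumed): one object is strong at low
# frequencies, the other at high frequencies.
NUM_BINS = 64
obj_a = [math.exp(-k / 8) for k in range(NUM_BINS)]
obj_b = [math.exp(-(NUM_BINS - 1 - k) / 8) for k in range(NUM_BINS)]
# Downmix power, assuming the two objects are uncorrelated.
mix = [a + b for a, b in zip(obj_a, obj_b)]

errors = {bands: distortion(obj_a, mix, bands) for bands in (1, 4, 16, 64)}
for bands, err in errors.items():
    print(f"{bands:2d} bands -> squared error {err:.6f}")
```

With one parameter per bin the object is recovered almost exactly, while a single band for the whole spectrum cannot separate the two objects at all. A real codec also has to weigh the bitrate cost of transmitting more parameters, which this sketch leaves out.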

Acknowledgements

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206), and the Hubei Province Technological Innovation Major Project (No. 2016AAA015).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, China
Tingzhao Wu, Ruimin Hu, Xiaochen Wang & Shanfa Ke
Research Institute of Wuhan University in Shenzhen, Shenzhen, 518057, China
Tingzhao Wu & Xiaochen Wang
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China
Tingzhao Wu & Ruimin Hu
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430079, China
Shanfa Ke

Corresponding author

Correspondence to Ruimin Hu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, T., Hu, R., Wang, X. et al. Audio object coding based on optimal parameter frequency resolution. Multimed Tools Appl 78, 20723–20738 (2019). https://doi.org/10.1007/s11042-019-7409-7

Received: 05 July 2018
Revised: 10 February 2019
Accepted: 22 February 2019
Published: 05 March 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s11042-019-7409-7
