Abstract
Spatial audio object coding (SAOC) is an effective method that compresses multiple audio objects and provides flexibility for personalized rendering in interactive services. It divides each signal frame into 28 sub-bands and extracts one set of object spatial parameters for each sub-band. In this way, objects can be coded into a downmix signal and a few parameters. However, using the same parameters across an entire sub-band causes frequency aliasing distortion, which seriously degrades the listening experience. Existing improvements to SAOC cannot guarantee that all audio objects are decoded well. This paper describes a new multi-step object coding structure that efficiently calculates the residual of each object as additional side information to compensate for that object's aliasing distortion. Because the object encoding order affects the final decoded quality, a sorting strategy based on the sub-band energy of each object is proposed to determine which audio object should be encoded in each step. Singular value decomposition (SVD) is used to limit the bit-rate increase caused by the added side information. Experimental results show that the proposed method outperforms SAOC and SAOC-TSC, and that each object can be decoded well with respect to bit-rate and sound quality.
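The energy-based sorting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the toy spectra, the two uniform sub-bands (SAOC uses 28), and the choice of descending energy order are all assumptions made here for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence


def subband_energies(spectrum: Sequence[float], edges: Sequence[int]) -> List[float]:
    """Sum of squared coefficients within each sub-band [edges[b], edges[b+1])."""
    return [
        sum(x * x for x in spectrum[edges[b]:edges[b + 1]])
        for b in range(len(edges) - 1)
    ]


def encoding_order(
    objects: Dict[str, Sequence[float]], edges: Sequence[int]
) -> List[List[str]]:
    """For each sub-band, list object names from highest to lowest energy.

    ASSUMPTION: descending order; the abstract does not state the direction.
    """
    energies = {name: subband_energies(spec, edges) for name, spec in objects.items()}
    n_bands = len(edges) - 1
    return [
        sorted(energies, key=lambda name: energies[name][b], reverse=True)
        for b in range(n_bands)
    ]


# Toy frame (hypothetical data): three objects, eight coefficients, two sub-bands.
objects = {
    "vocals": [0.9, 0.8, 0.1, 0.1, 0.0, 0.0, 0.1, 0.0],
    "bass": [0.2, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    "hihat": [0.0, 0.0, 0.0, 0.1, 0.7, 0.6, 0.5, 0.4],
}
edges = [0, 4, 8]  # hypothetical band edges, not the SAOC partition
order = encoding_order(objects, edges)
print(order)
```

In this toy frame the low band is dominated by "vocals" and the high band by "hihat", so the per-band orderings differ, which is the situation where a per-sub-band sorting rule matters.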
Acknowledgement
This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, C., Hu, R., Wang, X., Wu, T., Li, D. (2020). Multi-step Coding Structure of Spatial Audio Object Coding. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_54
DOI: https://doi.org/10.1007/978-3-030-37731-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)