Multi-step Coding Structure of Spatial Audio Object Coding

Hu, Chenhao; Hu, Ruimin; Wang, Xiaochen; Wu, Tingzhao; Li, Dengshi

doi:10.1007/978-3-030-37731-1_54

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Spatial audio object coding (SAOC) is an effective method that compresses multiple audio objects and provides flexibility for personalized rendering in interactive services. It divides each signal frame into 28 sub-bands and extracts one set of object spatial parameters for each sub-band. In this way, the objects are coded into a downmix signal and a small number of parameters. However, using the same parameters across an entire sub-band causes frequency aliasing distortion, which seriously degrades the listening experience. Existing improvements to SAOC cannot guarantee that all audio objects are decoded well. This paper describes a new multi-step object coding structure that efficiently calculates the residual of each object as additional side information to compensate for that object's aliasing distortion. Because the object encoding order affects the final decoded quality, a sorting strategy based on the sub-band energy of each object is proposed to determine which audio object should be encoded in each step. Singular value decomposition (SVD) is used to reduce the bit-rate increase caused by the added side information. Experimental results show that the proposed method outperforms SAOC and SAOC-TSC, and that each object can be decoded well with respect to both bit-rate and sound quality.
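
The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the actual formulas. The sketch assumes real-valued spectral coefficients, a mono downmix formed by summing the objects, one energy-ratio gain per object and sub-band at the decoder, and an encoding order that ranks objects by descending sub-band energy. The function names (subband_energies, encoding_order, estimate_and_residuals) and the two-band layout in the usage example are hypothetical. The SVD step that compresses the residuals is not covered.

```python
from typing import List, Sequence, Tuple


def subband_energies(spectrum: Sequence[float], edges: Sequence[int]) -> List[float]:
    """Energy of one object's spectrum in each sub-band [edges[b], edges[b+1])."""
    return [sum(x * x for x in spectrum[lo:hi]) for lo, hi in zip(edges, edges[1:])]


def encoding_order(energies: Sequence[Sequence[float]]) -> List[List[int]]:
    """Per sub-band, object indices ranked by descending energy.

    Assumption: the abstract only says the order depends on sub-band energy;
    'largest energy first' is chosen here purely for illustration.
    """
    n_bands = len(energies[0])
    return [
        sorted(range(len(energies)), key=lambda i: -energies[i][b])
        for b in range(n_bands)
    ]


def estimate_and_residuals(
    objects: Sequence[Sequence[float]], edges: Sequence[int]
) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Parametric estimate of each object from the downmix, plus its residual.

    Every bin of a sub-band shares one gain (the object's share of the
    sub-band energy), so the estimate cannot follow per-bin detail; the
    residual is what would have to be sent as extra side information.
    """
    n_bins = len(objects[0])
    downmix = [sum(obj[k] for obj in objects) for k in range(n_bins)]
    energies = [subband_energies(obj, edges) for obj in objects]
    estimates, residuals = [], []
    for obj, e_obj in zip(objects, energies):
        est = [0.0] * n_bins
        for b, (lo, hi) in enumerate(zip(edges, edges[1:])):
            total = sum(e[b] for e in energies)
            gain = e_obj[b] / total if total > 0 else 0.0
            for k in range(lo, hi):
                est[k] = gain * downmix[k]
        estimates.append(est)
        residuals.append([o - s for o, s in zip(obj, est)])
    return downmix, estimates, residuals


if __name__ == "__main__":
    # Two toy objects, four spectral bins, two sub-bands (real SAOC uses 28).
    objs = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]]
    edges = [0, 2, 4]
    energies = [subband_energies(o, edges) for o in objs]
    print("order per sub-band:", encoding_order(energies))
    _, est, res = estimate_and_residuals(objs, edges)
    print("estimate of object 0:", est[0])
    print("residual of object 0:", res[0])
```

In the toy input the two objects occupy different bins of the same sub-band, yet each estimate is only a scaled copy of the downmix, so energy from one object leaks into the other. The nonzero residuals quantify that leakage, which is the distortion the paper's additional side information is meant to compensate.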

Acknowledgement

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China
Chenhao Hu, Ruimin Hu, Xiaochen Wang, Tingzhao Wu & Dengshi Li
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China
Chenhao Hu, Ruimin Hu, Xiaochen Wang & Tingzhao Wu
School of Mathematics and Computer, Jianghan University, Wuhan, China
Dengshi Li

Corresponding author

Correspondence to Ruimin Hu.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

About this paper

Cite this paper

Hu, C., Hu, R., Wang, X., Wu, T., Li, D. (2020). Multi-step Coding Structure of Spatial Audio Object Coding. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_54

DOI: https://doi.org/10.1007/978-3-030-37731-1_54
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)
