
Multi-step Coding Structure of Spatial Audio Object Coding

  • Conference paper
MultiMedia Modeling (MMM 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Abstract

Spatial audio object coding (SAOC) is an effective method that compresses multiple audio objects and provides flexibility for personalized rendering in interactive services. It divides the signal in each frame into 28 sub-bands and extracts one set of object spatial parameters for each sub-band, so that the objects can be coded into a downmix signal plus a small number of parameters. However, because every frequency bin in a sub-band shares the same parameters, frequency aliasing distortion occurs, which seriously degrades the listening experience. Existing improvements to SAOC cannot guarantee that all audio objects are decoded well. This paper describes a new multi-step object coding structure that efficiently calculates the residual of each object and transmits it as additional side information to compensate for that object's aliasing distortion. Because the object encoding order affects the final decoded quality, a sorting strategy based on the sub-band energy of each object is proposed to determine which audio object is encoded in each step. Singular value decomposition (SVD) is used to limit the bit-rate increase caused by the added side information. Experimental results show that the proposed method performs better than SAOC and SAOC-TSC, and that each object can be decoded well, when both bit-rate and sound quality are taken into account.
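The abstract only outlines the coding structure, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a uniform split into 28 bands (the real SAOC band layout is non-uniform), a plain summation downmix, a single Wiener-style gain P_i / sum_j P_j per object per band as the parametric estimate, per-band ordering by descending sub-band energy, and plain truncated SVD of a frames-by-bins residual matrix. The function names `band_edges`, `multistep_residuals` and `truncate_residual` are hypothetical.

```python
import numpy as np

N_BANDS = 28  # sub-bands per frame, as stated in the abstract


def band_edges(n_bins: int, n_bands: int = N_BANDS) -> np.ndarray:
    """Hypothetical uniform partition of n_bins frequency bins into n_bands.

    Assumption: real SAOC bands are non-uniform; uniform bands keep this short.
    """
    return np.linspace(0, n_bins, n_bands + 1).astype(int)


def multistep_residuals(S: np.ndarray, edges: np.ndarray):
    """Illustrative per-band multi-step residual computation for one frame.

    S has shape (n_obj, n_bins). In every sub-band the objects are coded one
    at a time, highest sub-band energy first (assumed ordering rule). Each step
    estimates the current object with ONE gain for the whole band (the source
    of aliasing), stores the residual, then removes that object from the band
    mix before the next step.
    """
    residuals = np.zeros_like(S)
    orders = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = S[:, lo:hi]
        mix = band.sum(axis=0)                      # downmix within this band
        energy = np.sum(np.abs(band) ** 2, axis=1)  # per-object sub-band energy
        order = [int(i) for i in np.argsort(-energy, kind="stable")]
        remaining = list(order)
        for idx in order:
            total = energy[remaining].sum()
            gain = energy[idx] / total if total > 0 else 0.0
            residuals[idx, lo:hi] = band[idx] - gain * mix
            # A decoder recovers band[idx] as gain * mix + residual, so it can
            # subtract the decoded object from the mix in the same way.
            mix = mix - band[idx]
            remaining.remove(idx)
        orders.append(order)
    return orders, residuals


def truncate_residual(R: np.ndarray, rank: int):
    """Rank-`rank` approximation of a residual matrix via SVD.

    Only U[:, :rank], s[:rank] and Vt[:rank] would need to be sent, i.e.
    rank * (m + n + 1) numbers instead of m * n.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    n_values = rank * (R.shape[0] + R.shape[1] + 1)
    return approx, n_values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames, n_obj, n_bins = 16, 4, 256
    edges = band_edges(n_bins)
    frames = rng.standard_normal((n_frames, n_obj, n_bins))
    # Stack object 0's residual over all frames into a (frames x bins) matrix.
    R0 = np.stack([multistep_residuals(f, edges)[1][0] for f in frames])
    for rank in (2, 8, 16):
        approx, n_values = truncate_residual(R0, rank)
        err = np.linalg.norm(R0 - approx) / np.linalg.norm(R0)
        print(f"rank {rank:2d}: {n_values} values (vs {R0.size}), rel. error {err:.3f}")
```

Under these assumptions, the last object coded in each band is estimated from a mix that contains only that object, so its residual is essentially zero; this is one way to see why the encoding order matters. The random test signals here are not representative of real audio, so the SVD error figures only show the rank/size trade-off, not the paper's results.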

Acknowledgement

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).

Author information

Correspondence to Ruimin Hu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, C., Hu, R., Wang, X., Wu, T., Li, D. (2020). Multi-step Coding Structure of Spatial Audio Object Coding. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_54

  • DOI: https://doi.org/10.1007/978-3-030-37731-1_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37730-4

  • Online ISBN: 978-3-030-37731-1

  • eBook Packages: Computer Science, Computer Science (R0)
