Audio object coding based on N-step residual compensating

Multimedia Tools and Applications

Abstract

Object-based audio techniques offer flexible and convenient personalized rendering across a variety of playback configurations. Many methods have been proposed to encode and transmit multiple audio objects at a low bit-rate. However, the recovered audio objects suffer from severe frequency aliasing distortion, which degrades the immersive sound quality. This paper describes a new structure that reduces the aliasing distortion of every object. In this method, we extract the residual and gain parameters of all objects through an N-step operation and compress the residual matrices with singular value decomposition. The decoder then uses the residual matrices to compensate for aliasing distortion. Because the order in which objects are coded affects the final decoded quality, we also determine a suitable ordering strategy experimentally. In our experiments, energy sorting proves to be the best ordering strategy, and the residual information bit-rate is reduced from 14.11 kbps per object to 5.87 kbps per object. Compared with previous studies, our method performs better in both objective and subjective experiments. The proposed N-step residual compensating structure reduces every object’s aliasing distortion more effectively than the state-of-the-art methods.
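
Two ingredients named in the abstract, ordering objects by energy and compressing a residual matrix with singular value decomposition, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the descending sort direction, the fixed truncation rank, and the synthetic data are all assumptions made for illustration, and the N-step extraction of residuals and gains is not modeled here.

```python
import numpy as np

def order_by_energy(objects):
    """Return object indices sorted by total energy (descending is an assumption)."""
    energies = [float(np.sum(np.abs(x) ** 2)) for x in objects]
    return sorted(range(len(objects)), key=lambda i: energies[i], reverse=True)

def compress_residual(residual, rank):
    """Keep only the `rank` largest singular components of a residual matrix."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

def reconstruct_residual(u, s, vt):
    """Rebuild the low-rank approximation from the stored factors."""
    return (u * s) @ vt  # scales each column of u by its singular value

# Hypothetical residual matrix (frequency bins x frames) of exact rank 3.
rng = np.random.default_rng(0)
residual = rng.standard_normal((64, 3)) @ rng.standard_normal((3, 32))

u, s, vt = compress_residual(residual, rank=3)
approx = reconstruct_residual(u, s, vt)
rel_error = np.linalg.norm(residual - approx) / np.linalg.norm(residual)

# Stored values: 64*3 + 3 + 3*32 = 291, versus 64*32 = 2048 for the full matrix.
stored_values = u.size + s.size + vt.size

# Hypothetical objects with energies 4, 36 and 16.
coding_order = order_by_energy([np.ones(4), 3 * np.ones(4), 2 * np.ones(4)])
```

In this toy case the residual has exact rank 3, so keeping three singular components reconstructs it up to floating-point error. Real residual matrices are generally not exactly low rank, so the choice of rank trades bit-rate against the amount of aliasing distortion that can be compensated.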

Acknowledgements

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. 61762005, U1736206), and the Basic Research Project of Science and Technology Plan of Shenzhen (JCYJ20170818143246278).

Author information

Corresponding author

Correspondence to Xiaochen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, C., Wang, X., Hu, R. et al. Audio object coding based on N-step residual compensating. Multimed Tools Appl 80, 18717–18733 (2021). https://doi.org/10.1007/s11042-020-10339-0

  • DOI: https://doi.org/10.1007/s11042-020-10339-0
