
Audio object coding based on optimal parameter frequency resolution

Multimedia Tools and Applications

Abstract

Object-based audio content is becoming the main form of audio content because it is more interactive and flexible than traditional channel-based content. The Spatial Audio Object Coding (SAOC) method was proposed to encode multiple audio objects at a low bitrate. However, SAOC extracts only a few parameters per frame, which leads to low parameter frequency resolution. As a result, the decoded signals suffer from severe aliasing distortion that degrades sound quality. In this paper, we present a novel audio object coding method. We are the first to analyze how signal distortion varies with parameter frequency resolution, and we determine the optimal resolution for reducing aliasing distortion. In addition, we keep the coding bitrate low by means of a dimensionality reduction algorithm. Both objective and subjective experiments confirm that the proposed method provides higher output sound quality than state-of-the-art methods at an equivalent bitrate.
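The abstract's central claim is that coarse parameter frequency resolution lets energy from other objects leak into each decoded object. The toy sketch below illustrates that effect under stated assumptions; it is not the paper's method and not the normative SAOC decoder. It assumes a single frame, a mono downmix formed by summing complex object spectra, one energy parameter per object per band (loosely analogous to SAOC object level differences), and Wiener-style gains computed from those band energies. All function names are hypothetical, and "aliasing distortion" is approximated here as the relative error between the true and estimated object spectra.

```python
import cmath
import random


def band_edges(num_bins, num_bands):
    """Split bins 0..num_bins-1 into num_bands contiguous, near-equal bands."""
    edges = [round(i * num_bins / num_bands) for i in range(num_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


def band_powers(spectrum, bands):
    """Per-band energy of one object's spectrum (an OLD-like parameter, unnormalized)."""
    return [sum(abs(spectrum[f]) ** 2 for f in range(lo, hi)) for lo, hi in bands]


def decode_objects(objects, num_bands, eps=1e-12):
    """Estimate each object from the mono downmix using band-wise Wiener-style gains.

    objects: list of K complex spectra (one frame, all of length F).
    Returns K estimated spectra. Only K * num_bands parameters are "transmitted".
    """
    num_bins = len(objects[0])
    bands = band_edges(num_bins, num_bands)
    downmix = [sum(obj[f] for obj in objects) for f in range(num_bins)]
    powers = [band_powers(obj, bands) for obj in objects]  # K x B parameter matrix
    estimates = [[0j] * num_bins for _ in objects]
    for b, (lo, hi) in enumerate(bands):
        total = sum(p[b] for p in powers) + eps
        for k, p in enumerate(powers):
            gain = p[b] / total  # one real gain per object per band
            for f in range(lo, hi):
                estimates[k][f] = gain * downmix[f]
    return estimates


def distortion(objects, estimates):
    """Total error energy divided by total object energy."""
    err = sum(abs(s - e) ** 2
              for obj, est in zip(objects, estimates)
              for s, e in zip(obj, est))
    ref = sum(abs(s) ** 2 for obj in objects for s in obj)
    return err / ref


def make_disjoint_objects(num_bins=16, seed=0):
    """Two hypothetical objects that occupy alternating pairs of frequency bins."""
    rng = random.Random(seed)
    objects = [[0j] * num_bins for _ in range(2)]
    for f in range(num_bins):
        k = (f // 2) % 2  # bins 0-1 -> object 0, bins 2-3 -> object 1, ...
        mag = rng.uniform(0.5, 1.5)
        objects[k][f] = cmath.rect(mag, rng.uniform(-cmath.pi, cmath.pi))
    return objects


if __name__ == "__main__":
    objs = make_disjoint_objects()
    for nb in (1, 2, 4, 8, 16):
        d = distortion(objs, decode_objects(objs, nb))
        print(f"{nb:2d} bands, {2 * nb:2d} parameters: distortion = {d:.3e}")
```

With these two objects, bands two bins wide or narrower (8 or 16 bands) separate them almost perfectly, while wider bands mix them and the distortion rises sharply. The parameter count grows as 2 × num_bands, which is the bitrate side of the trade-off; how the paper chooses the optimal resolution, and its dimensionality reduction step, are not modeled here.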


Figs. 1–9



Acknowledgements

This research was partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206), and the Hubei Province Technological Innovation Major Project (No. 2016AAA015).

Author information


Correspondence to Ruimin Hu.


About this article


Cite this article

Wu, T., Hu, R., Wang, X. et al. Audio object coding based on optimal parameter frequency resolution. Multimed Tools Appl 78, 20723–20738 (2019). https://doi.org/10.1007/s11042-019-7409-7


  • DOI: https://doi.org/10.1007/s11042-019-7409-7
