Stacked Sparse Autoencoder for Audio Object Coding

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Abstract

Compared with channel-based audio coding, object-based audio coding has a clear advantage in meeting users' demands for personalized control. However, in conventional Spatial Audio Object Coding (SAOC), each frame is divided into 28 sub-bands, and all frequency points within a sub-band share a common parameter. This saves bitrate under the SAOC framework, but it makes aliasing distortion likely, which degrades the listening experience. To obtain higher perceptual quality, we propose a Stacked Sparse Autoencoder (SSAE) pipeline organized as overlapped modules, in which each module extracts efficient features of the side information from its preceding module. This reduces the dimensionality of the side-information parameters to save bitrate while still reconstructing the audio objects well, thereby providing favorable auditory perception. Compared with conventional SAOC, TS-SAOC, and SVD-SAOC, both objective and subjective results show that the proposed method achieves the best sound quality of the output signal at the same bitrate.
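The sub-band parameter sharing described in the abstract can be illustrated with a minimal sketch. Only the count of 28 sub-bands comes from the abstract. The uniform band edges, the choice of a power-share parameter, the random test data, and all function names are assumptions made for illustration, not the partition or side-information parameters that SAOC or the proposed method actually use.

```python
import random

NUM_SUBBANDS = 28  # from the abstract; everything else here is an assumption


def uniform_band_edges(num_bins, num_bands=NUM_SUBBANDS):
    """Split num_bins frequency bins into num_bands contiguous bands.

    Uniform edges are an illustrative assumption, not the real SAOC partition.
    Requires num_bins >= num_bands so that no band is empty.
    """
    return [(b * num_bins) // num_bands for b in range(num_bands + 1)]


def band_parameters(object_power, edges):
    """One parameter per (object, band): the object's share of the band power."""
    params = []
    for power in object_power:
        row = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            total = sum(sum(p[lo:hi]) for p in object_power)
            own = sum(power[lo:hi])
            row.append(own / total if total > 0 else 0.0)
        params.append(row)
    return params


def expand_to_bins(params, edges):
    """Every bin inside a band reuses that band's single shared parameter."""
    expanded = []
    for row in params:
        bins = []
        for value, lo, hi in zip(row, edges[:-1], edges[1:]):
            bins.extend([value] * (hi - lo))
        expanded.append(bins)
    return expanded


# Hypothetical frame: 3 audio objects, 1024 frequency bins of spectral power.
random.seed(0)
num_objects, num_bins = 3, 1024
object_power = [
    [random.random() for _ in range(num_bins)] for _ in range(num_objects)
]

edges = uniform_band_edges(num_bins)
params = band_parameters(object_power, edges)   # 28 values per object
per_bin = expand_to_bins(params, edges)         # back to 1024 values per object
```

In this sketch each object is described by 28 values per frame instead of 1024, which is where the bitrate saving comes from. The cost is that all bins in a band receive the same value after expansion, so any variation inside the band is lost.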

Acknowledgment

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).

Author information

Corresponding author

Correspondence to Ruimin Hu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Y., Hu, R., Wang, X., Hu, C., Li, G. (2021). Stacked Sparse Autoencoder for Audio Object Coding. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_5

  • DOI: https://doi.org/10.1007/978-3-030-67832-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67831-9

  • Online ISBN: 978-3-030-67832-6

  • eBook Packages: Computer Science, Computer Science (R0)
