Stacked Sparse Autoencoder for Audio Object Coding

Wu, Yulin; Hu, Ruimin; Wang, Xiaochen; Hu, Chenhao; Li, Gang

doi:10.1007/978-3-030-67832-6_5


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

Compared with channel-based audio coding, object-based audio coding has a clear advantage in meeting users’ demands for personalized control. However, conventional Spatial Audio Object Coding (SAOC) divides each frame into 28 sub-bands, and all frequency points in a sub-band share a common parameter. This saves bitrate, but aliasing distortion is prone to occur, which degrades the listening experience. To obtain higher perceptual quality, we propose a Stacked Sparse Autoencoder (SSAE) pipeline built from overlapped modules, in which each module extracts efficient features of the side information from the output of its preceding module. This reduces the dimensionality of the side-information parameters, which saves bitrate, while still reconstructing the audio objects well and thereby providing favorable auditory perception. Both objective and subjective results show that, at the same bitrate, the proposed method achieves better output sound quality than conventional SAOC, TS-SAOC, and SVD-SAOC.
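The stacking idea in the abstract, where each module encodes the output of the module before it, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the sigmoid activation, the KL-divergence sparsity penalty with target `rho`, and the random untrained weights are all assumptions chosen for illustration, and the names `make_layer`, `encode`, `stacked_encode`, and `kl_sparsity` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one encoder module (untrained, illustrative)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(layer, x):
    """Map an input vector to a lower-dimensional hidden code."""
    weights, biases = layer
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def stacked_encode(layers, x):
    """Feed each module the code produced by its preceding module."""
    codes = []
    h = x
    for layer in layers:
        h = encode(layer, h)
        codes.append(h)
    return codes


def kl_sparsity(mean_activations, rho=0.05):
    """KL-divergence penalty pushing each unit's mean activation toward rho (assumed form)."""
    total = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, 1e-8), 1.0 - 1e-8)  # keep the logs finite
        total += (rho * math.log(rho / rho_hat)
                  + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))
    return total


rng = random.Random(0)
sizes = [64, 32, 16, 8]  # hypothetical dimensions, not taken from the paper
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# A small batch of stand-in side-information vectors (random placeholder data).
batch = [[rng.random() for _ in range(sizes[0])] for _ in range(4)]
codes_per_frame = [stacked_encode(layers, frame) for frame in batch]

# Mean activation of each bottleneck unit over the batch, then the sparsity penalty.
bottleneck = [codes[-1] for codes in codes_per_frame]
mean_act = [sum(col) / len(col) for col in zip(*bottleneck)]
penalty = kl_sparsity(mean_act)

print([len(c) for c in codes_per_frame[0]])
```

In a trained system each module would also have a decoder, and the weights would be learned by minimizing reconstruction error plus the sparsity penalty; only the final low-dimensional code would need to be transmitted as side information.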



Acknowledgment

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China
Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu & Gang Li
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China
Ruimin Hu
Collaborative Innovation Center of Geospatial Technology, Wuhan, China
Xiaochen Wang


Corresponding author

Correspondence to Ruimin Hu.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras


Cite this paper

Wu, Y., Hu, R., Wang, X., Hu, C., Li, G. (2021). Stacked Sparse Autoencoder for Audio Object Coding. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_5


DOI: https://doi.org/10.1007/978-3-030-67832-6_5
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science (R0)
