Abstract
Compared with channel-based audio coding, object-based audio coding has a clear advantage in meeting users' demands for personalized control. However, conventional Spatial Audio Object Coding (SAOC) divides each frame into 28 sub-bands, and all frequency points within a sub-band share a common parameter. This sharing saves bitrate, but it makes aliasing distortion likely, which degrades the listening experience. To obtain higher perceptual quality, we propose a Stacked Sparse Autoencoder (SSAE) pipeline composed of stacked modules, in which each module extracts efficient features of the side information from the output of its preceding module. This reduces the dimensionality of the side information parameters, which saves bitrate, while still reconstructing the audio objects well and thereby providing favorable auditory perception. Compared with conventional SAOC, TS-SAOC, and SVD-SAOC, both objective and subjective results show that the proposed method achieves the best output sound quality at the same bitrate.
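The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the sigmoid activation, the random untrained weights, and the names `init_layer`, `encode`, `stacked_encode`, and `kl_sparsity` are all assumptions chosen for illustration. It only shows how each module consumes the code produced by the preceding one, so that the side information shrinks step by step, together with the KL-divergence sparsity penalty commonly used when training sparse autoencoders.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one encoder module (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(layer, x):
    """Hidden code of one module: sigmoid(W x + b)."""
    weights, biases = layer
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def stacked_encode(layers, x):
    """Pass x through the modules in order; each one encodes the previous code."""
    for layer in layers:
        x = encode(layer, x)
    return x


def kl_sparsity(mean_activations, rho=0.05):
    """KL-divergence penalty pushing mean hidden activations toward rho."""
    total = 0.0
    for a in mean_activations:
        a = min(max(a, 1e-8), 1.0 - 1e-8)  # guard against log(0)
        total += (rho * math.log(rho / a)
                  + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - a)))
    return total


rng = random.Random(0)
layer_sizes = [256, 128, 64, 32]  # hypothetical sizes, not taken from the paper

# Stand-in for one frame of side information (random values in [0, 1)).
frame_side_info = [rng.random() for _ in range(layer_sizes[0])]

layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
code = stacked_encode(layers, frame_side_info)

print(len(frame_side_info), "->", len(code))
```

In a trained system each module would also have a decoder and would be fitted to minimize reconstruction error plus the sparsity penalty; only the low-dimensional `code` would then need to be transmitted as side information.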
Acknowledgment
This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Y., Hu, R., Wang, X., Hu, C., Li, G. (2021). Stacked Sparse Autoencoder for Audio Object Coding. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_5
DOI: https://doi.org/10.1007/978-3-030-67832-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science (R0)