Abstract
Modern audio codecs use bandwidth extension (BWE) to represent audio efficiently at low bitrates. An established method is spectral band replication (SBR), which delivers very high sound quality with imperceptible artifacts, but at a high bitrate and high complexity. Another method is LPC-based BWE, which is part of the 3GPP AMR-WB+ codec. It markedly reduces bitrate and complexity, but its sound quality is unsatisfactory for music. This paper proposes a novel bandwidth extension method that provides sound quality close to that of eSBR at a bitrate of only 0.8 kbps. The proposed method uses a deep auto-encoder to predict the fine structure of the high-frequency band from the low-frequency band, and extracts only the high-frequency envelope as side information. The performance evaluation demonstrates the advantage of the proposed method over the state of the art: compared with eSBR, the bitrate drops by about 63% while the subjective listening quality remains close; compared with LPC-based BWE, the subjective listening quality is better at the same bitrate.
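The split between a predicted fine structure and a transmitted envelope can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names, the equal-width band layout, the use of per-band RMS as the envelope, and above all `predict_fine_structure`, which simply tiles the low band upward as a stand-in for the trained deep auto-encoder.

```python
import math

def band_slices(n_bins, n_bands):
    """Split n_bins into n_bands contiguous, near-equal ranges (illustrative layout)."""
    edges = [round(i * n_bins / n_bands) for i in range(n_bands + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_bands)]

def rms(values):
    """Root mean square of a non-empty list of magnitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def extract_envelope(hf_mag, n_bands):
    """Encoder side: one RMS value per band is the only side information."""
    return [rms(hf_mag[a:b]) for a, b in band_slices(len(hf_mag), n_bands)]

def predict_fine_structure(lf_mag, n_hf_bins):
    """Hypothetical placeholder: tiles the low band upward.
    The paper uses a trained deep auto-encoder for this step."""
    return [lf_mag[i % len(lf_mag)] for i in range(n_hf_bins)]

def shape_to_envelope(fine, envelope):
    """Decoder side: rescale each band of the predicted fine structure
    so that its RMS matches the transmitted envelope value."""
    out = []
    for (a, b), target in zip(band_slices(len(fine), len(envelope)), envelope):
        band = fine[a:b]
        gain = target / max(rms(band), 1e-12)  # guard against an all-zero band
        out.extend(v * gain for v in band)
    return out

# Toy magnitude spectra (made-up numbers, not data from the paper).
lf = [1.0, 0.5, 0.8, 0.2, 0.9, 0.4, 0.7, 0.3]
hf = [0.30, 0.10, 0.20, 0.05, 0.12, 0.04, 0.08, 0.02]

env = extract_envelope(hf, 4)                      # sent as side information
fine_hat = predict_fine_structure(lf, len(hf))     # derived at the decoder
hf_hat = shape_to_envelope(fine_hat, env)          # reconstructed high band
```

In this sketch the decoder needs only `env` (four numbers per frame here) in addition to the decoded low band, which is the property that keeps the side-information bitrate low; the quality of the result then depends on how well the predictor recovers the fine structure.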
Acknowledgments
This research was supported by the National Natural Science Foundation of China (No. 61231015); the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306); the National Natural Science Foundation of China (Nos. 61102127, 61201340, 61201169, 61471271); the Guangdong-Hongkong Key Domain Breakthrough Project of China (No. 2012A090200007); and the Major Science and Technology Innovation Plan of Hubei Province (No. 2013AAA020).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jiang, L., Hu, R., Wang, X., Zhang, M. (2015). Low Bitrates Audio Bandwidth Extension Using a Deep Auto-Encoder. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_51
DOI: https://doi.org/10.1007/978-3-319-24075-6_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24074-9
Online ISBN: 978-3-319-24075-6
eBook Packages: Computer Science (R0)