Low Bitrates Audio Bandwidth Extension Using a Deep Auto-Encoder

  • Conference paper
Advances in Multimedia Information Processing -- PCM 2015 (PCM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9314)

Abstract

Modern audio codecs use bandwidth extension (BWE) to represent audio efficiently at low bitrates. An established method is spectral band replication (SBR), which provides very high sound quality with imperceptible artifacts, but at a high bitrate and high complexity. Another method is LPC-based BWE, which is part of the 3GPP AMR-WB+ codec. Although its bitrate and complexity are markedly lower, the sound quality it provides is unsatisfactory for music. This paper proposes a novel bandwidth extension method that provides sound quality close to that of eSBR at a bitrate of only 0.8 kbps. The proposed method uses a deep auto-encoder to predict the fine structure of the high-frequency band from the low-frequency band, and extracts only the envelope of the high-frequency band as side information. The performance evaluation demonstrates the advantage of the proposed method over the state of the art. Compared with eSBR, the bitrate drops by about 63 % while the subjective listening quality remains close. Compared with LPC-based BWE at the same bitrate, the subjective listening quality is better.
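
The split between predicted fine structure and transmitted envelope can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the transform, the band layout, or the network, so the equal-width sub-bands, the RMS envelope, and every function name below are assumptions made for illustration. In particular, `predict_fine_structure` is a hypothetical placeholder that simply copies the low-band fine structure upward where the paper would use its trained deep auto-encoder. The helper `implied_reference_bitrate` further assumes that the reported 63 % reduction is measured against the eSBR bandwidth-extension bitrate.

```python
import math


def band_envelope(coeffs, n_bands):
    """Return the RMS value of each of n_bands equal-width sub-bands."""
    width = len(coeffs) // n_bands
    return [
        math.sqrt(sum(c * c for c in coeffs[b * width:(b + 1) * width]) / width)
        for b in range(n_bands)
    ]


def normalize_fine_structure(coeffs, n_bands, eps=1e-12):
    """Divide each sub-band by its RMS so only the fine structure remains."""
    width = len(coeffs) // n_bands
    env = band_envelope(coeffs, n_bands)
    return [c / (env[i // width] + eps)
            for i, c in enumerate(coeffs[:width * n_bands])]


def predict_fine_structure(low_fine):
    """Hypothetical stand-in for the trained deep auto-encoder.

    Assumption: copies the low-band fine structure unchanged. The paper's
    model would produce a learned prediction instead.
    """
    return list(low_fine)


def reconstruct_high_band(low_coeffs, high_envelope, n_bands):
    """Decoder side: predict the fine structure, then apply the envelope."""
    low_fine = normalize_fine_structure(low_coeffs, n_bands)
    high_fine = normalize_fine_structure(predict_fine_structure(low_fine), n_bands)
    width = len(high_fine) // n_bands
    return [f * high_envelope[i // width] for i, f in enumerate(high_fine)]


def implied_reference_bitrate(proposed_kbps, reduction):
    """Reference bitrate implied by a relative reduction (assumption, see text)."""
    return proposed_kbps / (1.0 - reduction)


if __name__ == "__main__":
    low = [1.0, -1.0, 2.0, -2.0, 0.5, 0.5, 3.0, -3.0]      # toy low-band coefficients
    high = [0.1, -0.2, 0.05, 0.05, 0.3, -0.1, 0.02, 0.04]   # toy original high band
    side_info = band_envelope(high, 4)                      # encoder: envelope only
    rebuilt = reconstruct_high_band(low, side_info, 4)      # decoder: shaped prediction
    print(side_info)
    print(band_envelope(rebuilt, 4))
    print(implied_reference_bitrate(0.8, 0.63))
```

In this sketch only `side_info` would be transmitted, which is why the side-information bitrate can stay low: the decoder regenerates the fine structure from the low band it has already decoded, and the rebuilt high band matches the original envelope band by band.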

Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 61231015, 61102127, 61201340, 61201169, 61471271), the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306), the Guangdong-Hongkong Key Domain Breakthrough Project of China (No. 2012A090200007), and the Major Science and Technology Innovation Plan of Hubei Province (No. 2013AAA020).

Author information

Corresponding author

Correspondence to Ruimin Hu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, L., Hu, R., Wang, X., Zhang, M. (2015). Low Bitrates Audio Bandwidth Extension Using a Deep Auto-Encoder. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_51

  • DOI: https://doi.org/10.1007/978-3-319-24075-6_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24074-9

  • Online ISBN: 978-3-319-24075-6

  • eBook Packages: Computer Science, Computer Science (R0)
