DOI: 10.1145/3441233.3441240
research-article

Music Upscaling Using Convolutional Neural Networks

Published: 06 March 2021

Abstract

Audio upscaling with generative neural networks has been studied in the fields of super-resolution and speech bandwidth expansion. Previous approaches have worked well for speech, but not for music. For this domain, we propose a convolutional neural network approach with a novel dilated and residual architecture and an additional refinement method. According to a spectral distance error metric, the approach outperforms the cubic spline baseline when upscaling music.
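For orientation, the baseline and the kind of metric named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which spectral distance is used, so log-spectral distance is assumed here, and the function names, frame size, and test tone are all hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def cubic_spline_upscale(low_res, factor):
    """Interpolate a 1-D signal onto a grid `factor` times denser."""
    n = len(low_res)
    coarse_t = np.arange(n) * factor          # positions of the known samples
    fine_t = np.arange((n - 1) * factor + 1)  # every position between first and last
    return CubicSpline(coarse_t, low_res)(fine_t)


def log_spectral_distance(reference, estimate, frame=256, eps=1e-10):
    """Mean over frames of the RMS difference between log power spectra (dB).

    Assumed metric: the abstract only says "spectral distance".
    """
    m = min(len(reference), len(estimate)) // frame * frame
    ref = np.reshape(reference[:m], (-1, frame))
    est = np.reshape(estimate[:m], (-1, frame))
    ref_db = 10 * np.log10(np.abs(np.fft.rfft(ref, axis=1)) ** 2 + eps)
    est_db = 10 * np.log10(np.abs(np.fft.rfft(est, axis=1)) ** 2 + eps)
    return float(np.mean(np.sqrt(np.mean((ref_db - est_db) ** 2, axis=1))))


# Toy usage: a 440 Hz tone at 16 kHz, naively decimated by 2, then upscaled.
# (No anti-aliasing filter is applied; that is acceptable only for this toy tone.)
sr = 16000
t = np.arange(4096) / sr
high_res = np.sin(2 * np.pi * 440 * t)
low_res = high_res[::2]
upscaled = cubic_spline_upscale(low_res, 2)
lsd = log_spectral_distance(high_res, upscaled)
```

A learned model would be scored the same way: compute the distance between its output and the original high-resolution signal, then compare that value with the one obtained for the spline output.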

Published In

SSIP '20: Proceedings of the 2020 3rd International Conference on Sensors, Signal and Image Processing
October 2020
95 pages
ISBN:9781450388283
DOI:10.1145/3441233

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Algorithms
  2. Neural Networks
  3. Signals

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSIP 2020
