Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series (ICANN 2019)

Abstract

Two main approaches are currently prevalent in the digital emulation of musical instruments: manipulation of pre-recorded samples and real-time synthesis techniques, generally based on physical models of varying accuracy. Concerning the first, while the processing power of present-day computers enables their use in real time, many restrictions arising from the sample-based design persist, the large on-disk space requirements and the rigidity of musical articulations being the most prominent. At the other end of the spectrum, pure synthesis approaches, while offering greater flexibility, fail to capture and reproduce certain nuances central to the verisimilitude of the generated sound, offering a dry, synthetic output at a high computational cost. We propose a method in which ensembles of lightweight neural networks working in parallel learn, from crafted frequency-domain features of an instrument's sound spectra, to reproduce an arbitrary instrument's voice and articulations realistically and efficiently. We find that our method, while retaining perceptual sound quality on par with sampled approaches, exhibits one tenth of the latency of industry-standard real-time synthesis algorithms and one hundredth of the disk space requirements of industry-standard sample-based digital musical instruments. This method can therefore serve as a basis for more efficient implementations in dedicated devices, such as keyboards and electronic drum kits, and in general-purpose platforms like desktops and tablets, or open-source hardware like Arduino and Raspberry Pi. From a conceptual point of view, this work highlights the advantages of a closer integration of machine learning with other subjects, especially in the endeavor of new product development. Exploiting the synergy between neural networks, digital signal processing techniques and physical modelling, we illustrate the proposed method via the implementation of two virtual instruments: a conventional grand piano and a hybrid stringed instrument.
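As a rough illustration of the frequency-domain idea sketched in the abstract, the snippet below sums a handful of partials whose amplitude envelopes stand in for the outputs of small per-partial networks, with partial frequencies stretched by a stiff-string inharmonicity term as is typical in piano models. This is a minimal, assumption-laden sketch, not the authors' implementation: the names (partial_envelope, synthesize_note), the closed-form envelope, and all constants are illustrative placeholders.

    # Minimal sketch (assumed names and constants, not the authors' code):
    # one lightweight model per partial predicts that partial's amplitude
    # envelope; here a closed-form decay stands in for each network, and
    # the partials are summed as sinusoids (additive, DFT-style synthesis).
    import numpy as np

    SR = 44100  # sample rate, Hz

    def partial_envelope(t, velocity, k):
        # Placeholder for the k-th learned network: louder notes start
        # stronger; higher partials are weaker and decay faster.
        return (velocity / k) * np.exp(-(1.5 + 0.4 * k) * t)

    def synthesize_note(f0, velocity, duration, n_partials=16, b=1e-4):
        # Stiff-string partial frequencies: f_k = k * f0 * sqrt(1 + b * k^2).
        t = np.arange(int(duration * SR)) / SR
        out = np.zeros_like(t)
        for k in range(1, n_partials + 1):
            f_k = k * f0 * np.sqrt(1.0 + b * k * k)
            out += partial_envelope(t, velocity, k) * np.sin(2 * np.pi * f_k * t)
        return out / np.max(np.abs(out))  # normalize to [-1, 1]

    note = synthesize_note(f0=220.0, velocity=0.8, duration=2.0)

Because each partial is independent, the per-partial models can be evaluated in parallel, which is the property the ensemble design exploits to achieve low latency.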



Author information

Correspondence to Carlos Tarjano.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tarjano, C., Pereira, V. (2019). Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol. 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_30

  • DOI: https://doi.org/10.1007/978-3-030-30490-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30489-8

  • Online ISBN: 978-3-030-30490-4

  • eBook Packages: Computer Science (R0)
