Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series (ICANN 2019)

Abstract

Two main approaches are currently prevalent in the digital emulation of musical instruments: manipulation of pre-recorded samples and real-time synthesis techniques, generally based on physical models of varying accuracy. Concerning the first, while the processing power of present-day computers enables their use in real time, many restrictions arising from the sample-based design persist, the large on-disk space requirements and the rigidity of musical articulations being the most prominent. At the other end of the spectrum, pure synthesis approaches, while offering greater flexibility, fail to capture and reproduce certain nuances central to the verisimilitude of the generated sound, offering a dry, synthetic output at a high computational cost. We propose a method in which ensembles of lightweight neural networks working in parallel learn, from crafted frequency-domain features of an instrument's sound spectra, to reproduce an arbitrary instrument's voice and articulations realistically and efficiently. We find that our method, while retaining perceptual sound quality on par with sampled approaches, exhibits one tenth of the latency of industry-standard real-time synthesis algorithms and one hundredth of the disk space requirements of industry-standard sample-based digital musical instruments. This method can therefore serve as a basis for more efficient implementations in dedicated devices, such as keyboards and electronic drum kits, and in general-purpose platforms like desktops and tablets, or open-source hardware like Arduino and Raspberry Pi. From a conceptual point of view, this work highlights the advantages of a closer integration of machine learning with other subjects, especially in the endeavor of new product development. Exploiting the synergy between neural networks, digital signal processing techniques and physical modelling, we illustrate the proposed method via the implementation of two virtual instruments: a conventional grand piano and a hybrid stringed instrument.
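As a rough illustration of the frequency-domain idea sketched in the abstract, the snippet below sums a handful of partials whose amplitude envelopes stand in for the outputs of small per-partial networks, with partial frequencies stretched by a stiff-string inharmonicity term as is typical in piano models. This is a minimal, assumption-laden sketch, not the authors' implementation: the names (partial_envelope, synthesize_note), the closed-form envelope, and all constants are illustrative placeholders.

    # Minimal sketch (assumed names and constants, not the authors' code):
    # one lightweight model per partial predicts that partial's amplitude
    # envelope; here a closed-form decay stands in for each network, and
    # the partials are summed as sinusoids (additive, DFT-style synthesis).
    import numpy as np

    SR = 44100  # sample rate, Hz

    def partial_envelope(t, velocity, k):
        # Placeholder for the k-th learned network: louder notes start
        # stronger; higher partials are weaker and decay faster.
        return (velocity / k) * np.exp(-(1.5 + 0.4 * k) * t)

    def synthesize_note(f0, velocity, duration, n_partials=16, b=1e-4):
        # Stiff-string partial frequencies: f_k = k * f0 * sqrt(1 + b * k^2).
        t = np.arange(int(duration * SR)) / SR
        out = np.zeros_like(t)
        for k in range(1, n_partials + 1):
            f_k = k * f0 * np.sqrt(1.0 + b * k * k)
            out += partial_envelope(t, velocity, k) * np.sin(2 * np.pi * f_k * t)
        return out / np.max(np.abs(out))  # normalize to [-1, 1]

    note = synthesize_note(f0=220.0, velocity=0.8, duration=2.0)

Because each partial is independent, the per-partial models can be evaluated in parallel, which is the property the ensemble design exploits to achieve low latency.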



Author information

Correspondence to Carlos Tarjano.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tarjano, C., Pereira, V. (2019). Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol. 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_30

  • DOI: https://doi.org/10.1007/978-3-030-30490-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30489-8

  • Online ISBN: 978-3-030-30490-4

  • eBook Packages: Computer Science (R0)
