
Pitch contours curve frequency domain fitting with vocabulary matching based music generation


Abstract

In this paper, we present a new perspective on music generation. To our knowledge, the proposed method is the first to exploit the frequency-domain characteristics of the pitch contour curve to generate melodies with a controllable long-term structure. The music it generates exhibits a long-term structure that basic music generation methods lack, and the approach can be combined with other generation methods to improve their long-term structure. The method first uses a neural network to fit the pitch contour curve in the frequency domain, then applies vocabulary matching to refine the detailed time-domain characteristics of the generated melody and to control the long-term trend of the generated notes according to label information, and finally produces a melody with a realistic and controllable long-term structure. Extensive experiments show that, compared with music generated by an LSTM baseline, the music generated by the proposed method has a better long-term structure while retaining similar statistical characteristics.
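To make the pipeline described above concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: it treats a melody's pitch contour as a time series of MIDI pitches, keeps only the low-frequency components of its spectrum as a simple stand-in for the paper's neural-network fit of the contour in the frequency domain, and then snaps the reconstructed contour to an allowed note vocabulary as a basic form of vocabulary matching. The function names, the low-pass truncation, and the toy C-major vocabulary are assumptions made for illustration only.

```python
# Minimal sketch of the abstract's pipeline (assumptions, not the paper's code):
# 1) represent a melody's pitch contour as a time series of MIDI pitches,
# 2) move to the frequency domain and keep only low-frequency components
#    (a placeholder for the paper's neural-network fit of the spectrum),
# 3) reconstruct a smooth long-term contour, and
# 4) "vocabulary matching": map each reconstructed value to the nearest
#    pitch in an allowed note vocabulary to recover playable notes.
import numpy as np


def fit_contour_frequency_domain(contour, n_keep=8):
    """Low-pass reconstruction of the pitch contour via the real FFT.

    The paper fits the spectrum with a neural network; truncating
    high-frequency coefficients is used here as a simple placeholder.
    """
    spectrum = np.fft.rfft(contour)
    spectrum[n_keep:] = 0.0            # keep only the long-term trend
    return np.fft.irfft(spectrum, n=len(contour))


def vocabulary_match(contour, vocabulary):
    """Map each contour value to the nearest pitch in the note vocabulary."""
    vocab = np.asarray(sorted(vocabulary))
    idx = np.abs(contour[:, None] - vocab[None, :]).argmin(axis=1)
    return vocab[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy pitch contour: a slow arch (long-term structure) plus local noise.
    t = np.linspace(0, 1, 64)
    contour = 60 + 12 * np.sin(np.pi * t) + rng.normal(0, 1.5, t.size)

    trend = fit_contour_frequency_domain(contour, n_keep=4)
    # C major pitches around middle C as a toy note vocabulary (assumption).
    c_major = [57, 59, 60, 62, 64, 65, 67, 69, 71, 72, 74]
    melody = vocabulary_match(trend, c_major)
    print(melody)
```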




Author information

Corresponding author

Correspondence to Songhao Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Fig. 20 Examples of the spectrum

Appendix 2

Fig. 21 Long-term structure of the label-conditioned generation

Appendix 3

Fig. 22 Frequency-domain spectrum after vocabulary matching

About this article

Cite this article

Lang, R., Zhu, S. & Wang, D. Pitch contours curve frequency domain fitting with vocabulary matching based music generation. Multimed Tools Appl 80, 28463–28486 (2021). https://doi.org/10.1007/s11042-021-11049-x

