Abstract
In this paper, we present a new perspective on music generation. The proposed method is the first to use the frequency-domain characteristics of the pitch contour curve to generate melodies with a controllable long-term structure. Music generated by this method has a good long-term structure that other basic music generation methods lack. The method has considerable potential for development and application: it can be combined with other music generation methods to improve their long-term structure. It first uses a neural network to fit the pitch contour curve in the frequency domain, then applies vocabulary matching to refine the detailed characteristics of the melody in the time domain and to control the long-term trend of the generated notes according to label information, and finally produces melodies with a realistic and controllable long-term structure. Extensive experiments show that, compared with music generated by an LSTM, music generated by the proposed method has better long-term structure and similar statistical characteristics.
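As a rough illustration of the two ideas named in the abstract, the following sketch treats a melody as a sequence of MIDI pitch numbers, keeps only the lowest-frequency DFT coefficients of that pitch contour to expose its long-term shape, and then replaces each fixed-length segment with the closest entry from a small segment vocabulary. This is a simplified stand-in under stated assumptions, not the authors' implementation: the neural network that fits the spectrum and the label-based control are omitted, and the function names, the toy contour, the vocabulary, and the Euclidean nearest-neighbour rule are all hypothetical.

```python
import numpy as np

def low_frequency_contour(pitches, keep):
    """Reconstruct a contour from its `keep` lowest-frequency DFT coefficients."""
    pitches = np.asarray(pitches, dtype=float)
    spectrum = np.fft.rfft(pitches)
    spectrum[keep:] = 0.0  # discard high-frequency (short-term) detail
    return np.fft.irfft(spectrum, n=len(pitches))

def match_segments(contour, vocabulary, seg_len):
    """Replace each segment with the nearest vocabulary entry (assumed rule:
    Euclidean distance; the paper's matching criterion may differ)."""
    vocabulary = np.asarray(vocabulary, dtype=float)
    matched = []
    for start in range(0, len(contour) - seg_len + 1, seg_len):
        segment = contour[start:start + seg_len]
        distances = np.linalg.norm(vocabulary - segment, axis=1)
        matched.append(vocabulary[np.argmin(distances)])
    return np.concatenate(matched)

# Hypothetical toy data: a rising-then-falling contour of 16 MIDI pitches.
contour = np.array([60, 62, 64, 65, 67, 69, 71, 72,
                    72, 71, 69, 67, 65, 64, 62, 60], dtype=float)

# Hypothetical vocabulary of four-note melodic fragments.
vocab = [[60, 62, 64, 65],
         [67, 69, 71, 72],
         [72, 71, 69, 67],
         [65, 64, 62, 60]]

smooth = low_frequency_contour(contour, keep=3)   # long-term trend only
melody = match_segments(smooth, vocab, seg_len=4) # detail restored from vocabulary
```

Here `smooth` carries the overall arch of the melody, while `melody` is built entirely from vocabulary fragments that follow that arch. In the method described above, a learned model would produce the low-frequency spectrum instead of it being read off an existing melody.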
About this article
Cite this article
Lang, R., Zhu, S. & Wang, D. Pitch contours curve frequency domain fitting with vocabulary matching based music generation. Multimed Tools Appl 80, 28463–28486 (2021). https://doi.org/10.1007/s11042-021-11049-x
DOI: https://doi.org/10.1007/s11042-021-11049-x