Abstract
Music has long been one of the most powerful media for expressing human emotion, conveying feelings that words alone sometimes cannot. Consequently, generating music with machine learning and deep learning approaches has been an active research area for some time. It is a challenging and interesting task, as imitating human creativity is far from easy. This paper performs melody generation using sequential deep learning models, particularly Long Short-Term Memory (LSTM) networks. In this context, previous works exhibit two principal limitations. First, a significant majority of studies rely on RNN variants that cannot effectively remember long past sequences. Second, they often do not consider varying temporal context lengths during data modeling for melody generation. In this work, experiments are performed with different LSTM variants, namely vanilla LSTM, multi-layer LSTM, and bidirectional LSTM, and with different temporal context lengths for each, in order to identify the optimal LSTM model and the optimal timestep for efficient melody generation. Moreover, ensembles of the best-performing techniques for each genre (classical, country, jazz, and pop) are implemented to test whether they can generate better melodies than the corresponding individual models. Finally, a qualitative evaluation of the generated melodies is carried out through a survey circulated among colleagues and within the ISMIR community, in which participants rated each audio sample on a scale of 1-5; this helped in assessing the quality of the generated music samples. All models are validated on four datasets that were manually prepared by genre: Classical, Jazz, Country, and Pop.
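The "varying temporal context lengths" idea from the abstract amounts to choosing how many past notes the model sees when predicting the next one. Below is a minimal sketch of that data-modeling step; the function name `make_training_pairs` and the toy integer-encoded melody are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_training_pairs(notes, timestep):
    """Slide a fixed-length window over a note sequence.

    Each input is `timestep` consecutive notes; the target is the note
    that immediately follows, so an LSTM trained on these pairs learns
    next-note prediction under a given temporal context length.
    """
    X, y = [], []
    for i in range(len(notes) - timestep):
        X.append(notes[i:i + timestep])
        y.append(notes[i + timestep])
    return np.array(X), np.array(y)

# A toy melody encoded as MIDI note numbers (hypothetical example).
melody = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65]

# Varying the temporal context length changes both the input shape and
# the number of training examples extracted from the same piece.
for timestep in (4, 8):
    X, y = make_training_pairs(melody, timestep)
    print(timestep, X.shape, y.shape)
```

Sweeping `timestep` over several values and training each LSTM variant on the resulting pairs is one straightforward way to search for the optimal context length the abstract refers to.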
Data Availability
The source of the benchmark dataset is provided in the manuscript.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nag, B., Middya, A.I. & Roy, S. Melody generation based on deep ensemble learning using varying temporal context length. Multimed Tools Appl 83, 69647–69668 (2024). https://doi.org/10.1007/s11042-024-18270-4