Suno: potential, prospects, and trends

Yu, Jiaxing; Wu, Songruoyao; Lu, Guanting; Li, Zijin; Zhou, Li; Zhang, Kejun

doi:10.1631/FITEE.2400299

Jiaxing Yu¹, Songruoyao Wu¹, Guanting Lu¹, Zijin Li², Li Zhou³ & Kejun Zhang¹,⁴ (ORCID: orcid.org/0000-0003-4592-1818)

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
Jiaxing Yu, Songruoyao Wu, Guanting Lu & Kejun Zhang
Department of Music Artificial Intelligence and Music Information Technology, Central Conservatory of Music, Beijing, 100031, China
Zijin Li
School of Arts and Communication, China University of Geosciences (Wuhan), Wuhan, 430074, China
Li Zhou
Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China
Kejun Zhang

Contributions

Jiaxing YU, Songruoyao WU, Guanting LU, and Kejun ZHANG drafted the paper. Zijin LI and Li ZHOU helped organize the paper. Kejun ZHANG revised and finalized the paper.

Corresponding author

Correspondence to Kejun Zhang.

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (No. 62272409), the Key R&D Program of Zhejiang Province, China (No. 2022C03126), and the Ministry of Culture and Tourism of China (No. 2022DMKLB001).

About this article

Cite this article

Yu, J., Wu, S., Lu, G. et al. Suno: potential, prospects, and trends. Front Inform Technol Electron Eng 25, 1025–1030 (2024). https://doi.org/10.1631/FITEE.2400299

Received: 17 April 2024
Accepted: 24 May 2024
Published: 20 June 2024
Issue Date: July 2024
DOI: https://doi.org/10.1631/FITEE.2400299
