Suno: potential, prospects, and trends

  • Comment
Frontiers of Information Technology & Electronic Engineering

Author information

Contributions

Jiaxing YU, Songruoyao WU, Guanting LU, and Kejun ZHANG drafted the paper. Zijin LI and Li ZHOU helped organize the paper. Kejun ZHANG revised and finalized the paper.

Corresponding author

Correspondence to Kejun Zhang.

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (No. 62272409), the Key R&D Program of Zhejiang Province, China (No. 2022C03126), and the Ministry of Culture and Tourism of China (No. 2022DMKLB001).

About this article

Cite this article

Yu, J., Wu, S., Lu, G. et al. Suno: potential, prospects, and trends. Front Inform Technol Electron Eng 25, 1025–1030 (2024). https://doi.org/10.1631/FITEE.2400299

  • DOI: https://doi.org/10.1631/FITEE.2400299