research-article

First Describe, Then Depict: Generating Covers for Music and Books via Extracting Keywords

This paper presents two methods to generate high-resolution, copyright-free book covers or music album covers.

Authors:

Valeria Efimova,

Viacheslav Shalamov,

Andrey Filchenkov

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

Pages 734 - 739

https://doi.org/10.1145/3573942.3574088

Published: 16 May 2023

Abstract

In this paper, we consider two algorithms for generating artwork covers from text or from audio-file features. The resulting image is a composition of existing keyword-labelled images, made visually consistent with filter-based image harmonization. To make the composition realistic, we either train a GAN to predict an appropriate filter or apply emotion-based neural style transfer. Human assessors evaluated the quality of the generated book covers and music album covers; according to their assessment, the proposed algorithms produce better results than existing solutions. The proposed methods also reach print quality and require less computation time, and the generated images can be used without copyright infringement.
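The abstract names the pipeline stages without giving their internals. As a rough illustration only, the Python sketch below wires together three hypothetical stand-ins: a frequency-based keyword extractor (not the authors' extractor), per-channel mean and standard-deviation matching in RGB (a simplified Reinhard-style color transfer, assumed here as one example of a global, filter-like harmonization step, not the authors' GAN-predicted filter or emotion-based style transfer), and standard alpha compositing with a matte that is assumed to be given. Images are flat lists of RGB tuples in [0, 1] to keep the sketch dependency-free; all function names are illustrative.

```python
import re
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was",
              "his", "her", "with", "for"}


def extract_keywords(text: str, k: int = 3) -> List[str]:
    """Return the k most frequent non-stop-words (a toy stand-in for the
    paper's keyword extractor, which this sketch does not reproduce)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


def harmonize(fg: Sequence[Pixel], bg: Sequence[Pixel]) -> List[Pixel]:
    """Match each RGB channel's mean and standard deviation in fg to bg.

    A simplified, RGB-space variant of Reinhard-style color transfer, used
    here as one example of a global, filter-like harmonization step.
    """
    channels = []
    for c in range(3):
        src = [p[c] for p in fg]
        ref = [p[c] for p in bg]
        src_mu, src_sd = mean(src), pstdev(src)
        ref_mu, ref_sd = mean(ref), pstdev(ref)
        scale = ref_sd / src_sd if src_sd > 0 else 1.0
        # Shift and scale the channel, then clamp back into [0, 1].
        channels.append([min(1.0, max(0.0, (v - src_mu) * scale + ref_mu)) for v in src])
    return [tuple(px) for px in zip(*channels)]


def composite(fg: Sequence[Pixel], bg: Sequence[Pixel], alpha: Sequence[float]) -> List[Pixel]:
    """Standard alpha compositing: out = alpha * fg + (1 - alpha) * bg."""
    return [
        tuple(a * f + (1.0 - a) * b for f, b in zip(fp, bp))
        for fp, bp, a in zip(fg, bg, alpha)
    ]


if __name__ == "__main__":
    text = "The dragon guards the castle. The dragon sleeps by the castle gate."
    # The keywords would be used to look up images labelled with them.
    print(extract_keywords(text))
    fg = [(0.0, 0.1, 0.2), (0.2, 0.3, 0.4)]  # cut-out object pixels
    bg = [(0.5, 0.4, 0.3), (0.9, 0.6, 0.5)]  # background pixels at the same positions
    matte = [1.0, 0.5]                       # alpha matte, assumed to be given
    print(composite(harmonize(fg, bg), bg, matte))
```

The single image-wide statistics adjustment here only conveys the idea of harmonizing a pasted object with its background; according to the abstract, the authors instead train a GAN to predict the filter or apply emotion-based neural style transfer.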

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Media arts
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

ACM Other conferences

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 2022

1221 pages

ISBN:9781450396899

DOI:10.1145/3573942

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIPR 2022

AIPR 2022: 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 23 - 25, 2022

Xiamen, China

Article Metrics

Total Citations: 0
Total Downloads: 47
Downloads (last 12 months): 18
Downloads (last 6 weeks): 2

Reflects downloads up to 01 Mar 2025.