DOI: 10.1145/3573942.3574088
research-article

First Describe, Then Depict: Generating Covers for Music and Books via Extracting Keywords: This paper presents two methods to generate high resolution uncopyrighted book covers or music album covers.

Published: 16 May 2023

Abstract

In this paper, we consider two algorithms for generating cover artwork from text or audio file features. The resulting image is composed of existing keyword-labelled images after filter-based image harmonization is applied. To achieve a realistic composition, we either train a GAN to predict an appropriate filter or apply emotion-based Neural Style Transfer. The quality of the generated book covers and music album covers was evaluated by assessors. According to their assessment, the suggested algorithms appeared to produce better results than existing solutions. The suggested methods also achieve print quality and require less computation time. Moreover, the generated images can be used without copyright infringement.
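
As a rough illustration of what a filter-based harmonization step followed by composition might look like, the sketch below matches the per-channel mean and standard deviation of a foreground cut-out to those of a background, then alpha-blends the two. This is an assumption made for illustration only, not the authors' method: the abstract states that a GAN predicts the filter, whereas the statistics-matching filter here is a simplified stand-in. The function names (`channel_stats`, `harmonize`, `composite`) and the flat-list pixel representation are hypothetical.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return a (mean, std) pair per channel for a list of (r, g, b) pixels."""
    channels = list(zip(*pixels))
    return [(mean(c), pstdev(c)) for c in channels]


def harmonize(foreground, background):
    """Shift and scale each foreground channel toward the background statistics.

    Assumption: a simple statistics-matching filter in RGB, used here only as a
    stand-in for the learned filter mentioned in the abstract.
    """
    fg_stats = channel_stats(foreground)
    bg_stats = channel_stats(background)
    result = []
    for pixel in foreground:
        new_pixel = []
        for value, (fg_mean, fg_std), (bg_mean, bg_std) in zip(pixel, fg_stats, bg_stats):
            # Avoid division by zero when a foreground channel is constant.
            scale = bg_std / fg_std if fg_std > 0 else 1.0
            adjusted = (value - fg_mean) * scale + bg_mean
            # Clamp to the valid 8-bit range.
            new_pixel.append(min(255, max(0, round(adjusted))))
        result.append(tuple(new_pixel))
    return result


def composite(foreground, background, alpha):
    """Alpha-blend foreground over background, pixel by pixel."""
    return [
        tuple(round(a * f + (1 - a) * b) for f, b in zip(fg, bg))
        for fg, bg, a in zip(foreground, background, alpha)
    ]


if __name__ == "__main__":
    fg = [(200, 50, 50), (220, 70, 70)]   # reddish cut-out
    bg = [(20, 20, 100), (40, 40, 140)]   # bluish background
    harmonized = harmonize(fg, bg)
    print(composite(harmonized, bg, [1.0, 0.5]))
```

In a real pipeline the pixel lists would come from decoded images and the alpha values from a matting mask. Both are left out here to keep the sketch self-contained.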


Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
September 2022
1221 pages
ISBN: 9781450396899
DOI: 10.1145/3573942

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Book cover
2. Image generation
3. Image harmonization
4. Music album cover
5. Neural Style Transfer

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

AIPR 2022
