research-article

Prompt2Poster: Automatically Artistic Chinese Poster Creation from Prompt Only

Authors:

Li Yuan

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 10716 - 10724

https://doi.org/10.1145/3664647.3681495

Published: 28 October 2024

Abstract

Artistic posters are a critical component of graphic design and are widely used in the advertising and entertainment industries, so automatic poster creation from user-provided prompts is increasingly in demand. Although existing Text2Image methods create impressive images aligned with given prompts, they fail to generate satisfactory artistic posters, especially ones containing Chinese text. To create artistic Chinese posters with an aligned background, a reasonable layout, and stylized graphical text from a prompt alone, we propose an automatic poster creation framework named Prompt2Poster. The framework uses the capacity of a powerful Large Language Model (LLM) to extract user intention from the prompt and generate the aligned background. Although it takes only a user prompt as input, the framework fully exploits linguistic, visual, and geometrical information, which allows it to fit different distributions. To make use of this multi-modal information, we propose two carefully designed modules, the Controllable Layout Generator (CLG) and the Graphical Text Generator (GTG), which lead to accurate and visually pleasing results. Comprehensive experiments demonstrate that Prompt2Poster achieves superior performance, especially in text quality and visual harmony.
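The data flow described in the abstract can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify interfaces, so every name here (`extract_intention`, `generate_layout`, `render_texts`, `prompt2poster`, the `scene | text; text` prompt format) is hypothetical, and each stage is a trivial stand-in for the LLM, the CLG, and the GTG respectively.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (x, y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class Poster:
    background_prompt: str
    texts: List[str]
    boxes: List[Box] = field(default_factory=list)
    styled_texts: List[str] = field(default_factory=list)


def extract_intention(prompt: str) -> Tuple[str, List[str]]:
    """Stand-in for the LLM step (assumed prompt format: 'scene | text; text')."""
    scene, _, raw_texts = prompt.partition("|")
    texts = [t.strip() for t in raw_texts.split(";") if t.strip()]
    return scene.strip(), texts


def generate_layout(texts: List[str]) -> List[Box]:
    """Stand-in for the CLG: stack one box per text, top to bottom."""
    if not texts:
        return []
    slot = 0.8 / len(texts)  # vertical space per text inside a 10% margin
    return [(0.1, 0.1 + i * slot, 0.8, 0.8 * slot) for i in range(len(texts))]


def render_texts(texts: List[str], boxes: List[Box]) -> List[str]:
    """Stand-in for the GTG: describe each text with its box instead of drawing it."""
    return [
        f"{t} @ ({x:.2f}, {y:.2f}, {w:.2f}, {h:.2f})"
        for t, (x, y, w, h) in zip(texts, boxes)
    ]


def prompt2poster(prompt: str) -> Poster:
    """Chain the three placeholder stages: intention, layout, graphical text."""
    scene, texts = extract_intention(prompt)
    boxes = generate_layout(texts)
    return Poster(scene, texts, boxes, render_texts(texts, boxes))
```

For example, `prompt2poster("misty mountains at dawn | 山水画展; 2024")` returns a `Poster` whose two text boxes do not overlap. The ordering of the stages is one plausible reading of the abstract; the actual framework may couple them differently.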

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN: 9798400706868

DOI: 10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '24

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
