DOI: 10.1145/3469877.3493594

PBNet: Position-specific Text-to-image Generation by Boundary

Published: 10 January 2022

Abstract

Most existing text-to-image methods focus on improving the clarity and semantic consistency of the generated image with respect to a given text, but pay little attention to controlling other aspects of the generated content, such as the position of the object in the image. In this paper, we introduce a novel position-based generative network (PBNet) that can generate fine-grained images with the object at a specified location. PBNet combines an iterative structure with a generative adversarial network (GAN). A location information embedding module (LIEM) is proposed to combine the location information extracted from the boundary block image with the semantic information extracted from the text. In addition, a silhouette generation module (SGM) is proposed to train the generator to generate objects based on the location information. Experimental results on the CUB dataset demonstrate that PBNet effectively controls the location of the object in the generated image.
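The abstract describes fusing a location embedding (from a boundary block image) with a semantic embedding (from the text) before generation. As a rough illustration only, the sketch below shows one plausible way such a fusion could be wired up; `liem_fuse`, `w_loc`, and `w_txt` are hypothetical names, and this is not the paper's actual LIEM implementation.

```python
import numpy as np

def liem_fuse(boundary_mask, text_emb, w_loc, w_txt):
    """Illustrative LIEM-style fusion (assumption, not the paper's code):
    project a flattened boundary-block mask and a sentence embedding into
    a shared feature space, then sum them into one conditioning vector."""
    loc_feat = boundary_mask.reshape(-1) @ w_loc  # location embedding
    txt_feat = text_emb @ w_txt                   # semantic embedding
    return loc_feat + txt_feat                    # fused conditioning vector

rng = np.random.default_rng(0)
mask = np.zeros((8, 8))
mask[2:5, 3:6] = 1.0                 # object placed in one boundary block
text = rng.normal(size=16)           # toy sentence embedding
fused = liem_fuse(mask, text,
                  rng.normal(size=(64, 32)),   # w_loc: 8*8 -> 32
                  rng.normal(size=(16, 32)))   # w_txt: 16  -> 32
print(fused.shape)  # (32,)
```

In a real GAN pipeline, a vector like `fused` would condition the generator at each stage of the iterative structure; the projection matrices would be learned rather than random.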



      Published In

MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
December 2021, 508 pages
ISBN: 9781450386074
DOI: 10.1145/3469877

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. attention mechanism
      2. boundary block diagram
      3. location information
      4. silhouette generation module

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

MMAsia '21: ACM Multimedia Asia
December 1-3, 2021
Gold Coast, Australia

      Acceptance Rates

      Overall Acceptance Rate 59 of 204 submissions, 29%

