DOI: 10.1145/2964284.2967193

Scene Image Synthesis from Natural Sentences Using Hierarchical Syntactic Analysis

Published: 01 October 2016

Abstract

Synthesizing a new image from verbal information is a challenging task with numerous applications. Most prior research has approached the problem by relying on additional external clues, such as sketches; no study has successfully handled a variety of sentences without such auxiliary information. We propose a system that synthesizes scene images solely from sentences. Input sentences are expected to be complete and to contain visualizable objects. Our priorities are the analysis of the input sentences and the correlation of information between those sentences and visible image patches. We develop a hierarchical syntactic parser for sentence analysis, and we design a combination of lexical knowledge and corpus statistics for word correlation. We applied the entire system to both a clip-art dataset and a real-image dataset; the results highlight the proposed system's capability to generate novel images as well as its ability to succinctly convey ideas.
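The word-correlation step described in the abstract, blending lexical knowledge with corpus statistics, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy hypernym table (standing in for a WordNet-style hierarchy), the toy distributional vectors, and the `alpha` blending weight are all assumptions made for the example.

```python
import math

# Toy lexical hierarchy standing in for WordNet hypernym chains.
# Entries are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "mammal", "animal"],
    "puppy": ["dog", "canine", "mammal", "animal"],
    "cat": ["feline", "mammal", "animal"],
    "car": ["vehicle", "artifact"],
}

def lexical_similarity(a, b):
    """Path-based similarity: 1 / (1 + hops to the nearest shared ancestor)."""
    path_a = [a] + HYPERNYMS.get(a, [])
    path_b = [b] + HYPERNYMS.get(b, [])
    shared = set(path_a) & set(path_b)
    if not shared:
        return 0.0
    dist = min(path_a.index(s) + path_b.index(s) for s in shared)
    return 1.0 / (1.0 + dist)

# Toy corpus-derived vectors standing in for distributional statistics
# (e.g., word embeddings trained on a large corpus).
VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.15, 0.0],
    "cat": [0.7, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
}

def corpus_similarity(a, b):
    """Cosine similarity between the two words' corpus vectors."""
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    return dot / (norm_a * norm_b)

def word_correlation(a, b, alpha=0.5):
    """Blend lexical and corpus-based similarity with weight alpha."""
    return alpha * lexical_similarity(a, b) + (1 - alpha) * corpus_similarity(a, b)
```

Under this sketch, a sentence word such as "puppy" would correlate strongly with an image patch labeled "dog" but weakly with one labeled "car", which is the behavior the combined measure is meant to capture.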


Cited By

  • Mobile App for Text-to-Image Synthesis. In Mobile Computing, Applications, and Services, pages 32-43. DOI: 10.1007/978-3-030-28468-8_3. Online publication date: 25 September 2019.


Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. image processing
  2. image synthesis
  3. multimedia content creation
  4. natural language processing
  5. syntactic abstraction
  6. word mapping

Qualifiers

  • Short-paper

Funding Sources

  • ImPACT Program, Cabinet Office, Government of Japan

Conference

MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 paper acceptance rate: 52 of 237 submissions, 22%.
Overall acceptance rate: 2,145 of 8,556 submissions, 25%.

