Captioning Images Using Different Styles

Published: 13 October 2015

Abstract

I develop techniques for incorporating stylistic objectives into existing image captioning systems. Style is a difficult concept to define in general, so I concentrate on two specific components of it. First, I develop a technique for predicting how people will name visual objects, and demonstrate that it can be used to generate captions with human-like naming conventions; full details are available in a recent publication. Second, I outline a system for generating sentences that express a strong positive or negative sentiment. Finally, I present two possible future directions aimed at modelling style more generally: learning to imitate an individual's captioning style, and generating a diverse set of captions for a single image.
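The naming component can be illustrated with a small example. The following is a minimal sketch in Python, assuming hypothetical hypernym chains and usage frequencies (the actual technique learns its naming choices from visual and language context), of how a captioner might balance specificity against naturalness when choosing the name a person would use for a recognised object:

    # Minimal sketch (illustrative only, not the paper's actual model) of
    # "entry-level" object naming: pick the name a person would actually use
    # for a recognised object by trading off specificity against how often
    # each candidate name occurs in everyday captions.
    # The hypernym chains and frequencies below are hypothetical placeholders.

    HYPERNYMS = {
        # fine-grained classifier label -> names from specific to general
        "grizzly bear": ["grizzly bear", "bear", "mammal", "animal"],
        "sports car": ["sports car", "car", "vehicle"],
    }

    USAGE_FREQ = {
        # hypothetical relative frequency of each name in caption text,
        # e.g. as might be estimated from an n-gram corpus
        "grizzly bear": 0.02, "bear": 0.60, "mammal": 0.05, "animal": 0.33,
        "sports car": 0.10, "car": 0.75, "vehicle": 0.15,
    }

    def entry_level_name(label: str, specificity_weight: float = 0.5) -> str:
        """Balance naturalness (usage frequency) against specificity
        (position in the hypernym chain) and return the best name."""
        chain = HYPERNYMS[label]

        def score(indexed_name):
            i, name = indexed_name
            # Names further up the chain are more general, so they score
            # lower on specificity but may score higher on naturalness.
            specificity = 1.0 - i / max(len(chain) - 1, 1)
            return USAGE_FREQ.get(name, 0.0) + specificity_weight * specificity

        return max(enumerate(chain), key=score)[1]

    if __name__ == "__main__":
        print(entry_level_name("grizzly bear"))  # -> "bear" (with these numbers)
        print(entry_level_name("sports car"))    # -> "car"

With the toy numbers above, the fine-grained labels map to the everyday names "bear" and "car", which is the kind of human-like naming convention the generated captions aim for.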




Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia
October 2015
1402 pages
ISBN:9781450334594
DOI:10.1145/2733373
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2015


Author Tags

  1. caption generation
  2. image description
  3. object naming
  4. sentiment

Qualifiers

  • Abstract

Conference

MM '15: ACM Multimedia Conference
October 26-30, 2015
Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%


Cited By

  • (2023) Image Captioning Using Xception-Long Short-Term Memory. Mining Intelligence and Knowledge Exploration, pp. 25-33. DOI: 10.1007/978-3-031-44084-7_3. Online publication date: 24-Sep-2023
  • (2022) What's in an ALT Tag? Exploring Caption Content Priorities through Collaborative Captioning. ACM Transactions on Accessible Computing, 15(1), pp. 1-32. DOI: 10.1145/3507659. Online publication date: 4-Mar-2022
  • (2022) Deep Convolutional Neural Networks with Transfer Learning for Visual Sentiment Analysis. Neural Processing Letters, 55(4), pp. 5087-5120. DOI: 10.1007/s11063-022-11082-3. Online publication date: 18-Nov-2022
  • (2021) TGSL-Dependent Feature Selection for Boosting the Visual Sentiment Classification. Symmetry, 13(8), 1464. DOI: 10.3390/sym13081464. Online publication date: 10-Aug-2021
  • (2021) Building A Voice Based Image Caption Generator with Deep Learning. 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 943-948. DOI: 10.1109/ICICCS51141.2021.9432091. Online publication date: 6-May-2021
  • (2018) A Survey on Automatic Image Captioning. Mathematics and Computing, pp. 74-83. DOI: 10.1007/978-981-13-0023-3_8. Online publication date: 14-Apr-2018
  • (2016) Beyond object recognition. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3484-3490. DOI: 10.5555/3061053.3061108. Online publication date: 9-Jul-2016
