Captioning Images Using Different Styles

Published: 13 October 2015

Abstract

I develop techniques for incorporating stylistic objectives into existing image captioning systems. Style is a difficult concept to define in general, so I concentrate on two specific components of it. First, I develop a technique for predicting how people will name visual objects, and demonstrate that it can be used to generate captions with human-like naming conventions; full details are available in a recent publication. Second, I outline a system for generating sentences that express a strong positive or negative sentiment. Finally, I present two possible future directions aimed at modelling style more generally: learning to imitate an individual's captioning style, and generating a diverse set of captions for a single image.
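The naming component can be illustrated with a small example. The following is a minimal sketch in Python, assuming hypothetical hypernym chains and usage frequencies (the actual technique learns its naming choices from visual and language context), of how a captioner might balance specificity against naturalness when choosing the name a person would use for a recognised object:

    # Minimal sketch (illustrative only, not the paper's actual model) of
    # "entry-level" object naming: pick the name a person would actually use
    # for a recognised object by trading off specificity against how often
    # each candidate name occurs in everyday captions.
    # The hypernym chains and frequencies below are hypothetical placeholders.

    HYPERNYMS = {
        # fine-grained classifier label -> names from specific to general
        "grizzly bear": ["grizzly bear", "bear", "mammal", "animal"],
        "sports car": ["sports car", "car", "vehicle"],
    }

    USAGE_FREQ = {
        # hypothetical relative frequency of each name in caption text,
        # e.g. as might be estimated from an n-gram corpus
        "grizzly bear": 0.02, "bear": 0.60, "mammal": 0.05, "animal": 0.33,
        "sports car": 0.10, "car": 0.75, "vehicle": 0.15,
    }

    def entry_level_name(label: str, specificity_weight: float = 0.5) -> str:
        """Balance naturalness (usage frequency) against specificity
        (position in the hypernym chain) and return the best name."""
        chain = HYPERNYMS[label]

        def score(indexed_name):
            i, name = indexed_name
            # Names further up the chain are more general, so they score
            # lower on specificity but may score higher on naturalness.
            specificity = 1.0 - i / max(len(chain) - 1, 1)
            return USAGE_FREQ.get(name, 0.0) + specificity_weight * specificity

        return max(enumerate(chain), key=score)[1]

    if __name__ == "__main__":
        print(entry_level_name("grizzly bear"))  # -> "bear" (with these numbers)
        print(entry_level_name("sports car"))    # -> "car"

With the toy numbers above, the fine-grained labels map to the everyday names "bear" and "car", which is the kind of human-like naming convention the generated captions aim for.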




Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia
October 2015
1402 pages
ISBN:9781450334594
DOI:10.1145/2733373
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2015


Author Tags

  1. caption generation
  2. image description
  3. object naming
  4. sentiment

Qualifiers

  • Abstract

Conference

MM '15: ACM Multimedia Conference
October 26-30, 2015
Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%


Cited By

  • (2023) Image Captioning Using Xception-Long Short-Term Memory. Mining Intelligence and Knowledge Exploration, pp. 25-33. DOI: 10.1007/978-3-031-44084-7_3. Online publication date: 24-Sep-2023
  • (2022) What's in an ALT Tag? Exploring Caption Content Priorities through Collaborative Captioning. ACM Transactions on Accessible Computing, 15(1), pp. 1-32. DOI: 10.1145/3507659. Online publication date: 4-Mar-2022
  • (2022) Deep Convolutional Neural Networks with Transfer Learning for Visual Sentiment Analysis. Neural Processing Letters, 55(4), pp. 5087-5120. DOI: 10.1007/s11063-022-11082-3. Online publication date: 18-Nov-2022
  • (2021) TGSL-Dependent Feature Selection for Boosting the Visual Sentiment Classification. Symmetry, 13(8), 1464. DOI: 10.3390/sym13081464. Online publication date: 10-Aug-2021
  • (2021) Building A Voice Based Image Caption Generator with Deep Learning. 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 943-948. DOI: 10.1109/ICICCS51141.2021.9432091. Online publication date: 6-May-2021
  • (2018) A Survey on Automatic Image Captioning. Mathematics and Computing, pp. 74-83. DOI: 10.1007/978-981-13-0023-3_8. Online publication date: 14-Apr-2018
  • (2016) Beyond object recognition. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3484-3490. DOI: 10.5555/3061053.3061108. Online publication date: 9-Jul-2016
