DOI: 10.1145/2911996.2930060
ICMR Conference Proceedings · Short paper

Introducing Concept And Syntax Transition Networks for Image Captioning

Published: 06 June 2016

Abstract

The area of image captioning, i.e., the automatic generation of short textual descriptions of images, has experienced much progress recently. However, image captioning approaches often focus solely on describing the factual content of an image, omitting the emotional or sentimental dimension that is common in human-written captions. This paper presents an image captioning approach designed specifically to incorporate emotions and feelings into the caption generation process. The presented approach consists of a Deep Convolutional Neural Network (CNN) for detecting Adjective Noun Pairs in the image, and a novel graphical network architecture, the "Concept And Syntax Transition (CAST)" network, which generates sentences from these detected concepts.
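The abstract gives only the high-level pipeline, so the following is a minimal, hypothetical Python sketch of the two-stage idea: a stubbed ANP detector feeding a tiny hand-written transition graph. The detect_anps stub, the TRANSITIONS graph, and every score and weight below are invented for illustration; this is not the authors' CAST model, which is a learned network whose details are not given on this page.

```python
# Hypothetical two-stage sketch of the pipeline the abstract describes:
# (1) a CNN detects Adjective Noun Pairs (ANPs) in an image,
# (2) a concept/syntax transition graph is walked to realize a sentence.
# All names, scores, and graph weights are invented for illustration;
# the actual CAST network is a learned graphical model.

def detect_anps(image_path):
    """Stand-in for a CNN-based ANP detector (DeepSentiBank-style).
    A real detector would score thousands of ANPs; we return two."""
    return {"happy dog": 0.81, "sunny beach": 0.64}  # invented confidences

# Toy transition network: nodes are syntax tokens or ANP concepts,
# edges carry hand-picked transition probabilities.
TRANSITIONS = {
    "<start>": [("a", 1.0)],
    "a": [("happy dog", 0.6), ("sunny beach", 0.4)],
    "happy dog": [("on", 0.7), ("<end>", 0.3)],
    "on": [("a", 1.0)],
    "sunny beach": [("<end>", 1.0)],
}

def generate_caption(anps, max_steps=10):
    """Greedy graph walk that boosts edges leading to detected ANPs
    and consumes each ANP once it has been emitted."""
    anps = dict(anps)  # local copy so we can consume entries
    state, words = "<start>", []
    for _ in range(max_steps):
        choices = TRANSITIONS.get(state, [("<end>", 1.0)])
        # Re-weight each edge by the detection confidence of its target.
        scored = [(t, p * (1.0 + anps.get(t, 0.0))) for t, p in choices]
        state = max(scored, key=lambda tp: tp[1])[0]
        if state == "<end>":
            break
        anps.pop(state, None)  # an emitted concept is used up
        words.append(state)
    return " ".join(words)

if __name__ == "__main__":
    print(generate_caption(detect_anps("beach.jpg")))
    # -> "a happy dog on a sunny beach"
```

The greedy walk is only one possible decoding; a learned CAST network would presumably use trained transition probabilities and a richer syntax state than this hand-written graph.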

Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN: 9781450343596
DOI: 10.1145/2911996

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. auto caption
  2. image captioning

Qualifiers

  • Short-paper

Conference

ICMR '16: International Conference on Multimedia Retrieval
June 6 - 9, 2016
New York, New York, USA

Acceptance Rates

ICMR '16 paper acceptance rate: 20 of 120 submissions (17%)
Overall acceptance rate: 254 of 830 submissions (31%)

Cited By

  • (2021) Integrating Historical States and Co-attention Mechanism for Visual Dialog. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 2041-2048. DOI: 10.1109/ICPR48806.2021.9412629. Online publication date: 10-Jan-2021.
  • (2019) A Survey on Deep Learning in Image Polarity Detection: Balancing Generalization Performances and Computational Costs. Electronics, 8(7):783. DOI: 10.3390/electronics8070783. Online publication date: 12-Jul-2019.
  • (2019) A State-of-Art Review on Automatic Video Annotation Techniques. Intelligent Systems Design and Applications, pp. 1060-1069. DOI: 10.1007/978-3-030-16657-1_99. Online publication date: 12-Apr-2019.
  • (2018) A Survey on Automatic Image Captioning. Mathematics and Computing, pp. 74-83. DOI: 10.1007/978-981-13-0023-3_8. Online publication date: 14-Apr-2018.
  • (2017) Social Multimedia Sentiment Analysis. Proceedings of the 25th ACM International Conference on Multimedia, pp. 1953-1954. DOI: 10.1145/3123266.3130143. Online publication date: 23-Oct-2017.
  • (2016) Generating Affective Captions using Concept And Syntax Transition Networks. Proceedings of the 24th ACM International Conference on Multimedia, pp. 1111-1115. DOI: 10.1145/2964284.2984070. Online publication date: 1-Oct-2016.
