Paper
15 March 2019 Sequence-to-sequence image caption generator
Proceedings Volume 11041, Eleventh International Conference on Machine Vision (ICMV 2018); 110410C (2019) https://doi.org/10.1117/12.2523174
Event: Eleventh International Conference on Machine Vision (ICMV 2018), 2018, Munich, Germany
Abstract
Recently, image captioning has received much attention from the artificial intelligence (AI) research community. Most current work follows the encoder-decoder machine translation model to automatically generate captions for images, typically using a Convolutional Neural Network (CNN) as the image encoder and a Recurrent Neural Network (RNN) as the decoder that generates the caption. In this paper, we propose a sequence-to-sequence model that instead uses an RNN as the image encoder, still following the encoder-decoder machine translation paradigm: the input to the model is a sequence of images representing the objects in the image, ordered according to their order in the caption. We demonstrate the results of the model on the Flickr30K dataset and compare them with state-of-the-art methods evaluated on the same dataset. The proposed model outperformed the state-of-the-art methods on all metrics.
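The pipeline the abstract describes, an RNN encoder that reads a caption-ordered sequence of per-object image features and an RNN decoder that emits caption tokens, could be sketched as below. This is a minimal illustrative sketch only: the Elman-RNN cells, all dimensions, and the teacher-forcing setup are assumptions for demonstration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
FEAT, HID, VOCAB, EMB = 8, 16, 50, 8

def rnn_step(x, h, Wx, Wh, b):
    """One Elman-RNN step: new hidden state from input x and previous state h."""
    return np.tanh(x @ Wx + h @ Wh + b)

# Encoder parameters (read the ordered object-feature sequence).
We_x = 0.1 * rng.normal(size=(FEAT, HID))
We_h = 0.1 * rng.normal(size=(HID, HID))
be = np.zeros(HID)

# Decoder parameters (emit caption tokens conditioned on the encoder state).
Emb = 0.1 * rng.normal(size=(VOCAB, EMB))   # token embedding table
Wd_x = 0.1 * rng.normal(size=(EMB, HID))
Wd_h = 0.1 * rng.normal(size=(HID, HID))
bd = np.zeros(HID)
Wo = 0.1 * rng.normal(size=(HID, VOCAB))    # hidden state -> vocabulary logits

def caption_logits(object_feats, caption_ids):
    # Encode: fold the object sequence (in caption order) into one hidden state.
    h = np.zeros(HID)
    for f in object_feats:
        h = rnn_step(f, h, We_x, We_h, be)
    # Decode with teacher forcing: one logit vector per caption position.
    logits = []
    for t in caption_ids:
        h = rnn_step(Emb[t], h, Wd_x, Wd_h, bd)
        logits.append(h @ Wo)
    return np.stack(logits)

objs = rng.normal(size=(5, FEAT))        # e.g. 5 detected objects, caption order
caps = rng.integers(0, VOCAB, size=12)   # 12 ground-truth caption token ids
print(caption_logits(objs, caps).shape)  # (12, 50): per-token vocabulary logits
```

The key design point the abstract highlights is the encoder: instead of a single CNN feature vector, the encoder consumes a *sequence* of object representations whose order matches the target caption, so the seq2seq machinery from machine translation applies directly.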
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Rehab Alahmadi, Chung Hyuk Park, and James Hahn "Sequence-to-sequence image caption generator", Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110410C (15 March 2019); https://doi.org/10.1117/12.2523174
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS: Computer programming, Feature extraction, Performance modeling, Convolutional neural networks, Neural networks, Visual process modeling, Artificial intelligence