Discriminative Style Learning for Cross-Domain Image Captioning


Abstract:

Cross-domain image captioning, in which a model is trained on a source domain and generalized to other domains, usually suffers from a large domain shift. Although prior work has attempted to leverage both paired source data and unpaired target data to reduce this shift, performance remains unsatisfactory. One main reason is the large discrepancy in language expression between the two domains: different language styles are used to describe an image from different views, which results in different semantic descriptions of the same image. To tackle this problem, this paper proposes a Style-based Cross-domain Image Captioner (SCIC), which incorporates discriminative style information into the encoder-decoder framework and interprets an image as a special sentence according to external style instructions. Technically, we design a novel "Instruction-based LSTM" (I-LSTM), which adds an instruct gate to collect a style instruction and then outputs a caption in the format specified by that instruction. Two objectives are designed to train I-LSTM: 1) generating correct image descriptions and 2) generating correct styles. The model is thus expected both to capture the semantic meaning of an image accurately through the special caption and to understand the syntactic structure of that caption. We use MS-COCO as the source domain and Oxford-102, CUB-200, and Flickr30k as the target domains. Experimental results demonstrate that our model consistently outperforms previous methods and that incorporating style information through I-LSTM significantly improves performance, with CIDEr improvements of at least 5% on all datasets.
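
The abstract does not give the I-LSTM gate equations, so the following is only a minimal illustrative sketch of the general idea: a standard LSTM step extended with one extra gate that controls how much of a style instruction enters the cell state, plus a joint loss that sums a caption term and a style term. The class name InstructLSTMCell, the additive update r * s, the projection W_s, the function joint_loss, and the style_weight parameter are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class InstructLSTMCell:
    """Hypothetical LSTM cell with one extra gate that admits a style vector.

    Assumption: the gate layout and update rule below are illustrative only;
    the paper's actual I-LSTM equations are not given in the abstract.
    """

    def __init__(self, input_size, hidden_size, style_size, seed=0):
        rng = np.random.default_rng(seed)
        concat = input_size + hidden_size
        # Four standard LSTM blocks (input, forget, output, candidate)
        # plus one assumed "instruct" gate, stacked in a single matrix.
        self.W = rng.normal(0.0, 0.1, size=(5 * hidden_size, concat))
        self.b = np.zeros(5 * hidden_size)
        # Assumed projection of the style instruction into cell-state space.
        self.W_s = rng.normal(0.0, 0.1, size=(hidden_size, style_size))

    def step(self, x, h_prev, c_prev, style):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g, r = np.split(z, 5)
        i, f, o, r = sigmoid(i), sigmoid(f), sigmoid(o), sigmoid(r)
        g = np.tanh(g)
        s = np.tanh(self.W_s @ style)
        # Standard LSTM cell update plus the gated style term (assumed form).
        c = f * c_prev + i * g + r * s
        h = o * np.tanh(c)
        return h, c


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]


def joint_loss(word_logits, word_target, style_logits, style_target,
               style_weight=1.0):
    """Caption loss plus weighted style loss (the weighting is an assumption)."""
    return (cross_entropy(word_logits, word_target)
            + style_weight * cross_entropy(style_logits, style_target))


if __name__ == "__main__":
    cell = InstructLSTMCell(input_size=8, hidden_size=16, style_size=3)
    x = np.ones(8)
    h0, c0 = np.zeros(16), np.zeros(16)
    # One-hot style instructions, e.g. one per target-domain style.
    h_a, _ = cell.step(x, h0, c0, np.array([1.0, 0.0, 0.0]))
    h_b, _ = cell.step(x, h0, c0, np.array([0.0, 1.0, 0.0]))
    print(h_a.shape, np.abs(h_a - h_b).max())
```

In this sketch the same input and previous state yield different hidden states under different style instructions, which is the behavior the instruct gate is described as providing; how the actual model achieves it may differ.
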
Published in: IEEE Transactions on Image Processing ( Volume: 31)
Page(s): 1723 - 1736
Date of Publication: 27 January 2022

PubMed ID: 35085078