I²Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning

Abstract:

TV show captioning aims to generate a descriptive sentence from a video and its associated subtitles. Compared to purely video-based captioning, subtitles can provide the captioning model with useful semantic clues such as actors’ sentiments and intentions. However, using subtitles effectively is challenging, because they consist of fragmentary pieces of information and have a semantic gap with the visual modality. To organize this fragmentary information and yield a powerful omni-representation for all the modalities, an efficient captioning model must understand the video contents, the subtitle semantics, and the relations between them. In this paper, we propose an Intra- and Inter-relation Embedding Transformer (I²Transformer), consisting of an Intra-relation Embedding Block (IAE) and an Inter-relation Embedding Block (IEE) under the framework of a Transformer. First, the IAE captures the intra-relations within each modality by constructing learnable graphs. Then, the IEE learns cross-attention gates and selects useful information from each modality based on their inter-relations, so as to derive the omni-representation that serves as the input to the Transformer. Experimental results on the public dataset show that the I²Transformer achieves state-of-the-art performance. We also evaluate the effectiveness of the IAE and IEE on two other relevant tasks that take video with text inputs, i.e., TV show retrieval and video-guided machine translation. The encouraging performance further validates that the IAE and IEE blocks generalize well. The code is available at https://github.com/tuyunbin/I2Transformer.
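The two-stage idea in the abstract (intra-relation graphs within each modality, then gated cross attention between modalities) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the softmax-normalized adjacency, the sigmoid gate over concatenated features, and the parameters `w`, `gate_w` are all assumptions made for illustration, and in a real model they would be learned tensors rather than plain lists.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def intra_relation(x, w):
    # Assumed form of a "learnable graph": adjacency = row-softmax(x w x^T),
    # then each node aggregates features from its neighbours.
    adj = [softmax(row) for row in matmul(matmul(x, w), transpose(x))]
    return matmul(adj, x)

def inter_relation(x, other, gate_w):
    # Assumed form of a "cross attention gate": attend from x to the other
    # modality, then mix own features and attended context with a sigmoid gate.
    attn = [softmax(row) for row in matmul(x, transpose(other))]
    ctx = matmul(attn, other)
    out = []
    for xi, ci in zip(x, ctx):
        score = sum(w * v for w, v in zip(gate_w, xi + ci))  # xi + ci concatenates
        g = 1.0 / (1.0 + math.exp(-score))
        out.append([g * a + (1.0 - g) * b for a, b in zip(xi, ci)])
    return out

def omni_representation(video, subtitle, w_v, w_s, gate_v, gate_s):
    # Intra-relation step per modality, then gated cross-modal selection;
    # the two gated sequences are concatenated as the Transformer input.
    v = intra_relation(video, w_v)
    s = intra_relation(subtitle, w_s)
    return inter_relation(v, s, gate_v) + inter_relation(s, v, gate_s)
```

With `n` video features and `m` subtitle features of dimension `d`, `omni_representation` returns `n + m` rows of dimension `d`; `w_v` and `w_s` are `d × d` matrices and each gate vector has length `2d`.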

Published in: IEEE Transactions on Image Processing (Volume: 31)

Page(s): 3565 - 3577

Date of Publication: 21 March 2022

PubMed ID: 35312620

DOI: 10.1109/TIP.2022.3159472
