Stay in Grid: Improving Video Captioning via Fully Grid-Level Representation

Abstract:

Video captioning is the challenging task of automatically generating natural and meaningful textual descriptions for given videos. State-of-the-art methods aggregate spatial information at an early stage, in the video encoder, which has two drawbacks: 1) early aggregation in the encoder can discard considerable spatial detail, which may in turn lead to incorrect word choices in the subsequent text decoder; 2) the spatial attention learned in the video encoder may not be effective enough without text guidance. To solve these problems, we propose a Stay-in-Grid video CAPtioning method, SGCAP, which makes full use of grid-level spatial features and consists of a Bilinear Sequential Attention Encoder (BSAE) and a Cross-modal Sequential Attention Decoder (CSAD). The former explores and retains fully grid-level discriminative representations in the video encoder, while the latter performs late spatial aggregation in the decoder, attending to the most relevant regions under the supervision of the input words. Experimental results on three public datasets demonstrate the effectiveness of our method, which outperforms multiple state-of-the-art video captioning models. Source code and pre-trained models will be made publicly available.
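The contrast between early and late spatial aggregation can be illustrated with a minimal sketch. It is not the BSAE or CSAD described in the abstract: it substitutes plain mean pooling for early aggregation and generic scaled dot-product attention for text-guided late aggregation. All names (`early_aggregate`, `late_aggregate`, `word_query`) and the 7x7 grid with 8-dimensional features are assumptions made for illustration only.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def early_aggregate(cells):
    """Mean-pool N grid-cell vectors into one vector, ignoring the text."""
    n, d = len(cells), len(cells[0])
    return [sum(c[j] for c in cells) / n for j in range(d)]

def late_aggregate(cells, word_query):
    """Keep all grid cells, then weight each one by its scaled
    dot-product similarity to a (hypothetical) word query vector."""
    d = len(word_query)
    scores = [
        sum(c[j] * word_query[j] for j in range(d)) / math.sqrt(d)
        for c in cells
    ]
    weights = softmax(scores)
    pooled = [sum(w * c[j] for w, c in zip(weights, cells)) for j in range(d)]
    return pooled, weights

random.seed(0)
N, D = 49, 8  # assumed: one frame with a 7x7 grid of 8-dim features
cells = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(D)]

pooled, weights = late_aggregate(cells, query)
best_cell = max(range(N), key=lambda i: weights[i])
print("most attended grid cell:", best_cell)
```

With an all-zero query the attention weights become uniform and the late-aggregated vector reduces to the mean-pooled one, so in this sketch early aggregation is the special case in which the text carries no information about where to look.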

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 33, Issue: 7, July 2023)

Page(s): 3319 - 3332

Date of Publication: 27 December 2022

DOI: 10.1109/TCSVT.2022.3232634
