
Automatic video description generation via LSTM with joint two-stream encoding


Abstract:

In this paper, we propose a novel two-stream framework based on combinational deep neural networks. The framework consists of two main components: a parallel two-stream encoding component that learns video encodings from multiple sources using 3D convolutional neural networks, and a long short-term memory (LSTM)-based decoding language model that translates the encoded video representations into text descriptions. The merits of our proposed model are: 1) it extracts both temporal and spatial features by applying 3D convolutional networks to both raw RGB frames and motion history images; 2) it can dynamically tune the weights of different feature channels, since the network is trained end-to-end, from the combinational encoding of multiple features through to the LSTM-based language model. Our model is evaluated on three public video description datasets: one YouTube clips dataset (Microsoft Video Description Corpus) and two large movie description datasets (the MPII Corpus and the Montreal Video Annotation Dataset), and it achieves performance comparable to or better than state-of-the-art approaches to video caption generation.
Date of Conference: 04-08 December 2016
Date Added to IEEE Xplore: 24 April 2017
Conference Location: Cancun
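
To make the encode/decode pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of the idea: two 3D-convolutional streams (one over raw RGB frames, one over motion history images) are fused by a learned layer whose weights are trained end-to-end with an LSTM language model that conditions caption generation on the fused video code. All layer sizes, module names, and the specific fusion scheme here are illustrative assumptions, not the authors' published architecture.

    import torch
    import torch.nn as nn

    class Stream3DCNN(nn.Module):
        """One 3D-conv encoding stream (e.g. raw RGB frames or motion history images)."""
        def __init__(self, in_channels, feat_dim=256):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),  # global pooling over time and space
            )
            self.proj = nn.Linear(64, feat_dim)

        def forward(self, x):  # x: (batch, channels, frames, H, W)
            h = self.conv(x).flatten(1)
            return self.proj(h)

    class TwoStreamCaptioner(nn.Module):
        def __init__(self, vocab_size, feat_dim=256, hidden=512):
            super().__init__()
            self.rgb_stream = Stream3DCNN(in_channels=3, feat_dim=feat_dim)
            self.mhi_stream = Stream3DCNN(in_channels=1, feat_dim=feat_dim)
            # Learned fusion over the concatenated stream features; because it is
            # trained jointly with the decoder, the contribution of each feature
            # channel is tuned by end-to-end training (an assumed stand-in for the
            # paper's combinational encoding).
            self.fusion = nn.Linear(2 * feat_dim, hidden)
            self.embed = nn.Embedding(vocab_size, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, rgb, mhi, captions):
            v = torch.cat([self.rgb_stream(rgb), self.mhi_stream(mhi)], dim=1)
            v = torch.tanh(self.fusion(v))  # fused video encoding
            # Condition the LSTM by prepending the video code to the word embeddings.
            words = self.embed(captions)                 # (batch, seq, hidden)
            inputs = torch.cat([v.unsqueeze(1), words], dim=1)
            h, _ = self.lstm(inputs)
            return self.out(h[:, 1:])                    # logits per caption step

    # Usage with dummy tensors: 2 clips of 8 frames at 32x32 resolution.
    model = TwoStreamCaptioner(vocab_size=1000)
    rgb = torch.randn(2, 3, 8, 32, 32)   # raw RGB frame stack
    mhi = torch.randn(2, 1, 8, 32, 32)   # motion history images
    caps = torch.randint(0, 1000, (2, 5))
    logits = model(rgb, mhi, caps)       # (2, 5, 1000)

Because both streams and the decoder sit in one computation graph, a single cross-entropy loss on the caption logits backpropagates through the fusion layer into both 3D CNNs, which is what lets the relative weighting of the RGB and motion features adapt during training.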
