ABSTRACT
Analyzing videos has been one of the fundamental problems of computer vision and multimedia content analysis for decades. The task is very challenging, as video is an information-intensive medium with large variations and complexities. Thanks to recent developments in deep learning techniques, researchers in both the computer vision and multimedia communities are now able to boost the performance of video analysis significantly and to initiate new research directions for analyzing video content. This tutorial presents recent advances under the umbrella of video understanding, starting from a unified deep learning toolkit, the Microsoft Cognitive Toolkit (CNTK), which supports popular model types such as convolutional and recurrent networks; moving to the fundamental challenges of video representation learning and video classification and recognition; and finally arriving at the emerging area of video and language.
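As a concrete illustration of the representation-learning challenge the tutorial addresses, the simplest video-level descriptor is obtained by pooling per-frame CNN features over time. The sketch below is not code from the tutorial; it is a minimal NumPy example of average pooling, with hypothetical shapes (8 frames, 4-dimensional features) chosen purely for illustration.

```python
import numpy as np

# Hypothetical illustration: build a video-level representation by
# average-pooling per-frame CNN features over the temporal axis,
# the simplest baseline for video representation learning.
def video_representation(frame_features: np.ndarray) -> np.ndarray:
    """frame_features: (num_frames, feature_dim) array of per-frame
    descriptors; returns a single (feature_dim,) video descriptor."""
    return frame_features.mean(axis=0)

# Toy example: 8 frames, each with a 4-dimensional feature vector.
feats = np.arange(32, dtype=float).reshape(8, 4)
video_vec = video_representation(feats)
# video_vec is the column-wise mean of the 8 frame vectors
```

Average pooling discards temporal order, which is exactly why the tutorial's topics, such as recurrent networks and spatio-temporal convolutions, matter: they model how frames evolve rather than merely what they contain.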
- Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, and Jiebo Luo. 2016. Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation. In ICMR.
- Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, and Jiebo Luo. 2017. Learning Hierarchical Video Representation for Action Recognition. International Journal of Multimedia Information Retrieval (2017), 1--14.
- Yehao Li, Ting Yao, Tao Mei, Hongyang Chao, and Yong Rui. 2016. Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding. In ACM MM.
- Yingwei Pan, Yehao Li, Ting Yao, Tao Mei, Houqiang Li, and Yong Rui. 2016. Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure. In IJCAI.
- Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui. 2016. Jointly Modeling Embedding and Translation to Bridge Video and Language. In CVPR.
- Yingwei Pan, Ting Yao, Houqiang Li, and Tao Mei. 2017. Video Captioning with Transferred Semantic Attributes. In CVPR.
- Zhaofan Qiu, Ting Yao, and Tao Mei. 2017. Deep Quantization: Encoding Convolutional Activations with Deep Generative Model. In CVPR.
- Zhaofan Qiu, Ting Yao, and Tao Mei. 2017. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. In ICCV.
- Jun Xu, Tao Mei, Ting Yao, and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In CVPR.
- Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2017. Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects. In CVPR.
- Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, and Tao Mei. 2017. Boosting Image Captioning with Attributes. In ICCV.
Index Terms
- Deep Learning for Intelligent Video Analysis