Recurrent Deconvolutional Generative Adversarial Networks with Application to Video Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11858)

Abstract

This paper proposes a novel model for video generation, and in particular addresses the problem of generating videos from text descriptions, i.e., synthesizing realistic videos conditioned on given texts. Existing video generation methods cannot be easily adapted to this task, owing to the frame-discontinuity issue and their text-free generation schemes. To address these problems, we propose a recurrent deconvolutional generative adversarial network (RD-GAN), which comprises a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of the conventional recurrent neural network; it can effectively model the long-range temporal dependencies among generated video frames and make good use of conditional information. The proposed model can be trained jointly by pushing the RDN to generate videos realistic enough that the 3D-CNN cannot distinguish them from real ones. We apply the proposed RD-GAN to a series of tasks, including conventional video generation, conditional video generation, video prediction, and video classification, and demonstrate its effectiveness by achieving strong performance.

Student first author.
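The abstract describes the RD-GAN as a pairing of a recurrent deconvolutional generator, which unrolls over time and emits one frame per step conditioned on a text embedding, with a 3D-CNN discriminator that judges whole clips. The sketch below is a minimal PyTorch-style rendering of that pairing; the GRU-cell recurrence, layer widths, frame count, resolution, and all class names are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical RD-GAN-style generator/discriminator pair (PyTorch).
# All sizes and names are illustrative assumptions from the abstract alone.
import torch
import torch.nn as nn

class RDNGenerator(nn.Module):
    """Recurrent deconvolutional generator: an RNN cell evolves a hidden
    state over time, and a shared deconvolution stack decodes each state
    into one video frame."""
    def __init__(self, z_dim=100, cond_dim=128, hidden_dim=256, n_frames=16):
        super().__init__()
        self.n_frames = n_frames
        self.rnn = nn.GRUCell(z_dim + cond_dim, hidden_dim)
        # Shared frame decoder: 4x4 feature map -> 64x64 RGB frame.
        self.fc = nn.Linear(hidden_dim, 512 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, cond):
        # z: (B, z_dim) noise; cond: (B, cond_dim) text embedding.
        h = torch.zeros(z.size(0), self.rnn.hidden_size, device=z.device)
        inp = torch.cat([z, cond], dim=1)
        frames = []
        for _ in range(self.n_frames):
            h = self.rnn(inp, h)                      # temporal recurrence
            feat = self.fc(h).view(-1, 512, 4, 4)
            frames.append(self.deconv(feat))          # one frame per step
        return torch.stack(frames, dim=2)             # (B, 3, T, H, W)

class VideoDiscriminator(nn.Module):
    """3D-CNN discriminator scoring whole clips as real or fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv3d(64, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2, True),
            nn.Conv3d(128, 256, 4, 2, 1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2, True),
            nn.Conv3d(256, 1, (2, 8, 8)),             # (B,1,1,1,1) for 16x64x64 input
        )

    def forward(self, video):                          # video: (B, 3, T, H, W)
        return self.net(video).view(-1)                # raw logits per clip
```

In the paper's RDN the recurrence is built into the deconvolutional structure itself; this sketch approximates that by carrying a compact hidden state and sharing one frame decoder across time steps, which is the simplest way to reproduce the one-frame-per-step, text-conditioned behavior the abstract describes. Training would follow the usual conditional-GAN objective, with cond taken from a sentence embedding of the text description.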



Acknowledgments

This work is jointly supported by the National Key Research and Development Program of China (2016YFB1001000), the National Natural Science Foundation of China (61525306, 61633021, 61721004, 61420106015, 61806194), the Capital Science and Technology Leading Talent Training Project (Z181100006318030), the Beijing Science and Technology Project (Z181100008918010), and CAS-AIR.

Author information

Corresponding author

Correspondence to Liang Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yu, H., Huang, Y., Pi, L., Wang, L. (2019). Recurrent Deconvolutional Generative Adversarial Networks with Application to Video Generation. In: Lin, Z., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol. 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-31723-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31722-5

  • Online ISBN: 978-3-030-31723-2

  • eBook Packages: Computer Science; Computer Science (R0)
