Unsupervised video-to-video translation with preservation of frame modification tendency

Abstract

Tremendous advances have been achieved in image translation through the use of generative adversarial networks (GANs). For video-to-video translation, a similar idea has been leveraged in various studies, which focus on the associations among related frames. However, existing GAN-based video-synthesis methods do not fully exploit the spatial-temporal information in videos, especially across consecutive frames. In this paper, we propose an efficient video-translation method that preserves the frame modification trends of the original video's sequential frames and smooths the variations between the generated frames. To keep this tendency consistent between the generated video and the original one, we introduce a tendency-invariant loss that encourages further exploitation of spatial-temporal information. Experiments show that our method learns richer information from adjacent frames and generates more desirable videos than the baselines, i.e., Recycle-GAN and CycleGAN.
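The tendency-invariant loss is only described at this level of detail here, so the sketch below is a hypothetical illustration rather than the authors' formulation. It assumes the frame modification tendency can be approximated by temporal differences of adjacent frames, and penalizes with an L1 distance any generated inter-frame changes that drift away from the source's; the function name and the loss weights in the usage comment are invented for the example (Python/PyTorch).

    import torch
    import torch.nn.functional as F

    def tendency_invariant_loss(source_frames: torch.Tensor,
                                generated_frames: torch.Tensor) -> torch.Tensor:
        """Hypothetical sketch: keep the inter-frame modification tendency
        of a generated clip aligned with that of its source clip.

        Both tensors have shape (T, C, H, W): T consecutive source frames
        and their T translated counterparts.
        """
        # Temporal differences of adjacent frames approximate the
        # "modification tendency" within each clip.
        source_tendency = source_frames[1:] - source_frames[:-1]      # (T-1, C, H, W)
        generated_tendency = generated_frames[1:] - generated_frames[:-1]
        # An L1 penalty keeps the generated video's inter-frame changes
        # close to the source's, which also smooths the variation between
        # consecutive generated frames.
        return F.l1_loss(generated_tendency, source_tendency)

    # Usage sketch inside a CycleGAN/Recycle-GAN-style objective; `generator`,
    # `clip`, and the lambda weights are placeholders:
    # fake_clip = torch.stack([generator(f.unsqueeze(0)).squeeze(0) for f in clip])
    # total = adv_loss + lambda_cyc * cycle_loss \
    #         + lambda_tend * tendency_invariant_loss(clip, fake_clip)

In this form the term would simply be added to the adversarial and cycle-consistency losses with its own weighting coefficient; whether the paper measures the tendency with raw pixel differences or a learned temporal predictor is not recoverable from the abstract alone.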

References

  1. Anoosheh, A., Agustsson, E., Timofte, R., Van Gool, L.: ComboGAN: unrestrained scalability for image domain translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 783–790 (2018)

  2. Bansal, A., Ma, S., Ramanan, D., Sheikh, Y.: Recycle-GAN: unsupervised video retargeting. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 119–135 (2018)

  3. Bashkirova, D., Usman, B., Saenko, K.: Unsupervised video-to-video translation. arXiv:1806.03698 (2018)

  4. Benaim, S., Wolf, L.: One-shot unsupervised cross domain translation. In: Advances in Neural Information Processing Systems, pp. 2104–2114 (2018)

  5. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3722–3731 (2017)

  6. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (2019)

  7. Chan, C., Ginosar, S., Zhou, T., Efros, A.A.: Everybody dance now. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5933–5942 (2019)

  8. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2172–2180 (2016)

  9. Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., Choo, J.: StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797 (2018)

  10. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016)

  11. Gafni, O., Wolf, L., Taigman, Y.: Vid2Game: controllable characters extracted from real-world videos. arXiv:1904.08379 (2019)

  12. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

  13. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)

  14. Huang, X., Liu, M.Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–189 (2018)

  15. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)

  16. Kang, K., Li, H., Xiao, T., Ouyang, W., Yan, J., Liu, X., Wang, X.: Object detection in videos with tubelet proposal networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735 (2017)

  17. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)

  18. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (2013)

  19. Lee, H.Y., Tseng, H.Y., Huang, J.B., Singh, M., Yang, M.H.: Diverse image-to-image translation via disentangled representations. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 35–51 (2018)

  20. Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: Advances in Neural Information Processing Systems, pp. 700–708 (2017)

  21. Liu, M.Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., Kautz, J.: Few-shot unsupervised image-to-image translation. arXiv:1905.01723 (2019)

  22. Ma, T., Tian, W.: Back-projection-based progressive growing generative adversarial network for single image super-resolution. Vis. Comput. (2020). https://doi.org/10.1007/s00371-020-01843-3

  23. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv:1411.1784 (2014)

  24. Miyato, T., Koyama, M.: cGANs with projection discriminator. arXiv:1802.05637 (2018)

  25. Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier GANs. In: Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 2642–2651. JMLR.org (2017)

  26. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)

  27. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. arXiv:1605.05396 (2016)

  28. Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2213–2222 (2017)

  29. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)

  30. Saito, M., Matsumoto, E., Saito, S.: Temporal generative adversarial nets with singular value clipping. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2830–2839 (2017)

  31. Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2107–2116 (2017)

  32. Taigman, Y., Polyak, A., Wolf, L.: Unsupervised cross-domain image generation. arXiv:1611.02200 (2016)

  33. Tulyakov, S., Liu, M.Y., Yang, X., Kautz, J.: MoCoGAN: decomposing motion and content for video generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1526–1535 (2018)

  34. Walker, J., Doersch, C., Gupta, A., Hebert, M.: An uncertain future: forecasting from static images using variational autoencoders. In: European Conference on Computer Vision, pp. 835–851. Springer (2016)

  35. Walker, J., Marino, K., Gupta, A., Hebert, M.: The pose knows: video forecasting by generating pose futures. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3332–3341 (2017)

  36. Wang, T.C., Liu, M.Y., Tao, A., Liu, G., Kautz, J., Catanzaro, B.: Few-shot video-to-video synthesis. arXiv:1910.12713 (2019)

  37. Wang, T.C., Liu, M.Y., Zhu, J.Y., Liu, G., Tao, A., Kautz, J., Catanzaro, B.: Video-to-video synthesis. arXiv:1808.06601 (2018)

  38. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)

  39. Wang, X., Gupta, A.: Videos as space-time region graphs. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)

  40. Xiao, F., Jae Lee, Y.: Video object detection with an aligned spatial-temporal memory. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 485–501 (2018)

  41. Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316–1324 (2018)

  42. Yuan, Q., Li, J., Zhang, L., Wu, Z., Liu, G.: Blind motion deblurring with cycle generative adversarial networks. Vis. Comput. 36, 1591–1601 (2019)

  43. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. arXiv:1805.08318 (2018)

  44. Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915 (2017)

  45. Zhou, Y., Wang, Z., Fang, C., Bui, T., Berg, T.L.: Dance dance generation: motion transfer for internet videos. arXiv:1904.00129 (2019)

  46. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)

  47. Zhu, J.Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems, pp. 465–476 (2017)

Acknowledgements

This work was funded by the National Natural Science Foundation of China (NSFC) (41771427 and 41631174).

Author information

Correspondence to Huajun Liu or Dian Lei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, H., Li, C., Lei, D. et al. Unsupervised video-to-video translation with preservation of frame modification tendency. Vis Comput 36, 2105–2116 (2020). https://doi.org/10.1007/s00371-020-01913-6
