FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Deep neural networks have provided promising results for the task of video-to-video translation. These schemes require a large number of training samples from both the source and target domains to produce translated video signals of high visual quality. However, acquiring many video signals from the target domain is often difficult, as it involves substantial logistics and a considerable amount of time. It is therefore crucial to develop deep few-shot learning-based schemes that can perform video-to-video translation efficiently using only a few samples from the target domain. In this paper, we propose a novel deep few-shot unsupervised learning-based video-to-video translation scheme that generates high-quality visual signals by employing the episodic learning technique. Further, to enhance the spatio-temporal consistency of the translated video signals, we incorporate into the proposed method a novel module that employs a Kalman filtering operation and relativistic generative adversarial networks. The results of extensive experiments show that the proposed scheme significantly outperforms state-of-the-art methods when the number of video samples available in the target domain is small.
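The abstract points to two concrete mechanisms: a relativistic GAN objective and a Kalman filtering operation used to stabilize the translated frames over time. Since the article body is not reproduced here, the following is a minimal PyTorch sketch of how these two ingredients are typically realized, not the authors' implementation; the losses follow the relativistic average GAN formulation of Jolicoeur-Martineau, and the noise variances q and r in the filter are hypothetical placeholders rather than values from the paper.

```python
# Minimal sketch (assumptions, not the paper's code): a relativistic average
# GAN loss pair and a per-pixel Kalman filter for temporal smoothing.
import torch
import torch.nn.functional as F

def ragan_d_loss(real_logits, fake_logits):
    """Relativistic average discriminator loss: score real samples as more
    realistic than the average fake, and fakes as less realistic than the
    average real."""
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def ragan_g_loss(real_logits, fake_logits):
    """Generator loss: the symmetric counterpart of the discriminator loss."""
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.ones_like(fake_logits))
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.zeros_like(real_logits))
    return loss_real + loss_fake

class PixelKalmanFilter:
    """Per-pixel scalar Kalman filter with identity (constant-state) dynamics.
    q and r are hypothetical process/measurement noise variances."""
    def __init__(self, q=1e-3, r=1e-2):
        self.q, self.r = q, r
        self.x = None   # filtered estimate of the current frame
        self.p = None   # estimate variance per pixel

    def update(self, frame):
        if self.x is None:                      # first frame initializes state
            self.x = frame.clone()
            self.p = torch.full_like(frame, self.r)
            return self.x
        p_pred = self.p + self.q                # predict (identity dynamics)
        k = p_pred / (p_pred + self.r)          # Kalman gain
        self.x = self.x + k * (frame - self.x)  # correct with new observation
        self.p = (1.0 - k) * p_pred
        return self.x
```

Calling `update` once per translated frame blends each new frame with the filtered history, damping frame-to-frame flicker; the relativistic losses train the discriminator to rate real frames as more realistic than the average generated frame rather than classifying each frame in isolation.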


Data availability

No datasets were generated or analysed during the current study.


Author information

Contributions

Koorosh Roohi: conceptualization, software, writing. Alireza Esmaeilzehi: conceptualization, software, writing. M. Omair Ahmad: conceptualization, writing, supervision.

Corresponding author

Correspondence to Alireza Esmaeilzehi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Roohi, K., Esmaeilzehi, A. & Ahmad, M.O. FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN. SIViP 19, 422 (2025). https://doi.org/10.1007/s11760-025-03982-3

