FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Deep neural networks have provided promising results for the task of video-to-video translation. These schemes require a large number of training samples from both the source and target domains to produce translated video signals of high visual quality. However, acquiring many video signals from the target domain is often difficult, as it involves substantial logistics and a considerable amount of time. It is therefore crucial to develop deep few-shot learning-based schemes that can perform video-to-video translation efficiently using only a few samples from the target domain. In this paper, we propose a novel deep few-shot unsupervised learning-based video-to-video translation scheme that generates high-quality visual signals by employing the episodic learning technique. Further, to enhance the spatio-temporal consistency of the translated video signals, we incorporate into the proposed method a novel module that employs a Kalman filtering operation and relativistic generative adversarial networks. The results of extensive experiments show that the proposed scheme significantly outperforms state-of-the-art methods when the number of video samples available in the target domain is small.
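The abstract points to two concrete mechanisms: a relativistic GAN objective and a Kalman filtering operation used to stabilize the translated frames over time. Since the article body is not reproduced here, the following is a minimal PyTorch sketch of how these two ingredients are typically realized, not the authors' implementation; the losses follow the relativistic average GAN formulation of Jolicoeur-Martineau, and the noise variances q and r in the filter are hypothetical placeholders rather than values from the paper.

```python
# Minimal sketch (assumptions, not the paper's code): a relativistic average
# GAN loss pair and a per-pixel Kalman filter for temporal smoothing.
import torch
import torch.nn.functional as F

def ragan_d_loss(real_logits, fake_logits):
    """Relativistic average discriminator loss: score real samples as more
    realistic than the average fake, and fakes as less realistic than the
    average real."""
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def ragan_g_loss(real_logits, fake_logits):
    """Generator loss: the symmetric counterpart of the discriminator loss."""
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits - real_logits.mean(), torch.ones_like(fake_logits))
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits - fake_logits.mean(), torch.zeros_like(real_logits))
    return loss_real + loss_fake

class PixelKalmanFilter:
    """Per-pixel scalar Kalman filter with identity (constant-state) dynamics.
    q and r are hypothetical process/measurement noise variances."""
    def __init__(self, q=1e-3, r=1e-2):
        self.q, self.r = q, r
        self.x = None   # filtered estimate of the current frame
        self.p = None   # estimate variance per pixel

    def update(self, frame):
        if self.x is None:                      # first frame initializes state
            self.x = frame.clone()
            self.p = torch.full_like(frame, self.r)
            return self.x
        p_pred = self.p + self.q                # predict (identity dynamics)
        k = p_pred / (p_pred + self.r)          # Kalman gain
        self.x = self.x + k * (frame - self.x)  # correct with new observation
        self.p = (1.0 - k) * p_pred
        return self.x
```

Calling `update` once per translated frame blends each new frame with the filtered history, damping frame-to-frame flicker; the relativistic losses train the discriminator to rate real frames as more realistic than the average generated frame rather than classifying each frame in isolation.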


Data availability

No datasets were generated or analysed during the current study.


Author information

Contributions

Koorosh Roohi: conceptualization, software, writing. Alireza Esmaeilzehi: conceptualization, software, writing. M. Omair Ahmad: conceptualization, writing, supervision.

Corresponding author

Correspondence to Alireza Esmaeilzehi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Roohi, K., Esmaeilzehi, A. & Ahmad, M.O. FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN. SIViP 19, 422 (2025). https://doi.org/10.1007/s11760-025-03982-3

