
Fast Fourier transform-based method of neural network training for human re-rendering

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Novel view synthesis is one of the generative imaging problems to which generative adversarial networks (GANs) can be applied. One such task is human re-rendering from a single image. In this work, we reimplement the current state-of-the-art solution and identify its main drawback: the low quality of rendered images in regions with high-frequency detail, such as hair, faces, and hands. We modify the architectures of the baseline models and investigate the influence of operations on the Fourier spectra of the images, which we believe can address the missing high-frequency detail. In particular, we propose a discrete Fourier transform loss function (DFT loss) and investigate how it affects the visual quality and the evaluation metric values of the rendered images.
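For illustration only (this excerpt does not give the authors' exact formulation), a frequency-domain loss of this kind can be sketched in PyTorch. The function name dft_loss, the orthonormal FFT, and the L1 comparison of spectral magnitudes below are assumptions for the sketch, not the paper's definition:

    import torch
    import torch.nn.functional as F

    def dft_loss(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Images in (N, C, H, W) layout; torch.fft.fft2 transforms the
        # last two dimensions (H, W). "ortho" normalisation keeps the
        # spectral magnitudes comparable across image sizes.
        rendered_spec = torch.fft.fft2(rendered, norm="ortho")
        target_spec = torch.fft.fft2(target, norm="ortho")
        # Penalise differences in the magnitude spectra, so missing
        # high-frequency detail contributes directly to the loss.
        return F.l1_loss(rendered_spec.abs(), target_spec.abs())

    # Hypothetical usage inside a GAN generator objective:
    # total_loss = pixel_loss + perceptual_loss + lambda_dft * dft_loss(fake, real)

In practice such a term is weighted against pixel-space and perceptual losses; the weighting and the exact spectral distance used by the authors would need to be taken from the full text.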



Data availability

The data are available on request; please contact the authors of Sarkar et al. [2].

References

  1. Liu, W., Piao, Z., Min, J., Luo, W., Ma, L., Gao, S.: Liquid warping GAN: a unified framework for human motion imitation, appearance transfer and novel view synthesis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5903–5912 (2019)

  2. Sarkar, K., Mehta, D., Xu, W., Golyanik, V., Theobalt, C.: Neural re-rendering of humans from a single image. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12356 LNCS (2020)

  3. Zhu, H., Su, H., Wang, P., Cao, X., Yang, R.: View extrapolation of human body from a single image. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4450–4459 (2018)

  4. Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhöfer, M.: DeepVoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2432–2441 (2019)

  5. Thies, J., Zollhöfer, M., Nießner, M.: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Gr. 38(4), 10500 (2019)


  6. Xu, C., Fu, Y., Wen, C., Pan, Y., Jiang, Y.G., Xue, X.: Pose-guided person image synthesis in the non-iconic views. IEEE Trans. Image Process. 29(1), 9060–9072 (2020)


  7. Zhao, B., Wu, X., Cheng, Z.Q., Liu, H., Jie, Z., Feng, J.: Multi-view image generation from a single-view. In: MM 2018-Proceedings of the 2018 ACM Multimedia Conference, pp. 383–391 (2018)

  8. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  9. Gortler, S.J., Grzeszczuk, R., Szeliski, R., Cohen, M.F.: The lumigraph. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 43–54 (1996)

  10. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry-and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 11–20 (1996)

  11. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings, pp. 1–13 (2015)

  12. Kingma, D.P., Dhariwal, P.: Glow: generative flow with invertible 1×1 convolutions. In: Advances in Neural Information Processing Systems, pp. 10215–10224 (2018)

  13. Mordvintsev, A., Pezzotti, N., Schubert, L., Olah, C.: Differentiable image parameterizations. Distill 3(7) (2018)


  14. Van Den Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: 33rd International Conference on Machine Learning, ICML 2016, vol. 4, pp. 2611–2620 (2016)

  15. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12346 LNCS, pp. 405–421 (2020)

  16. Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3D-structure-aware neural scene representations. In: Advances in Neural Information Processing Systems, pp. 1–12 (2019)

  17. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint, pp. 1–26 (2017)

  18. Flynn, J., Broxton, M., Debevec, P., Duvall, M., Fyffe, G., Overbeck, R., Snavely, N., Tucker, R.: DeepView: view synthesis with learned gradient descent. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2362–2371 (2019)

  19. Henzler, P., Rasche, V., Ropinski, T., Ritschel, T.: Single-image tomography: 3D volumes from 2D cranial X-rays. Comput. Gr. Forum 37(2), 377–388 (2018)


  20. Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 2017, 365–376 (2017)


  21. Mildenhall, B., Srinivasan, P.P., Ortiz-Cayon, R., Kalantari, N.K., Ramamoorthi, R., Ng, R., Kar, A.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Trans. Gr. 38(4) (2019)

  22. Peng, S., Zhang, Y., Xu, Y., Wang, Q., Shuai, Q., Bao, H., Zhou, X.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans (2020)

  23. Lassner, C., Pons-Moll, G., Gehler, P.V.: A generative model of people in clothing. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 853–862 (2017)

  24. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. Adv. Neural Inf. Process. Syst. 2017, 406–416 (2017)


  25. Siarohin, A., Sangineto, E., Lathuiliere, S., Sebe, N.: Deformable GANs for pose-based human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3408–3416 (2018)

  26. Grigorev, A., Sevastopolsky, A., Vakhitov, A., Lempitsky, V.: Coordinate-based texture inpainting for pose-guided human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 12127–12136 (2019)

  27. Neverova, N., Alp Güler, R., Kokkinos, I.: Dense pose transfer. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11207 LNCS, pp. 128–143 (2018)

  28. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Gr. 34(6) (2015)


  29. Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018)

  30. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1096–1104 (2016)

  31. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021)


  32. Dzanic, T., Witherden, F.D.: Fourier spectrum discrepancies in deep network generated images. In: Advances in Neural Information Processing Systems, pp. 1–11 (2020)

  33. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)


  34. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: Proceedings-13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, pp. 67–74 (2018)

  35. Daoud, A.O., Tsehayae, A.A., Fayek, A.R.: A guided evaluation of the impact of research and development partnerships on university, industry, and government. Can. J. Civ. Eng. 44(4), 253–263 (2017)


  36. Talebi, H., Milanfar, P.: NIMA: neural image assessment. IEEE Trans. Image Process. 27(8), 3998–4011 (2018)


  37. Paulraj, M.P., Zanar Azalan, M.S., Hema, C.R., Palaniappan, R.: Image quality assessment using Elman neural network model and interleaving method. Int. J. Human Comput. Interact. (IJHCI) 3(3), 51–57 (2012)

  38. Kipli, K., Muhammad, M.S., Masra, S.M.W., Zamhari, N., Lias, K., Mat, D.A.A.: Performance of Levenberg-Marquardt backpropagation for full reference hybrid image quality metrics. Eng. Comput. Sci. 2195, 704–707 (2012)


  39. Wang, X., Liang, X., Yang, B., Li, F.W.: No-reference synthetic image quality assessment with convolutional neural network and local image saliency. Comput. Vis. Media 5(2), 193–208 (2019)

  40. Kettunen, M., Härkönen, E., Lehtinen, J.: E-LPIPS: robust perceptual image similarity via random transformation ensembles. arXiv preprint (2019)

  41. Hsu, C.-H., Guo, Z., Yen, K.: Comparison of image approximation methods: Fourier transform, cosine transform, wavelet packets and Karhunen-Loève transform (2002)

  42. DCT (Discrete Cosine Transform) for PyTorch. GitHub repository

  43. Makhoul, J.: A fast cosine transform in one and two dimensions. IEEE Trans. Acoust. Speech Signal Process. 28(1), 27–34 (1980)


Funding

Not applicable.

Author information


Contributions

K.B.K. initiated the study; P.K., K.G., and K.B.K. contributed to the methods and results; K.G. wrote the discussion; P.K. prepared the code and data.

Corresponding author

Correspondence to Krzysztof Gromada.

Ethics declarations

Conflict of interest

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Code availability

The code is available from the authors on request.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gromada, K., Kowaleczko, P. & Kalinowska, K.B. Fast Fourier transform-based method of neural network training for human re-rendering. SIViP 17, 227–235 (2023). https://doi.org/10.1007/s11760-022-02225-z
