
Fast Fourier transform-based method of neural network training for human re-rendering

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Novel view synthesis is one of the generative imaging problems to which generative adversarial networks (GANs) can be applied. One such task is human re-rendering from a single image. In this work, we reimplement the current state-of-the-art solution and identify its main drawback: the low quality of rendered images in regions with high-frequency detail, such as hair, faces, and hands. We modify the architectures of the baseline models and investigate the influence of operations on the Fourier spectra of the images, which we believe can address the missing high-frequency detail. In particular, we propose a discrete Fourier transform loss function (DFT loss) and investigate how it affects the visual quality and the evaluation metric values of the rendered images.
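For illustration only (this excerpt does not give the authors' exact formulation), a frequency-domain loss of this kind can be sketched in PyTorch. The function name dft_loss, the orthonormal FFT, and the L1 comparison of spectral magnitudes below are assumptions for the sketch, not the paper's definition:

    import torch
    import torch.nn.functional as F

    def dft_loss(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Images in (N, C, H, W) layout; torch.fft.fft2 transforms the
        # last two dimensions (H, W). "ortho" normalisation keeps the
        # spectral magnitudes comparable across image sizes.
        rendered_spec = torch.fft.fft2(rendered, norm="ortho")
        target_spec = torch.fft.fft2(target, norm="ortho")
        # Penalise differences in the magnitude spectra, so missing
        # high-frequency detail contributes directly to the loss.
        return F.l1_loss(rendered_spec.abs(), target_spec.abs())

    # Hypothetical usage inside a GAN generator objective:
    # total_loss = pixel_loss + perceptual_loss + lambda_dft * dft_loss(fake, real)

In practice such a term is weighted against pixel-space and perceptual losses; the weighting and the exact spectral distance used by the authors would need to be taken from the full text.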



Data availability

The data are available on request; please contact the authors of Sarkar et al. [2].

References

  1. Liu, W., Piao, Z., Min, J., Luo, W., Ma, L., Gao, S.: Liquid warping GAN: a unified framework for human motion imitation, appearance transfer and novel view synthesis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5903–5912 (2019)

  2. Sarkar, K., Mehta, D., Xu, W., Golyanik, V., Theobalt, C.: Neural re-rendering of humans from a single image. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12356 LNCS (2020)

  3. Zhu, H., Su, H., Wang, P., Cao, X., Yang, R.: View extrapolation of human body from a single image. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4450–4459 (2018)

  4. Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhöfer, M.: DeepVoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2432–2441 (2019)

  5. Thies, J., Zollhöfer, M., Nießner, M.: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Gr. 38(4), 10500 (2019)


  6. Xu, C., Fu, Y., Wen, C., Pan, Y., Jiang, Y.G., Xue, X.: Pose-guided person image synthesis in the non-iconic views. IEEE Trans. Image Process. 29(1), 9060–9072 (2020)


  7. Zhao, B., Wu, X., Cheng, Z.Q., Liu, H., Jie, Z., Feng, J.: Multi-view image generation from a single-view. In: MM 2018-Proceedings of the 2018 ACM Multimedia Conference, pp. 383–391 (2018)

  8. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  9. Gortler, S.J., Grzeszczuk, R., Szeliski, R., Cohen, M.F.: The lumigraph. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 43–54 (1996)

  10. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry-and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 11–20 (1996)

  11. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings, pp. 1–13 (2015)

  12. Kingma, D.P., Dhariwal, P.: Glow: generative flow with invertible 1×1 convolutions. In: Advances in Neural Information Processing Systems, pp. 10215–10224 (2018)

  13. Mordvintsev, A., Pezzotti, N., Schubert, L., Olah, C.: Differentiable image parameterizations. Distill 3(7) (2018)


  14. Van Den Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: 33rd International Conference on Machine Learning, ICML 2016, vol. 4, pp. 2611–2620 (2016)

  15. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12346 LNCS, pp. 405–421 (2020)

  16. Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3D-structure-aware neural scene representations. In: Advances in Neural Information Processing Systems, pp. 1–12 (2019)

  17. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint, pp. 1–26 (2017)

  18. Flynn, J., Broxton, M., Debevec, P., Duvall, M., Fyffe, G., Overbeck, R., Snavely, N., Tucker, R.: DeepView: view synthesis with learned gradient descent. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2362–2371 (2019)

  19. Henzler, P., Rasche, V., Ropinski, T., Ritschel, T.: Single-image tomography: 3D volumes from 2D cranial X-rays. Comput. Gr. Forum 37(2), 377–388 (2018)


  20. Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 2017, 365–376 (2017)


  21. Mildenhall, B., Srinivasan, P.P., Ortiz-Cayon, R., Kalantari, N.K., Ramamoorthi, R., Ng, R., Kar, A.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Trans. Gr. 38(4) (2019)

  22. Peng, S., Zhang, Y., Xu, Y., Wang, Q., Shuai, Q., Bao, H., Zhou, X.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans (2020)

  23. Lassner, C., Pons-Moll, G., Gehler, P.V.: A generative model of people in clothing. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 853–862 (2017)

  24. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. Adv. Neural Inf. Process. Syst. 2017, 406–416 (2017)


  25. Siarohin, A., Sangineto, E., Lathuiliere, S., Sebe, N.: Deformable GANs for pose-based human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3408–3416 (2018)

  26. Grigorev, A., Sevastopolsky, A., Vakhitov, A., Lempitsky, V.: Coordinate-based texture inpainting for pose-guided human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 12127–12136 (2019)

  27. Neverova, N., Alp Güler, R., Kokkinos, I.: Dense pose transfer. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11207 LNCS, pp. 128–143 (2018)

  28. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Gr. 34(6) (2015)


  29. Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018)

  30. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1096–1104 (2016)

  31. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021)


  32. Dzanic, T., Witherden, F.D.: Fourier spectrum discrepancies in deep network generated images. In: Advances in Neural Information Processing Systems, pp. 1–11 (2020)

  33. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)


  34. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: Proceedings-13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, pp. 67–74 (2018)

  35. Daoud, A.O., Tsehayae, A.A., Fayek, A.R.: A guided evaluation of the impact of research and development partnerships on university, industry, and government. Can. J. Civ. Eng. 44(4), 253–263 (2017)


  36. Talebi, H., Milanfar, P.: NIMA: neural image assessment. IEEE Trans. Image Process. 27(8), 3998–4011 (2018)


  37. Paulraj, M.P., Zanar Azalan, M.S., Hema, C.R., Palaniappan, R.: Image quality assessment using Elman neural network model and interleaving method. Int. J. Human Comput. Interact. (IJHCI) 3(3), 51–57 (2012)

  38. Kipli, K., Muhammad, M.S., Masra, S.M.W., Zamhari, N., Lias, K., Mat, D.A.A.: Performance of Levenberg-Marquardt backpropagation for full reference hybrid image quality metrics. Eng. Comput. Sci. 2195, 704–707 (2012)


  39. Wang, X., Liang, X., Yang, B., Li, F.W.: No-reference synthetic image quality assessment with convolutional neural network and local image saliency. Comput. Vis. Media 5(2), 193–208 (2019)

  40. Kettunen, M., Härkönen, E., Lehtinen, J.: E-LPIPS: robust perceptual image similarity via random transformation ensembles. arXiv preprint (2019)

  41. Hsu, C.-H., Guo, Z., Yen, K.: Comparison of image approximation methods: Fourier transform, cosine transform, wavelet packets and Karhunen-Loève transform (2002)

  42. DCT (Discrete Cosine Transform) for PyTorch. GitHub repository

  43. Makhoul, J.: A fast cosine transform in one and two dimensions. IEEE Trans. Acoust. Speech Signal Process. 28(1), 27–34 (1980)


Funding

Not applicable.

Author information


Contributions

K.B.K. initiated the study; P.K., K.G., and K.B.K. contributed to the methods and results; K.G. wrote the discussion; P.K. prepared the code and data.

Corresponding author

Correspondence to Krzysztof Gromada.

Ethics declarations

Conflict of interest

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Code availability

The code is available from the authors on request.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gromada, K., Kowaleczko, P. & Kalinowska, K.B. Fast Fourier transform-based method of neural network training for human re-rendering. SIViP 17, 227–235 (2023). https://doi.org/10.1007/s11760-022-02225-z
