Abstract
Novel view synthesis is one of the generative imaging problems to which generative adversarial networks (GANs) can be applied. One such task is human re-rendering from a single image. In this work, we reimplement the current state-of-the-art solution and identify its main drawback: low quality of the rendered images in areas with high-frequency details, such as hair, faces, and hands. We modify the architectures of the baseline models and investigate the influence of operations on the Fourier spectra of the images, which we believe may address the loss of high-frequency detail. In particular, we propose a discrete Fourier transform loss function (DFT loss) and investigate how it influences the visual quality and the evaluation metric values of the rendered images.
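As a rough illustration of what a DFT-based loss can look like, the sketch below compares the 2D Fourier spectra of a rendered image and a target image. The function name `dft_loss`, the use of NumPy, the orthonormal scaling, and the mean absolute difference of the complex spectra are all assumptions made for this example; the abstract does not specify the authors' exact formulation.

```python
import numpy as np


def dft_loss(rendered, target):
    """Mean absolute difference between the 2D DFT spectra of two images.

    Hypothetical helper for illustration only: inputs are float arrays
    shaped (H, W) or (H, W, C). The norm, weighting, and choice between
    amplitude and complex difference are assumptions, not the paper's.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if rendered.shape != target.shape:
        raise ValueError("images must have the same shape")

    # 2D FFT over the spatial axes (0, 1); channels, if present, stay separate.
    spec_rendered = np.fft.fft2(rendered, axes=(0, 1), norm="ortho")
    spec_target = np.fft.fft2(target, axes=(0, 1), norm="ortho")

    # The magnitude of the complex difference penalises both amplitude
    # and phase errors at every spatial frequency.
    return float(np.mean(np.abs(spec_rendered - spec_target)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((64, 64, 3))
    # Crude low-pass: average each pixel with its right-hand neighbour.
    blurred = 0.5 * (sharp + np.roll(sharp, 1, axis=1))
    print("identical images:", dft_loss(sharp, sharp))
    print("blurred vs sharp:", dft_loss(blurred, sharp))
```

In a training setup, a term of this kind would typically be added to the existing generator losses with a weighting factor. That weight, and the choice of a differentiable framework in place of NumPy, are left open here.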
Data availability
Please contact Sarkar et al. [2].
References
Liu, W., Piao, Z., Min, J., Luo, W., Ma, L., Gao, S.: Liquid warping GAN: A unified framework for human motion imitation, appearance transfer and novel view synthesis. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2019-Octob, pp. 5903–5912 (2019)
Sarkar, K., Mehta, D., Xu, W., Golyanik, V., Theobalt, C.: Neural re-rendering of humans from a single image. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12356 LNCS (2020)
Zhu, H., Su, H., Wang, P., Cao, X., Yang, R.: View extrapolation of human body from a single image. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4450–4459 (2018)
Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhöfer, M.: DeepVoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019-June, pp. 2432–2441 (2019)
Thies, J., Zollhöfer, M., Nießner, M.: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Gr. 38(4), 10500 (2019)
Xu, C., Fu, Y., Wen, C., Pan, Y., Jiang, Y.G., Xue, X.: Pose-guided person image synthesis in the non-iconic views. IEEE Trans. Image Process. 29(1), 9060–9072 (2020)
Zhao, B., Wu, X., Cheng, Z.Q., Liu, H., Jie, Z., Feng, J.: Multi-view image generation from a single-view. In: MM 2018-Proceedings of the 2018 ACM Multimedia Conference, pp. 383–391 (2018)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)
Gortler, S.J., Grzeszczuk, R., Szeliski, R., Cohen, M.F.: The lumigraph. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 43–54 (1996)
Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry-and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 11–20 (1996)
Dinh, L., Krueger, D., Bengio, Y.: NICE: Non-linear independent components estimation. In: 3rd International Conference on Learning Representations, ICLR 2015 - Workshop Track Proceedings, vol. 1, no. 2, pp. 1–13 (2015)
Kingma, D.P., Dhariwal, P.: Glow: generative flow with invertible 1×1 convolutions. In: Advances in Neural Information Processing Systems, vol. 2018-Decem, pp. 10215–10224 (2018)
Mordvintsev, A., Pezzotti, N., Schubert, L., Olah, C.: Differentiable image parameterizations. Distill 3, 7 (2018)
Van Den Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: 33rd International Conference on Machine Learning, ICML 2016, vol. 4, pp. 2611–2620 (2016)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12346 LNCS, pp. 405–421 (2020)
Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3D-structure-aware neural scene representations. arXiv, no. NeurIPS, pp. 1–12 (2019)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv, pp. 1–26 (2017)
Flynn, J., Broxton, M., Debevec, P., Duvall, M., Fyffe, G., Overbeck, R., Snavely, N., Tucker, R.: Deepview: view synthesis with learned gradient descent. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019, pp. 2362–2371 (2019)
Henzler, P., Rasche, V., Ropinski, T., Ritschel, T.: Single-image tomography: 3D volumes from 2D cranial X-rays. Comput. Gr. Forum 37(2), 377–388 (2018)
Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 2017, 365–376 (2017)
Mildenhall, B., Srinivasan, P.P., Ortiz-Cayon, R., Kalantari, N.K., Ramamoorthi, R., Ng, R., Kar, A.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. arXiv, vol. 38, no. 4 (2019)
Peng, S., Zhang, Y., Xu, Y., Wang, Q., Shuai, Q., Bao, H., Zhou, X.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans (2020)
Lassner, C., Pons-Moll, G., Gehler, P.V.: A generative model of people in clothing. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2017, pp. 853–862 (2017)
Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. Adv. Neural Inf. Process. Syst. 2017, 406–416 (2017)
Siarohin, A., Sangineto, E., Lathuiliere, S., Sebe, N.: Deformable GANs for pose-based human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3408–3416 (2018)
Grigorev, A., Sevastopolsky, A., Vakhitov, A., Lempitsky, V.: Coordinate-based texture inpainting for pose-guided human image generation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019, pp. 12127–12136 (2019)
Neverova, N., Alp Güler, R., Kokkinos, I.: Dense pose transfer. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11207 LNCS, pp. 128–143 (2018)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Gr. 34, 6 (2015)
Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018)
Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016, no. 1, pp. 1096–1104 (2016)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021)
Dzanic, T., Witherden, F.D.: Fourier spectrum discrepancies in deep network generated images. arXiv, vol. 1, no. NeurIPS 2020, pp. 1–11 (2019)
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: Proceedings-13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, pp. 67–74 (2018)
Daoud, A.O., Tsehayae, A.A., Fayek, A.R.: A guided evaluation of the impact of research and development partnerships on university, industry, and government. Can. J. Civ. Eng. 44(4), 253–263 (2017)
Talebi, H., Milanfar, P.: NIMA: neural image assessment. IEEE Trans. Image Process. 27(8), 3998–4011 (2018)
Paulraj, M.P., Zanar Azalan, M.S., H.C.R., Palaniappan, R.: Image quality assessment using Elman neural network model and interleaving method. In: International Journal of Human Computer Interaction (IJHCI), vol. 3, no. 3, pp. 51–57 (2012)
Kipli, K., Muhammad, M.S., Masra, S.M.W., Zamhari, N., Lias, K., Mat, D.A.A.: Performance of Levenberg-Marquardt backpropagation for full reference hybrid image quality metrics. Eng. Comput. Sci. 2195, 704–707 (2012)
Wang, X., Liang, X., Yang, B., Li, F.W.: No-reference synthetic image quality assessment with convolutional neural network and local image saliency. Comput. Vis. Media 5(2), 193–208 (2019)
Kettunen, M., Härkönen, E., Lehtinen, J.: E-LPIPS: robust perceptual image similarity via random transformation ensembles. arXiv, vol. 6 (2019)
Hsu, C.-h., Guo, Z., Yen, K.: Comparison of image approximation methods: Fourier transform, cosine transform, wavelets packet and Karhunen-Loeve transform (2002)
Repository: DCT (Discrete Cosine Transform) for PyTorch
Makhoul, J.: A fast cosine transform in one and two dimensions. IEEE Trans. Acoust. Speech Signal Process. 28(1), 27–34 (1980)
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
K.B.K. introduced the study; P.K., K.G., and K.B.K. contributed to the methods and results; K.G. wrote the discussion; P.K. prepared the code and data.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
On contact.
Ethics approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gromada, K., Kowaleczko, P. & Kalinowska, K.B. Fast Fourier transform-based method of neural network training for human re-rendering. SIViP 17, 227–235 (2023). https://doi.org/10.1007/s11760-022-02225-z
DOI: https://doi.org/10.1007/s11760-022-02225-z