
CDPT: context-driven omni-dimensional dynamic pose transfer network

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Pose transfer refers to transferring the pose of a given person to a target pose. We present a context-driven omni-dimensional dynamic pose transfer model to address the inability of regular convolutional networks to handle complex pose changes. First, we construct a dynamic convolution module to extract rich contextual features. This module adjusts dynamically to differences in the input data during the convolution process, enhancing the adaptability of the extracted features. Second, a feature fusion block (FFBlock) is built by merging multiscale channel attention information from the global and local channel contexts. Furthermore, the focal frequency distance between the generated image and the original image is measured with a focal frequency loss, which allows the model to adaptively focus on frequency components that are hard to synthesize by down-weighting those that are easy to synthesize, narrowing the gap in the frequency domain and improving the quality of the generated images. The effectiveness and efficiency of the network are verified qualitatively and quantitatively on fashion datasets, and extensive experiments demonstrate the superiority of our method.
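To make the dynamic convolution idea concrete, below is a minimal PyTorch sketch of an omni-dimensional dynamic convolution layer in the spirit of Li et al.'s ODConv, which the module builds on. It is an illustrative assumption, not the authors' implementation: layer names, the reduction ratio, and the number of candidate kernels are ours. A pooled context predicts four attentions (spatial kernel, input channels, output channels, kernel number) that jointly modulate a bank of candidate kernels per sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2d(nn.Module):
    """Simplified omni-dimensional dynamic convolution (after Li et al.).

    A pooled context predicts four attentions -- over the k x k spatial
    kernel, the input channels, the output channels, and the n candidate
    kernels -- which together assemble one kernel per input sample.
    """

    def __init__(self, in_ch, out_ch, k=3, n_kernels=4, reduction=16):
        super().__init__()
        self.in_ch, self.out_ch, self.k, self.n = in_ch, out_ch, k, n_kernels
        hidden = max(in_ch // reduction, 4)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # One attention head per kernel dimension.
        self.att_spatial = nn.Conv2d(hidden, k * k, 1)
        self.att_in = nn.Conv2d(hidden, in_ch, 1)
        self.att_out = nn.Conv2d(hidden, out_ch, 1)
        self.att_kernel = nn.Conv2d(hidden, n_kernels, 1)
        # Bank of n candidate kernels.
        self.weight = nn.Parameter(
            0.02 * torch.randn(n_kernels, out_ch, in_ch, k, k))

    def forward(self, x):
        b, c, h, w = x.shape
        ctx = self.squeeze(x)  # (B, hidden, 1, 1)
        a_sp = torch.sigmoid(self.att_spatial(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_in = torch.sigmoid(self.att_in(ctx)).view(b, 1, 1, self.in_ch, 1, 1)
        a_out = torch.sigmoid(self.att_out(ctx)).view(b, 1, self.out_ch, 1, 1, 1)
        a_ker = torch.softmax(self.att_kernel(ctx).view(b, self.n), dim=1)
        a_ker = a_ker.view(b, self.n, 1, 1, 1, 1)
        # Modulate the kernel bank along all four dimensions, then sum the
        # candidates into one kernel per sample: (B, out_ch, in_ch, k, k).
        kernel = (a_ker * a_out * a_in * a_sp * self.weight.unsqueeze(0)).sum(1)
        # Batched per-sample convolution via the grouped-conv trick.
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            kernel.reshape(b * self.out_ch, self.in_ch, self.k, self.k),
            padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, h, w)
```

This is what the abstract means by the convolution "dynamically adjusting to differences in input data": the kernel applied to each sample is a function of that sample's own context.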
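The FFBlock is described only at the level of merging multiscale channel attention from global and local channel contexts. The sketch below is one plausible realization under that description (a global pooled branch and a full-resolution local branch jointly gate a weighted fusion of two feature maps); the exact wiring and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FFBlock(nn.Module):
    """Hypothetical feature fusion block with multiscale channel attention.

    The global branch derives channel attention from pooled statistics,
    the local branch from pointwise convolutions at full resolution; their
    sum gates a weighted fusion of the two input feature maps.
    """

    def __init__(self, ch, reduction=4):
        super().__init__()
        hidden = max(ch // reduction, 4)

        def context(use_pool):
            layers = [nn.AdaptiveAvgPool2d(1)] if use_pool else []
            layers += [
                nn.Conv2d(ch, hidden, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, ch, 1),
            ]
            return nn.Sequential(*layers)

        self.global_ctx = context(use_pool=True)   # (B, C, 1, 1)
        self.local_ctx = context(use_pool=False)   # (B, C, H, W)

    def forward(self, x, y):
        s = x + y
        # The global context broadcasts over the local one; the sigmoid
        # gate decides, per channel and position, how to mix x and y.
        gate = torch.sigmoid(self.global_ctx(s) + self.local_ctx(s))
        return gate * x + (1.0 - gate) * y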
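The focal frequency loss follows Jiang et al.'s formulation: both images are mapped to the frequency domain with a 2D DFT, and the per-frequency error is re-weighted by a spectrum weight matrix so that hard-to-synthesize frequencies dominate the loss. A minimal sketch follows; treating the weights as constants via detach() and normalizing them to [0, 1] mirrors the published formulation, and alpha is the focusing exponent.

```python
import torch

def focal_frequency_loss(fake: torch.Tensor, real: torch.Tensor,
                         alpha: float = 1.0) -> torch.Tensor:
    """Focal frequency loss between generated and reference images.

    fake, real: (N, C, H, W) tensors. The loss is the frequency-domain
    squared error, re-weighted so that frequencies with larger error
    (hard to synthesize) receive larger weights.
    """
    # Orthonormal 2D DFT over the spatial dimensions.
    f_fake = torch.fft.fft2(fake, norm="ortho")
    f_real = torch.fft.fft2(real, norm="ortho")

    # Squared distance between the complex spectra at each frequency.
    dist = (f_fake - f_real).abs() ** 2

    # Focal weights: error magnitude raised to alpha, normalized to [0, 1]
    # and treated as constants so gradients flow only through `dist`.
    weight = dist.sqrt() ** alpha
    weight = (weight / weight.max().clamp(min=1e-8)).detach()

    return (weight * dist).mean()
```

In training, a term like this is typically added to the usual pixel-level and adversarial losses with a scalar weight; down-weighting easy frequencies is what narrows the frequency-domain gap described above.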


Data availability statement

The data used to support the findings of this study are available from the corresponding author upon request; no new datasets were generated during the current study.


Funding

This work was supported by the National Natural Science Foundation of China (61772179), the Hunan Provincial Natural Science Foundation of China (2022JJ50016, 2023JJ50095), the Science and Technology Plan Project of Hunan Province (2016TP1020), the Scientific Research Fund of the Hunan Provincial Education Department (21B0649), the Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469, [2020]248), and the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20221285).

Author information


Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Yue Chen, Xiaoman Liang, Mugang Lin, Yuan Qin and Huihuang Zhao. The first draft of the manuscript was written by Yue Chen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaoman Liang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing financial interests or personal relationships that could have influenced the work reported in this paper.

Ethics approval statement

This study did not involve any humans or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Y., Liang, X., Lin, M. et al. CDPT: context-driven omni-dimensional dynamic pose transfer network. SIViP 19, 376 (2025). https://doi.org/10.1007/s11760-025-03969-0
