Abstract
Pose transfer aims to synthesize an image of a person in a target pose while preserving the person's appearance from a source image. We present a context-driven omni-dimensional dynamic pose transfer model to address the inability of regular convolutional networks to handle complex pose deformations. First, we construct a dynamic convolution module to extract rich contextual features; it adapts its convolution kernels to differences in the input data during the convolution process, enhancing the adaptability of the features. Second, we build a feature fusion block (FFBlock) that merges multiscale channel attention information from global and local channel contexts. Furthermore, the focal-frequency distance between the generated image and the original image is measured with a focal-frequency loss, which lets the model adaptively focus on frequency components that are hard to synthesize by down-weighting those that are easy to synthesize, narrowing the gap in the frequency domain and improving the quality of the generated images. The effectiveness and efficiency of the network are verified qualitatively and quantitatively on fashion datasets, and extensive experiments demonstrate the superiority of our method.
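As a rough illustration of the focal-frequency idea described above (not the paper's exact implementation), the minimal PyTorch sketch below computes a loss in the spirit of the focal frequency loss of Jiang et al. (ICCV 2021): per-frequency errors between the spectra of the generated and real images are weighted by their own magnitude, so hard-to-synthesize frequency components dominate while easy ones are down-weighted. The function name focal_frequency_loss, the alpha exponent, and the per-image weight normalization are assumptions made for this sketch.

```python
# Minimal sketch of a focal-frequency-style loss (assumes PyTorch >= 1.8).
import torch


def focal_frequency_loss(fake: torch.Tensor, real: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Frequency-domain distance with focal weighting.

    fake, real: image batches of shape (N, C, H, W).
    alpha: exponent controlling how strongly hard frequencies are emphasized.
    """
    # 2-D FFT per channel; orthonormal scaling keeps magnitudes comparable across image sizes.
    fake_freq = torch.fft.fft2(fake, norm="ortho")
    real_freq = torch.fft.fft2(real, norm="ortho")

    # Squared distance between the two spectra at every frequency.
    freq_dist = (fake_freq - real_freq).abs() ** 2

    # Dynamic weight matrix: frequencies with large error (hard to synthesize) get larger
    # weights, easy ones are down-weighted; detached so gradients flow only through freq_dist.
    weights = freq_dist.sqrt() ** alpha
    weights = weights / weights.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
    weights = weights.detach()

    return (weights * freq_dist).mean()
```

In practice such a term would be added to the usual adversarial and reconstruction losses with a small weighting factor; the exact balance used in the paper is not reproduced here.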







Data availability statement
The data used to support the findings of this study are available from the corresponding author upon request. No datasets were generated or analysed during the current study.
Funding
This work was supported by the National Natural Science Foundation of China (61772179), the Hunan Provincial Natural Science Foundation of China (2022JJ50016, 2023JJ50095), the Science and Technology Plan Project of Hunan Province (2016TP1020), the Scientific Research Fund of Hunan Provincial Education Department (21B0649), the Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469, [2020]248), and the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20221285).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Yue Chen, Xiaoman Liang, Mugang Lin, Yuan Qin and Huihuang Zhao. The first draft of the manuscript was written by Yue Chen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
All the authors declare that they have no competing financial interests or personal relationships that could influence the work reported in this paper.
Ethics approval statement
This study did not involve any humans or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Y., Liang, X., Lin, M. et al. CDPT: context-driven omni-dimensional dynamic pose transfer network. SIViP 19, 376 (2025). https://doi.org/10.1007/s11760-025-03969-0