
Interactive Pose Attention Network for Human Pose Transfer

  • Conference paper
Web Information Systems Engineering – WISE 2021 (WISE 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13081)


Abstract

In this paper, we propose an end-to-end interactive pose attention network (IPAN) to generate person images in a target pose. The generator comprises a sequence of interactive pose attention (IPA) blocks that progressively transfer the attended regions with respect to the intermediate poses, while retaining the texture details of the unattended regions for subsequent pose transfer. More specifically, we design an attention mechanism that interacts between the image and pose pathways to transfer the regions of interest according to the human pose, and to preserve the regions of no interest in the current IPA block against the uncertainty of the intermediate poses. In particular, we devise long-distance residual connections that inject the low-level features of the person image into the IPA blocks to preserve its appearance characteristics. For adversarial training, the generator exploits a reconstruction loss, a perceptual loss and a contextual loss, while the discriminator exploits the adversarial loss. Quantitative and qualitative experiments on the DeepFashion and Market-1501 datasets demonstrate the superior performance of the proposed method (e.g., the FID is reduced from 36.708 to 22.568 on Market-1501 and from 15.757 to 12.835 on DeepFashion).
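Below is a minimal PyTorch-style sketch of a single IPA block consistent with the description above: the pose pathway, after interacting with the image pathway, produces a soft attention mask that routes the attended regions through pose-guided refinement while the unattended regions are carried forward unchanged. The class and layer names (IPABlock, img_conv, pose_conv, attn), the channel sizes and the normalization choice are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class IPABlock(nn.Module):
    """One interactive pose attention block (illustrative sketch only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Image pathway: refine appearance features.
        self.img_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Pose pathway: update pose features conditioned on the image pathway.
        self.pose_conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Soft attention mask in [0, 1] derived from the updated pose features.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat, pose_feat):
        refined = self.img_conv(img_feat)
        pose_feat = self.pose_conv(torch.cat([pose_feat, img_feat], dim=1))
        mask = self.attn(pose_feat)
        # Attended regions follow the pose-guided refinement; unattended
        # regions keep the incoming image features for later blocks.
        img_feat = mask * refined + (1.0 - mask) * img_feat
        return img_feat, pose_feat

# Example usage with hypothetical 64-channel feature maps:
# block = IPABlock(64)
# img_out, pose_out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))

Stacking several such blocks and injecting the encoder's low-level image features into each block via long-distance residual connections would yield the progressive transfer described above; the full objective would then combine the reconstruction, perceptual, contextual and adversarial terms listed in the abstract.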



Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 62076073), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), and the Science and Technology Planning Project of Guangdong Province (LZC0023).

Author information


Correspondence to Zhenguo Yang or Wenyin Liu.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Luo, D., et al. (2021). Interactive Pose Attention Network for Human Pose Transfer. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds.) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol. 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-91560-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91559-9

  • Online ISBN: 978-3-030-91560-5

  • eBook Packages: Computer Science, Computer Science (R0)
