GAN-Poser: an improvised bidirectional GAN model for human motion prediction

  • S.I.: Deep Learning Approaches for RealTime Image Super Resolution (DLRSR)
Neural Computing and Applications

Abstract

We present GAN-Poser, a method based on a generator–discriminator framework that predicts human motion in less time, given an input 3D human skeleton sequence. Rather than the conventional Euclidean loss, it uses a frame-wise geodesic loss, which gives a geometrically meaningful and more precise distance measurement. We use a bidirectional GAN framework together with a recursive prediction strategy to avoid mode collapse and to further regularize training. To generate multiple probable human-pose sequences conditioned on a given starting sequence, we introduce a random extrinsic factor \(\varTheta\). The discriminator is trained to regress \(\varTheta\), which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. Although the framework is probabilistic, the modified discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as conditioning for predicting the later part of the sequence. This adversarial learning-based model takes stochasticity into account, and the bidirectional setup provides a new way to evaluate prediction quality against a given test sequence. GAN-Poser achieves superior performance over state-of-the-art deep learning approaches when evaluated on the standard NTU-RGB-D and Human3.6M datasets.
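
To make the idea of a frame-wise geodesic loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each joint in each frame is represented as a 3x3 rotation matrix (the abstract does not state the pose representation) and that the loss is averaged over frames; the function names `geodesic_distance`, `framewise_geodesic_loss` and the helper `rot_z` are illustrative only. The geodesic distance between two rotations is the angle of their relative rotation, \(\theta = \arccos\big((\operatorname{tr}(R_a^\top R_b) - 1)/2\big)\).

```python
import math

def rot_z(angle):
    """3x3 rotation matrix about the z-axis (hypothetical helper for the demo)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_distance(r_a, r_b):
    """Angle in radians of the relative rotation r_a^T r_b on SO(3)."""
    # trace(r_a^T r_b) equals the sum of element-wise products of r_a and r_b.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def framewise_geodesic_loss(pred_seq, true_seq):
    """Mean over frames of the summed per-joint geodesic distances.

    Assumption: a sequence is a list of frames, and a frame is a list of
    per-joint 3x3 rotation matrices.
    """
    if not pred_seq or len(pred_seq) != len(true_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for pred_frame, true_frame in zip(pred_seq, true_seq):
        total += sum(geodesic_distance(p, t)
                     for p, t in zip(pred_frame, true_frame))
    return total / len(pred_seq)

# Two frames with one joint each; predictions are off by 0.1 and 0.3 rad.
true_seq = [[rot_z(0.0)], [rot_z(0.5)]]
pred_seq = [[rot_z(0.1)], [rot_z(0.8)]]
loss = framewise_geodesic_loss(pred_seq, true_seq)
```

Unlike a Euclidean distance on raw matrix entries, this measure depends only on the rotation angle separating the two poses, which is what makes it geometrically meaningful on the rotation manifold.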

Author information

Corresponding author

Correspondence to Deepak Kumar Jain.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Cite this article

Jain, D.K., Zareapoor, M., Jain, R. et al. GAN-Poser: an improvised bidirectional GAN model for human motion prediction. Neural Comput & Applic 32, 14579–14591 (2020). https://doi.org/10.1007/s00521-020-04941-4

  • DOI: https://doi.org/10.1007/s00521-020-04941-4