GAN-Poser: an improvised bidirectional GAN model for human motion prediction

Jain, Deepak Kumar; Zareapoor, Masoumeh; Jain, Rachna; Kathuria, Abhishek; Bachhety, Shivam

doi:10.1007/s00521-020-04941-4

S.I.: Deep Learning Approaches for RealTime Image Super Resolution (DLRSR)
Published: 29 April 2020

Volume 32, pages 14579–14591, (2020)
Neural Computing and Applications

Deepak Kumar Jain¹,
Masoumeh Zareapoor²,
Rachna Jain³,
Abhishek Kathuria³ &
Shivam Bachhety³

Abstract

We present GAN-Poser, a method based on a generator–discriminator framework that predicts human motion in less time, given an input 3D human skeleton sequence. Rather than the conventional Euclidean loss, it uses a frame-wise geodesic loss, which gives a geometrically meaningful and more precise distance measure. A bidirectional GAN framework is combined with a recursive prediction strategy to avoid mode collapse and to further regularize training. To generate multiple probable human-pose sequences conditioned on a given starting sequence, a random extrinsic factor \(\varTheta\) is introduced. The discriminator is trained to regress the extrinsic factor \(\varTheta\), which is used together with the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. Although the framework is probabilistic, the modified discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as conditioning for prediction of the later part of the sequence. This adversarial learning-based model takes stochasticity into account, and the bidirectional setup provides a new way to evaluate prediction quality against a given test sequence. GAN-Poser achieves superior performance over state-of-the-art deep learning approaches when evaluated on the standard NTU-RGB-D and Human3.6M datasets.
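
To make the idea of a frame-wise geodesic loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each pose is stored as one 3×3 rotation matrix per joint, and it uses the standard geodesic (angular) distance on SO(3), \(\arccos((\mathrm{tr}(R_1^\top R_2) - 1)/2)\), averaged over joints within each frame and then over frames. The function names, the array layout `(T, J, 3, 3)`, and the averaging scheme are all assumptions; the abstract does not specify them.

```python
import numpy as np

def geodesic_distance(R_pred, R_true):
    """Angular distance on SO(3) between rotation matrices.

    Inputs have shape (..., 3, 3); the result has shape (...).
    """
    # Relative rotation between prediction and target: R_pred^T R_true.
    R_rel = np.swapaxes(R_pred, -1, -2) @ R_true
    trace = np.trace(R_rel, axis1=-2, axis2=-1)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos_angle = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_angle)

def framewise_geodesic_loss(seq_pred, seq_true):
    """Hypothetical frame-wise loss for sequences of shape (T, J, 3, 3).

    Averages the per-joint angle within each frame, then over frames.
    """
    per_frame = geodesic_distance(seq_pred, seq_true).mean(axis=-1)
    return per_frame.mean()

def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Toy example: 4 frames, 2 joints, every joint off by 0.3 rad about z.
T, J = 4, 2
identity_seq = np.tile(np.eye(3), (T, J, 1, 1))
rotated_seq = np.tile(rot_z(0.3), (T, J, 1, 1))
loss = framewise_geodesic_loss(rotated_seq, identity_seq)
```

In this toy case the loss equals the rotation angle itself (0.3 rad). A Euclidean loss on the matrix entries, by contrast, does not correspond directly to an angle on the rotation manifold.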

Author information

Authors and Affiliations

Key Laboratory of Intelligent Air-Ground Cooperative Control for Universities in Chongqing, College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
Deepak Kumar Jain
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Masoumeh Zareapoor
Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
Rachna Jain, Abhishek Kathuria & Shivam Bachhety

Corresponding author

Correspondence to Deepak Kumar Jain.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, D.K., Zareapoor, M., Jain, R. et al. GAN-Poser: an improvised bidirectional GAN model for human motion prediction. Neural Comput & Applic 32, 14579–14591 (2020). https://doi.org/10.1007/s00521-020-04941-4

Received: 16 June 2019
Accepted: 08 April 2020
Published: 29 April 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00521-020-04941-4
