
An adversarial semi-supervised approach for action recognition from pose information

  • S.I.: Emerging applications of Deep Learning and Spiking ANN
  • Published in Neural Computing and Applications

Abstract

The collection of video data for action recognition is highly susceptible to measurement bias: the recording equipment, the camera angle and the environmental conditions all strongly affect the distribution of the collected dataset. Training a classifier that generalizes well to new data therefore becomes very hard, since it is impractical to gather sufficiently general training sets. Recent approaches in the literature attempt to solve this problem by augmenting a given training set with synthetic data, so as to better represent the global distribution of the covariates. However, these approaches are limited because they rely on hand-crafted data synthesizers, which are typically hard to implement and problem-specific. In this work, we propose a different approach that combines two techniques, pose extraction and domain adaptation, to improve the generalization capabilities of classifiers. We show that adapted skeletal representations can be retrieved automatically in a semi-supervised setting, and that these help classifiers generalize to new forms of measurement bias. We empirically validate our approach on generalization across different camera angles.
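To make the combination concrete, the sketch below shows the standard gradient-reversal recipe for adversarial domain adaptation applied to flattened skeletal pose vectors, written in TensorFlow/Keras. It is a minimal illustration rather than the authors' implementation: the 75-dimensional input (25 joints x 3 coordinates), the layer sizes, the reversal strength lam and the build_model/GradientReversal names are all assumptions made for the example.

import tensorflow as tf


def gradient_reversal(x, lam=1.0):
    # Identity on the forward pass; negated, scaled gradient on the backward
    # pass, so the feature extractor learns to confuse the domain discriminator.
    @tf.custom_gradient
    def _reverse(x):
        def grad(dy):
            return -lam * dy
        return tf.identity(x), grad
    return _reverse(x)


class GradientReversal(tf.keras.layers.Layer):
    # Thin Keras wrapper so the reversal can sit inside a functional model.
    def __init__(self, lam=1.0, **kwargs):
        super().__init__(**kwargs)
        self.lam = lam

    def call(self, inputs):
        return gradient_reversal(inputs, self.lam)


def build_model(input_dim=75, num_classes=10, lam=1.0):
    inputs = tf.keras.Input(shape=(input_dim,))
    # Shared feature extractor over the flattened pose vector.
    h = tf.keras.layers.Dense(256, activation="relu")(inputs)
    h = tf.keras.layers.Dense(128, activation="relu")(h)
    # Action classifier head, trained on labelled (source-domain) samples.
    action = tf.keras.layers.Dense(num_classes, activation="softmax",
                                   name="action")(h)
    # Domain discriminator behind the reversal layer: it learns to tell source
    # from target, while the reversed gradient pushes the shared features
    # toward domain invariance.
    rev = GradientReversal(lam)(h)
    d = tf.keras.layers.Dense(64, activation="relu")(rev)
    domain = tf.keras.layers.Dense(1, activation="sigmoid", name="domain")(d)
    model = tf.keras.Model(inputs, [action, domain])
    model.compile(optimizer="adam",
                  loss={"action": "sparse_categorical_crossentropy",
                        "domain": "binary_crossentropy"})
    return model

In the semi-supervised setting described above, target-domain samples carry only a domain label; during training, the action loss would be masked for them (for instance via per-sample loss weights), so that the unlabelled data drives only the domain discriminator.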


Notes

  1. https://developer.microsoft.com/en-us/windows/kinect.

  2. https://www.asus.com/gr/3D-Sensor/Xtion_PRO/.

  3. http://www.numpy.org/.

  4. https://www.scipy.org/.

  5. https://opencv.org/.


Acknowledgements

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under Grant Agreement No. 273 (Funding Decision: GGET122785/I2/19-07-2018). We also acknowledge support of this work by the Project SYNTELESIS “Innovative Technologies and Applications based on the Internet of Things (IoT) and the Cloud Computing” (MIS 5002521) which is implemented under the “Action for the Strategic Development on the Research and Technological Sector”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Author information


Corresponding author

Correspondence to Phivos Mylonas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pikramenos, G., Mathe, E., Vali, E. et al. An adversarial semi-supervised approach for action recognition from pose information. Neural Comput & Applic 32, 17181–17195 (2020). https://doi.org/10.1007/s00521-020-05162-5


  • DOI: https://doi.org/10.1007/s00521-020-05162-5
