Pose-guided action recognition in static images using Lie-group

Abstract

Action recognition in static images is challenging because a single image lacks the motion information that characterizes the relations between the human and surrounding objects. Existing works detect the human together with related objects, or transfer motion cues from videos to images; in both cases, however, the interaction is only depicted implicitly. In this paper, we approach the problem from a different point of view: we explicitly learn the interactive information from the poses of the human and the objects. Humans adopt different poses in different actions, and the objects involved in different actions have different spatial relations with particular body parts. This interaction between poses can be represented naturally by a Lie group. The Lie-group representation encodes the orientation and distance between key points or joints, which reveal the relation between the human and objects in different actions. In experiments, the proposed method achieves competitive classification results on several still-image action datasets, which supports recognizing actions in still images from poses.
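
As an illustration of the kind of features the abstract describes, the following is a minimal sketch, not the authors' implementation: for every ordered pair of detected key points (human joints and object points), it computes the distance between the points and the orientation of the connecting segment, treating that orientation as an SO(2) rotation whose Lie-algebra (log-map) coordinate is simply the angle. The function name pairwise_lie_features, the use of 2D coordinates, and the choice of classifier in the usage comment are assumptions made for the example.

import numpy as np

def pairwise_lie_features(keypoints):
    """Illustrative sketch. keypoints: (N, 2) array of 2D key-point
    coordinates (e.g., human joints plus object centers). For every
    ordered pair (i, j) we compute the Euclidean distance and the
    orientation of the segment i -> j; the orientation is viewed as a
    rotation in SO(2), whose log-map coordinate is the angle itself."""
    keypoints = np.asarray(keypoints, dtype=float)
    feats = []
    n = len(keypoints)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = keypoints[j] - keypoints[i]
            dist = np.linalg.norm(d)        # relative distance between key points
            theta = np.arctan2(d[1], d[0])  # so(2) coordinate of the orientation
            feats.extend([theta, dist])
    return np.asarray(feats)

# Example: three hypothetical key points (head, hand, object center); the
# resulting vector could be fed to any standard classifier such as an SVM.
print(pairwise_lie_features([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]))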



Acknowledgments

This work is supported by the National Key R&D Program of China (2018AAA0100100), the National Natural Science Foundation of China (61702095), and the Natural Science Foundation of Jiangsu Province (BK20190341).

Author information

Corresponding author

Correspondence to Siya Mi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Mi, S., Zhang, Y. Pose-guided action recognition in static images using lie-group. Appl Intell 52, 6760–6768 (2022). https://doi.org/10.1007/s10489-021-02760-1
