Human Action Recognition in Still Images

  • Conference paper
Computer Vision and Image Processing (CVIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1568)

Abstract

This work addresses action recognition in still images: recognizing a person's action from a single image. We use the dataset published by V. Jacquot, Z. Ying, and G. Kreiman at CVPR 2020, which covers three action classes: Drinking, Reading, and Sitting. The images are not sorted into one of these three classes. Instead, each class is treated as a separate binary classification problem, i.e., whether or not the person is performing that particular action. To classify an image, we first apply the Detectron2 object detection model to detect the person performing the activity and the object related to it (the foreground), and then remove everything else (the background) from the image. The resulting background-free images are used for the classification task, which is carried out with various deep learning models through transfer learning. As a result, the classification accuracy of human action recognition (HAR) in still images increases by 10% on VGG16, 7% on InceptionV3, 1% on Xception, and 4% on the Inception-ResNet model.
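The background-removal step described in the abstract can be illustrated with a minimal sketch. It assumes the detector has already produced one boolean mask per kept instance (the person and the related object); the function name `remove_background`, the black fill value, and the toy masks are hypothetical and are not taken from the paper, which does not state here how the detections are turned into a foreground region.

```python
import numpy as np


def remove_background(image, instance_masks, fill_value=0):
    """Keep only the pixels covered by at least one instance mask.

    image: H x W x C array (e.g. uint8 RGB).
    instance_masks: N x H x W boolean array, one mask per kept instance
        (assumed here to be the person and the related object).
    fill_value: value written into background pixels (assumption: black).
    """
    image = np.asarray(image)
    masks = np.asarray(instance_masks, dtype=bool)
    if masks.ndim != 3 or masks.shape[1:] != image.shape[:2]:
        raise ValueError("instance_masks must have shape (N, H, W) matching the image")
    # The union of all instance masks is treated as the foreground.
    foreground = masks.any(axis=0)
    # Start from a background-only canvas, then copy the foreground pixels over.
    result = np.full_like(image, fill_value)
    result[foreground] = image[foreground]
    return result


# Toy example: a 4 x 4 RGB image with two overlapping square "instances".
image = np.arange(1, 4 * 4 * 3 + 1, dtype=np.uint8).reshape(4, 4, 3)
person = np.zeros((4, 4), dtype=bool)
person[0:2, 0:2] = True
cup = np.zeros((4, 4), dtype=bool)
cup[1:3, 1:3] = True
out = remove_background(image, np.stack([person, cup]))
```

In this sketch the pixels outside both masks are set to zero, while the pixels inside either mask keep their original values. The output would then be fed to the binary classifiers in place of the original image.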

References

  1. Guo, G., Lai, A.: A survey on still image-based human action recognition. Pattern Recognit. 47(10), 3343–3361 (2014)

  2. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)

  3. Li, L.-J., Li, F.-F.: What, where and who? classifying events by scene and object recognition. In: ICCV, vol. 2, no. 5, p. 6 (2007)

  4. Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: BMVC 2010 (2010)

  5. Shapovalova, N., Gong, W., Pedersoli, M., Roca, F.X., Gonzàlez, J.: On importance of interactions and context in human action recognition. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) Pattern Recognition and Image Analysis, IbPRIA 2011. LNCS, vol. 6669, pp. 58–66. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21257-4_8

  6. Yao, B., Fei-Fei, L.: Grouplet: a structured image representation for recognizing human and object interactions. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9–16. IEEE (2010)

  7. Chaudhary, S., Murala, S.: Depth-based end-to-end deep network for human action recognition. IET Comput. Vis. 13(1), 15–22 (2019)

  8. Desai, C., Ramanan, D.: Detecting actions, poses, and objects with relational phraselets. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) Computer Vision – ECCV 2012. LNCS, vol. 7575, pp. 158–172. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_12

  9. Thurau, C., Hlavác, V.: Pose primitive based human action recognition in videos or still images. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)

  10. Gupta, A., Kembhavi, A., Davis, L.S.: Observing human-object interactions: using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)

  11. Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for static human-object interactions. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, pp. 9–16. IEEE (2010)

  12. Chaudhary, S.: Deep learning approaches to tackle the challenges of human action recognition in videos. Dissertation (2019)

  13. Wang, Y., Jiang, H., Drew, M.S., Li, Z.-N., Mori, G.: Unsupervised discovery of action classes. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR2006), vol. 2, pp. 1654–1661. IEEE (2006)

  14. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition, pp. 17–24 (2010)

  15. Chaudhary, S., Murala, S.: TSNet: deep network for human action recognition in hazy videos. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3981–3986 (2018). https://doi.org/10.1109/SMC.2018.00675

  16. Prest, A., Schmid, C., Ferrari, V.: Weakly supervised learning of interactions between humans and objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 601–614 (2011)

  17. Li, P., Ma, J.: What is happening in a still picture? In: The First Asian Conference on Pattern Recognition, pp. 32–36. IEEE (2011)

  18. Jacquot, V., Ying, Z., Kreiman, G.: Can deep learning recognize subtle human activities? In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 14232–14241 (2020). https://doi.org/10.1109/CVPR42600.2020.01425

  19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91

  20. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition (2015)

  21. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV), 115(3), 211–252 (2015)

  22. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact of residual connections on learning (2016)

  23. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567 (2015)

  24. Chollet, F.: Xception: deep learning with depthwise separable convolutions. CoRR abs/1610.02357 (2016)

  25. Girshick, R., Radosavovic, I., Gkioxari, G., Dollár, P., He, K.: Detectron (2018). https://github.com/facebookresearch/detectron

  26. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.322

  27. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031.

  28. Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169

  29. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  30. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  31. Reaper, T.: Automated image background removal with python. tobias.fyi (2020). https://tobias.fyi/blog/remove-bg-python

  32. Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324

  33. Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018). https://doi.org/10.1109/CVPR.2018.00762

  34. Patil, P.W., Dudhane, A., Kulkarni, A., Murala, S., Gonde, A.B., Gupta, S.: An unified recurrent video object segmentation framework for various surveillance environments. IEEE Trans. Image Process. 30, 7889–7902 (2021)

  35. Praful, H., Dudhane, A., Murala, S.: Single image depth estimation using deep adversarial training. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 989–993. IEEE (2019)

  36. Patil, P.W., Dudhane, A., Chaudhary, S., Murala, S.: Multi-frame based adversarial learning approach for video surveillance. Pattern Recogn. 122, 108350 (2022)

  37. Chaudhary, S., Murala, S.: Deep network for human action recognition using Weber motion. Neurocomputing 367, 207–216 (2019)

  38. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition CVPR, vol. 1, pp. 886–893 (2005)

  39. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  40. Chaudhary, S., Dudhane, A., Patil, P., Murala, S.: Pose guided dynamic image network for human action recognition in person centric videos. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–8 (2019). https://doi.org/10.1109/AVSS.2019.8909835

  41. Belongie, S., Mori, G., Malik, J.: Matching with shape contexts. In: Krim, H., Yezzi, A. (eds) Statistics and Analysis of Shapes. Modeling and Simulation in Science, Engineering and Technology, pp. 81–105. Birkhäuser Boston, Boston (2006). https://doi.org/10.1007/0-8176-4481-4_4

  42. Phutke, S.S., Murala, S.: Diverse receptive field based adversarial concurrent encoder network for image inpainting. IEEE Signal Process. Lett. 28, 1873–1877 (2021)

  43. Chen, X., Girshick, R., He, K., Dollár, P.: Tensormask: a foundation for dense object segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2061–2069 (2019)

  44. Akshay, D., Biradar, K.M., Patil, P.W., Hambarde, P., Murala, S.: Varicolored image de-hazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4564–4573 (2020)

  45. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6399–6408 (2019)

  46. Patil, P.W., Biradar, K.M., Dudhane, A., Murala, S.: An end-to-end edge aggregation network for moving object segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8149–8158 (2020)

  47. Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1483–1498 (2019).

Corresponding author

Correspondence to Palak.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Palak, Chaudhary, S. (2022). Human Action Recognition in Still Images. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1568. Springer, Cham. https://doi.org/10.1007/978-3-031-11349-9_42

  • DOI: https://doi.org/10.1007/978-3-031-11349-9_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11348-2

  • Online ISBN: 978-3-031-11349-9

  • eBook Packages: Computer Science, Computer Science (R0)
