Human Action Recognition in Still Images

Palak; Chaudhary, Sachin

doi:10.1007/978-3-031-11349-9_42

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1568)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

This work addresses action recognition in still images, where a model must recognize a person's action from a single image. We use the dataset published by V. Jacquot, Z. Ying, and G. Kreiman at CVPR 2020, which covers three action classes: Drinking, Reading, and Sitting. The images are not sorted into these three classes jointly. Instead, each class is treated as a separate binary classification problem, i.e., whether or not the person is performing that particular action. To classify an image, we first apply the Detectron2 object detection model to detect the person performing the activity and the object related to it (the foreground), and then remove everything else (the background) from the image. The resulting background-free images are used for the classification task, which is carried out with several deep learning models using transfer learning. As a result, the classification accuracy of human action recognition (HAR) in still images increases by 10% on VGG16, 7% on InceptionV3, 1% on Xception, and 4% on the Inception-ResNet model.
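The background-removal step described in the abstract can be illustrated with a minimal sketch. It assumes the detector has already produced one boolean mask per detected instance (the person and the related object). The abstract does not say whether masks or bounding boxes are used, so the mask input, the function name `remove_background`, and the `fill_value` parameter are all hypothetical and are not taken from the chapter.

```python
import numpy as np


def remove_background(image, masks, fill_value=0):
    """Keep pixels covered by any foreground mask and overwrite the rest.

    image: H x W x C array (for example an RGB image).
    masks: sequence of H x W boolean arrays, one per detected instance
           (assumed here to come from an instance segmentation model).
    fill_value: value written into every background pixel.
    """
    image = np.asarray(image)
    if len(masks) == 0:
        # Nothing was detected, so the whole image counts as background.
        return np.full_like(image, fill_value)
    # Union of all instance masks gives the foreground region.
    foreground = np.any(
        np.stack([np.asarray(m, dtype=bool) for m in masks]), axis=0
    )
    # Broadcast the H x W mask over the channel axis.
    result = np.where(foreground[..., None], image, fill_value)
    return result.astype(image.dtype)
```

In this sketch the output keeps the original image size, so it can be resized and passed to a pretrained classifier in the same way as the unmodified image.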

References

Guo, G., Lai, A.: A survey on still image-based human action recognition. Pattern Recognit. 47(10), 3343–3361 (2014)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
Li, L.-J., Li, F.-F.: What, where and who? classifying events by scene and object recognition. In: ICCV, vol. 2, no. 5, p. 6 (2007)
Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: BMVC 2010 (2010)
Shapovalova, N., Gong, W., Pedersoli, M., Roca, F.X., Gonzàlez, J.: On importance of interactions and context in human action recognition. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) Pattern Recognition and Image Analysis, IbPRIA 2011. LNCS, vol. 6669, pp. 58–66. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21257-4_8
Yao, B., Fei-Fei, L.: Grouplet: a structured image representation for recognizing human and object interactions. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9–16. IEEE (2010)
Chaudhary, S., Murala, S.: Depth-based end-to-end deep network for human action recognition. IET Comput. Vis. 13(1), 15–22 (2019)
Desai, C., Ramanan, D.: Detecting actions, poses, and objects with relational phraselets. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) Computer Vision – ECCV 2012. LNCS, vol. 7575, pp. 158–172. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_12
Thurau, C., Hlavác, V.: Pose primitive based human action recognition in videos or still images. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
Gupta, A., Kembhavi, A., Davis, L.S.: Observing human-object interactions: using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)
Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for static human-object interactions. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, pp. 9–16. IEEE (2010)
Chaudhary, S.: Deep learning approaches to tackle the challenges of human action recognition in videos. Dissertation (2019)
Wang, Y., Jiang, H., Drew, M.S., Li, Z.-N., Mori, G.: Unsupervised discovery of action classes. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR2006), vol. 2, pp. 1654–1661. IEEE (2006)
Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition, pp. 17–24 (2010)
Chaudhary, S., Murala, S.: TSNet: deep network for human action recognition in hazy videos. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3981–3986 (2018). https://doi.org/10.1109/SMC.2018.00675
Prest, A., Schmid, C., Ferrari, V.: Weakly supervised learning of interactions between humans and objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 601–614 (2011)
Li, P., Ma, J.: What is happening in a still picture? In: The First Asian Conference on Pattern Recognition, pp. 32–36. IEEE (2011)
Jacquot, V., Ying, Z., Kreiman, G.: Can deep learning recognize subtle human activities? In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 14232–14241 (2020). https://doi.org/10.1109/CVPR42600.2020.01425
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition (2015)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV), 115(3), 211–252 (2015)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact of residual connections on learning (2016)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567 (2015)
Chollet, F.: Xception: deep learning with depthwise separable convolutions. CoRR abs/1610.02357 (2016)
Girshick, R., Radosavovic, I., Gkioxari, G., Dollár, P., He, K.: Detectron (2018). https://github.com/facebookresearch/detectron
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.322
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Reaper, T.: Automated image background removal with python. tobias.fyi (2020). https://tobias.fyi/blog/remove-bg-python
Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324
Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018). https://doi.org/10.1109/CVPR.2018.00762
Patil, P.W., Dudhane, A., Kulkarni, A., Murala, S., Gonde, A.B., Gupta, S.: An unified recurrent video object segmentation framework for various surveillance environments. IEEE Trans. Image Process. 30, 7889–7902 (2021)
Praful, H., Dudhane, A., Murala, S.: Single image depth estimation using deep adversarial training. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 989–993. IEEE (2019)
Patil, P.W., Dudhane, A., Chaudhary, S., Murala, S.: Multi-frame based adversarial learning approach for video surveillance. Pattern Recogn. 122, 108350 (2022)
Chaudhary, S., Murala, S.: Deep network for human action recognition using Weber motion. Neurocomputing 367, 207–216 (2019)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition CVPR, vol. 1, pp. 886–893 (2005)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
Chaudhary, S., Dudhane, A., Patil, P., Murala, S.: Pose guided dynamic image network for human action recognition in person centric videos. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–8 (2019). https://doi.org/10.1109/AVSS.2019.8909835
Belongie, S., Mori, G., Malik, J.: Matching with shape contexts. In: Krim, H., Yezzi, A. (eds) Statistics and Analysis of Shapes. Modeling and Simulation in Science, Engineering and Technology, pp. 81–105. Birkhäuser Boston, Boston (2006). https://doi.org/10.1007/0-8176-4481-4_4
Phutke, S.S., Murala, S.: Diverse receptive field based adversarial concurrent encoder network for image inpainting. IEEE Signal Process. Lett. 28, 1873–1877 (2021)
Chen, X., Girshick, R., He, K., Dollár, P.: Tensormask: a foundation for dense object segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2061–2069 (2019)
Akshay, D., Biradar, K.M., Patil, P.W., Hambarde, P., Murala, S.: Varicolored image de-hazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4564–4573 (2020)
Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6399–6408 (2019)
Patil, P.W., Biradar, K.M., Dudhane, A., Murala, S.: An end-to-end edge aggregation network for moving object segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8149–8158 (2020)
Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1483–1498 (2019)

Author information

Authors and Affiliations

Punjab Engineering College, Chandigarh, India
Palak & Sachin Chaudhary

Corresponding author

Correspondence to Palak.

Editor information

Editors and Affiliations

Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Technology Ropar, Ropar, India
Subrahmanyam Murala
Jadavpur University, Kolkata, India
Ananda Chowdhury
Indian Institute of Technology Ropar, Ropar, India
Abhinav Dhall
Indian Institute of Technology Ropar, Ropar, India
Puneet Goyal

About this paper

Cite this paper

Palak, Chaudhary, S. (2022). Human Action Recognition in Still Images. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1568. Springer, Cham. https://doi.org/10.1007/978-3-031-11349-9_42

DOI: https://doi.org/10.1007/978-3-031-11349-9_42
Published: 24 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11348-2
Online ISBN: 978-3-031-11349-9
eBook Packages: Computer Science, Computer Science (R0)