
A method for action recognition based on pose and interest points

Published in Multimedia Tools and Applications

Abstract

In recent years, action recognition has become an active research topic in image processing. Studies have shown that, under supervised learning, spatial-temporal interest points extracted from videos perform well in human action recognition. In this paper, we define attributes of the human pose and associate the pose with interest points for human action recognition. We find that interest points can serve as samplers for the particle filter method and improve the precision of pose estimation. The human pose, in turn, can be used to detect outliers among the interest points and improve the precision of action recognition. The location and density of interest points associated with the human pose further improve recognition precision. Experimental results on the publicly available Weizmann, KTH and UIUC datasets demonstrate that our method outperforms state-of-the-art methods.




Author information


Corresponding author

Correspondence to Yi-Ju Zhan.

Appendix

Property 1

The number of interest points will increase or remain unchanged if the threshold t1 decreases.

Proof: By formulas (1) and (3) in Section 3.1, the response value at each point of an image, computed by the series of Gabor filters, is independent of t1. When t1 decreases, these response values do not change, so the number of points whose response exceeds t1 increases or remains unchanged. □
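Property 1 can be checked numerically. The sketch below is illustrative only: the random response map stands in for the Gabor filter responses of formulas (1) and (3), and the threshold values are arbitrary.

```python
import numpy as np

# Hypothetical response map R: stands in for the per-pixel responses
# produced by the Gabor filter bank (formulas (1) and (3) in the paper).
rng = np.random.default_rng(0)
R = rng.random((64, 64))

def num_interest_points(R, t1):
    """Count points whose filter response exceeds the threshold t1."""
    return int(np.count_nonzero(R > t1))

# Property 1: lowering t1 never decreases the count, because every
# point selected at the higher threshold is still selected at the lower one.
for t_hi, t_lo in [(0.9, 0.5), (0.5, 0.2), (0.2, 0.1)]:
    assert num_interest_points(R, t_lo) >= num_interest_points(R, t_hi)
```

Since the responses themselves do not depend on t1, the selected set at a lower threshold is always a superset of the set at a higher threshold.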

Property 2

The sum of the absolute distances between the parts of a skeleton and the outliers will increase or remain unchanged if the threshold t1 decreases.

Proof: By Property 1, when t1 decreases, the number of interest points increases or remains unchanged, so the number of outliers among these interest points also increases or remains unchanged. Since each distance is nonnegative and no outlier is removed, the sum of the absolute distances increases or remains unchanged. □
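Property 2 admits a similar numerical sketch. The skeleton coordinates, point responses, and the outlier rule used here (distance to the nearest skeleton part exceeding d_max) are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D data: skeleton part locations, candidate interest
# points, and a filter response per point (all illustrative stand-ins).
skeleton = rng.random((5, 2))
points = rng.random((50, 2))
responses = rng.random(50)

def outlier_distance_sum(t1, d_max=0.3):
    """Sum of absolute (Euclidean) distances from each outlier to its
    nearest skeleton part. Here a point counts as an outlier when its
    response exceeds t1 but it lies farther than d_max from every part."""
    kept = points[responses > t1]
    if len(kept) == 0:
        return 0.0
    d = np.min(np.linalg.norm(kept[:, None] - skeleton[None], axis=2), axis=1)
    return float(d[d > d_max].sum())

# Property 2: lowering t1 only adds interest points (hence outliers),
# and each added distance is nonnegative, so the sum cannot decrease.
assert outlier_distance_sum(0.2) >= outlier_distance_sum(0.6)
```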

Property 3

The number of type II outliers will increase or remain unchanged, when the number of type I outliers increases.

Proof: The detection of interest points is independent of the computation of the skeleton. When the number of type I outliers increases, the total number of outliers also increases; thus the number of type II outliers will increase or remain unchanged. □


Cite this article

Lu, L., Zhan, YJ., Jiang, Q. et al. A method for action recognition based on pose and interest points. Multimed Tools Appl 74, 6091–6109 (2015). https://doi.org/10.1007/s11042-014-1910-9
