Abstract
In recent years, action recognition has become an active research topic in image processing. Studies have shown that, under supervised learning, spatio-temporal interest points extracted from videos perform well in human action recognition. In this paper, we define attributes of human pose and associate the pose with interest points for human action recognition. We find that interest points can serve as samplers for the particle filter method, improving the precision of pose estimation. In turn, the estimated pose can be used to detect outliers among the interest points, improving the precision of action recognition. The location and density of interest points associated with human pose further improve recognition precision. Experimental results on the publicly available Weizmann, KTH and UIUC datasets demonstrate that our method outperforms state-of-the-art methods.
Appendix
Property 1
The number of interest points increases or remains unchanged when the threshold t1 decreases.
Proof: By formulas (1) and (3) in Section 3.1, the response values of image points computed by the bank of Gabor filters are independent of t1, so decreasing t1 leaves these values unchanged. Therefore the number of points whose values exceed t1 increases or remains unchanged. □
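The monotonicity in Property 1 can be illustrated numerically. Below is a minimal sketch, assuming a fixed array of hypothetical per-pixel filter responses (the actual Gabor responses of formulas (1) and (3) are not reproduced here); the point is only that the responses themselves do not depend on t1, so thresholding with a smaller t1 can never exclude a point it previously admitted:

```python
import numpy as np

# Hypothetical per-pixel responses from a bank of Gabor filters;
# by Property 1 these values are independent of the threshold t1.
responses = np.array([0.1, 0.4, 0.35, 0.8, 0.05, 0.6])

def count_interest_points(responses, t1):
    """Count points whose filter response exceeds the threshold t1."""
    return int(np.sum(responses > t1))

# Lowering t1 can only admit more points, never exclude existing ones.
n_high = count_interest_points(responses, 0.5)  # 2 points exceed 0.5
n_low = count_interest_points(responses, 0.3)   # 4 points exceed 0.3
assert n_low >= n_high
```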
Property 2
The sum of the absolute distances between the parts of a skeleton and the outliers increases or remains unchanged when the threshold t1 decreases.
Proof: By Property 1, when t1 decreases, the number of interest points increases or remains unchanged, and hence so does the number of outliers among these interest points. Since the number of outliers cannot decrease, the sum of the absolute distances increases or remains unchanged. □
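The argument for Property 2 can likewise be checked numerically. This is a minimal sketch with hypothetical 2-D coordinates for the skeleton parts and outliers, reading "absolute distance" as the L1 norm; since distances are non-negative, enlarging the outlier set can only grow the sum:

```python
import numpy as np

# Hypothetical 2-D positions of two skeleton parts.
skeleton_parts = np.array([[0.0, 0.0], [1.0, 0.0]])

def outlier_distance_sum(outliers, parts):
    """Sum of absolute (L1) distances from every outlier to every part."""
    total = 0.0
    for o in outliers:
        for p in parts:
            total += np.abs(o - p).sum()
    return total

# Lowering t1 can only enlarge the outlier set (Property 1), so the
# distance sum over the superset is at least that over the subset.
few = np.array([[2.0, 1.0]])
more = np.array([[2.0, 1.0], [3.0, 2.0]])
assert outlier_distance_sum(more, skeleton_parts) >= \
       outlier_distance_sum(few, skeleton_parts)
```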
Property 3
The number of type II outliers increases or remains unchanged when the number of type I outliers increases.
Proof: The detection of interest points is independent of the computation of the skeleton. When the number of type I outliers increases, the total number of outliers also increases; thus the number of type II outliers increases or remains unchanged. □
Lu, L., Zhan, YJ., Jiang, Q. et al. A method for action recognition based on pose and interest points. Multimed Tools Appl 74, 6091–6109 (2015). https://doi.org/10.1007/s11042-014-1910-9