Abstract
We present a novel method for human motion recognition. A video sequence is represented by a sparse set of spatial and spatial-temporal features, obtained by extracting static and dynamic interest points. Our model learns a set of poses along with the dynamics of the sequence. The pose models and the model of motion dynamics are each represented as a constellation of static and dynamic parts, respectively. On top of this layer of individual models we build a higher-level model that can be described as a “constellation of constellation models”. This model encodes the spatial-temporal relationships between the dynamics of the motion and the appearance of individual poses. We test the model on a publicly available action dataset and demonstrate that our new method performs well on the classification tasks. We also perform additional experiments to show how classification performance can be improved by increasing the number of pose models in our framework.
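The two-layer structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the probabilistic form, so the sketch assumes diagonal Gaussians over part offsets from a centroid, and every name in it (`Constellation`, `TwoLayerModel`, `layout`, the `(x, y, t)` point format) is hypothetical. It only shows how part-level scores for the pose and dynamics models might be combined with an upper-level score over the arrangement of the sub-models.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # assumed (x, y, t) location of a part


def diag_gauss_logpdf(x: Point, mean: Point, var: Point) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (a - m) ** 2 / v)
        for a, m, v in zip(x, mean, var)
    )


def centroid(points: Sequence[Point]) -> Point:
    """Mean location of a set of parts."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


@dataclass
class Constellation:
    """Hypothetical part-based model: one Gaussian per part, defined over
    the part's offset from the centroid of all parts."""
    means: List[Point]
    variances: List[Point]

    def log_likelihood(self, parts: Sequence[Point]) -> float:
        c = centroid(parts)
        total = 0.0
        for p, m, v in zip(parts, self.means, self.variances):
            offset = tuple(a - b for a, b in zip(p, c))
            total += diag_gauss_logpdf(offset, m, v)
        return total


@dataclass
class TwoLayerModel:
    """Assumed 'constellation of constellations': the upper layer treats the
    centroid of each sub-model as a part of its own constellation."""
    pose_models: List[Constellation]
    dynamics_model: Constellation
    layout: Constellation  # arrangement of [dynamics, pose_1, pose_2, ...]

    def log_likelihood(
        self,
        pose_parts: Sequence[Sequence[Point]],
        dynamic_parts: Sequence[Point],
    ) -> float:
        # Lower layer: score each sub-model on its own parts.
        score = self.dynamics_model.log_likelihood(dynamic_parts)
        score += sum(
            m.log_likelihood(p) for m, p in zip(self.pose_models, pose_parts)
        )
        # Upper layer: score how the sub-models are arranged in space-time.
        centroids = [centroid(dynamic_parts)] + [centroid(p) for p in pose_parts]
        score += self.layout.log_likelihood(centroids)
        return score
```

Under these assumptions, classification would amount to fitting one `TwoLayerModel` per action class and assigning a sequence to the class with the highest log-likelihood. Adding entries to `pose_models` (and matching entries to `layout`) corresponds to increasing the number of pose models.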
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Filipovych, R., Ribeiro, E. (2007). Combining Models of Pose and Dynamics for Human Motion Recognition. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2007. Lecture Notes in Computer Science, vol 4842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76856-2_3
DOI: https://doi.org/10.1007/978-3-540-76856-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76855-5
Online ISBN: 978-3-540-76856-2
eBook Packages: Computer Science (R0)