Abstract
This work addresses the problem of human action recognition by representing a human action as a collection of short trajectories extracted in areas of the scene with a significant amount of visual activity. The trajectories are extracted by an auxiliary particle filtering tracking scheme that is initialized at points considered salient in both space and time. These spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods in space and time. We implement an online background estimation algorithm to deal with inadequate localization of the salient points on the moving parts of the scene and to improve the overall performance of the particle filter tracking scheme. We use a variant of the Longest Common Subsequence (LCSS) algorithm to compare different sets of trajectories corresponding to different actions, and Relevance Vector Machines (RVM) to address the classification problem. We propose new kernels for the RVM that are specifically tailored to the proposed representation of short trajectories; these kernels are based on the modified LCSS distance. We present results on real image sequences from a small database depicting people performing 12 aerobic exercises.
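To make the trajectory comparison step concrete, here is a minimal sketch of a textbook LCSS distance between two 2D point trajectories. It is not the modified variant the chapter proposes, which the abstract does not specify. The function names, the spatial tolerance `eps`, the temporal window `delta`, and the normalization by the shorter trajectory length are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lcss_length(a: List[Point], b: List[Point], eps: float, delta: int) -> int:
    """Length of the longest common subsequence of two 2D trajectories.

    Two points match when they differ by less than `eps` in both x and y
    and their indices differ by at most `delta` (assumed matching rule).
    """
    n, m = len(a), len(b)
    # table[i][j] = LCSS length of the first i points of a and first j of b
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close_in_space = abs(ax - bx) < eps and abs(ay - by) < eps
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_distance(a: List[Point], b: List[Point], eps: float, delta: int) -> float:
    """Distance in [0, 1]: 0 when the shorter trajectory is fully matched."""
    if not a or not b:
        return 1.0  # assumed convention for empty trajectories
    return 1.0 - lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

A distance of this kind could then be turned into a similarity score between sets of trajectories; how the chapter does so, and how its kernels are built on top of it, is described in the full text rather than in the abstract.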
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Oikonomopoulos, A., Patras, I., Pantic, M., Paragios, N. (2007). Trajectory-Based Representation of Human Actions. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artifical Intelligence for Human Computing. Lecture Notes in Computer Science, vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_7
DOI: https://doi.org/10.1007/978-3-540-72348-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72346-2
Online ISBN: 978-3-540-72348-6
eBook Packages: Computer Science, Computer Science (R0)