Abstract
Recently, methods based on bags of spatio-temporal local features have received significant attention in human action recognition. However, overcoming intra-class variation caused by changes in viewpoint, geometry, and illumination remains a major challenge. In this paper we present Bag of Spatio-temporal Synonym Sets (ST-SynSets) to represent human actions, which partially bridges the semantic gap between visual appearance and category semantics. First, the original visual words are re-clustered into higher-level ST-SynSets according to the consistency of their distributions across action categories, using the Information Bottleneck clustering method. Second, a distance metric is adaptively learned under both visual and semantic constraints and used to project features onto the ST-SynSets. Experiments and comparisons with state-of-the-art methods show the effectiveness and robustness of the proposed method for human action recognition, especially under multiple viewpoints and illumination conditions.
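The abstract does not say which Information Bottleneck variant is used, so the following is only a minimal sketch of the first step under stated assumptions. It assumes each visual word is summarized by its occurrence counts per action category, and it uses the agglomerative Information Bottleneck: it greedily merges the pair of clusters whose merge loses the least information about the category, measured as the combined prior times the prior-weighted Jensen-Shannon divergence between their category distributions. The function names `aib_synsets` and `bag_of_synsets` are hypothetical, and the second step (metric learning) is not covered.

```python
import math
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_js(p, q, wp: float, wq: float) -> float:
    """Jensen-Shannon divergence of p and q with mixture weights wp + wq = 1."""
    m = [wp * a + wq * b for a, b in zip(p, q)]
    return wp * kl(p, m) + wq * kl(q, m)


def aib_synsets(counts: List[List[int]], n_synsets: int) -> List[List[int]]:
    """Hypothetical agglomerative-IB grouping of visual words into synsets.

    counts[w][c] = occurrences of visual word w in videos of action category c.
    Every word is assumed to occur at least once.
    """
    total = sum(sum(row) for row in counts)
    clusters = []  # each entry: (member word ids, prior p(t), conditional p(c | t))
    for w, row in enumerate(counts):
        s = sum(row)
        if s == 0:
            raise ValueError(f"visual word {w} never occurs")
        clusters.append(([w], s / total, [x / s for x in row]))

    while len(clusters) > n_synsets:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                _, pi, ci = clusters[i]
                _, pj, cj = clusters[j]
                pt = pi + pj
                # Information lost about the category when merging clusters i and j.
                cost = pt * weighted_js(ci, cj, pi / pt, pj / pt)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        wi, pi, ci = clusters[i]
        wj, pj, cj = clusters[j]
        pt = pi + pj
        merged = (wi + wj, pt, [(pi * a + pj * b) / pt for a, b in zip(ci, cj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return [sorted(members) for members, _, _ in clusters]


def bag_of_synsets(word_ids: Sequence[int], synsets: List[List[int]]) -> List[int]:
    """Histogram of a video's visual words, counted at the synset level."""
    to_synset = {w: s for s, members in enumerate(synsets) for w in members}
    hist = [0] * len(synsets)
    for w in word_ids:
        hist[to_synset[w]] += 1
    return hist
```

As a usage example, with toy counts `[[10, 0], [9, 1], [0, 10], [1, 9]]` over two categories, words 0 and 1 (mostly category 0) end up in one synset and words 2 and 3 in the other. A video containing words `[0, 1, 1, 3]` then maps to the synset histogram `[3, 1]`. The exhaustive pairwise search is quadratic per merge and is meant for illustration only.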
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pang, L., Cao, J., Guo, J., Lin, S., Song, Y. (2010). Bag of Spatio-temporal Synonym Sets for Human Action Recognition. In: Boll, S., Tian, Q., Zhang, L., Zhang, Z., Chen, YP.P. (eds) Advances in Multimedia Modeling. MMM 2010. Lecture Notes in Computer Science, vol 5916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11301-7_43
DOI: https://doi.org/10.1007/978-3-642-11301-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11300-0
Online ISBN: 978-3-642-11301-7
eBook Packages: Computer Science, Computer Science (R0)