
Bag of Spatio-temporal Synonym Sets for Human Action Recognition

  • Conference paper
Advances in Multimedia Modeling (MMM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5916)


Abstract

Recently, methods based on bags of spatio-temporal local features have received significant attention in human action recognition. However, overcoming intra-class variation caused by changes in viewpoint, geometry and illumination remains a major challenge. In this paper we present Bag of Spatio-temporal Synonym Sets (ST-SynSets) to represent human actions, a representation that can partially bridge the semantic gap between visual appearance and category semantics. First, the original visual words are re-clustered into higher-level ST-SynSets according to the consistency of their distributions across action categories, using the Information Bottleneck clustering method. Second, a distance metric is learned adaptively under both visual and semantic constraints for ST-SynSet projection. Experiments and comparisons with state-of-the-art methods show the effectiveness and robustness of the proposed method for human action recognition, especially under multiple viewpoints and illumination conditions.
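The first step, re-clustering visual words by how consistently they are distributed across action categories, can be illustrated with greedy agglomerative Information Bottleneck clustering. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a word-by-class co-occurrence count matrix `counts` in which every visual word occurs at least once, and it uses the agglomerative IB variant (merge the pair of clusters whose merge loses the least information about the class label, measured by a weighted Jensen-Shannon divergence). The helper names `js_divergence`, `agglomerative_ib` and `synset_histogram` and the parameter `n_synsets` are hypothetical. The metric-learning step is not sketched.

```python
import numpy as np


def js_divergence(p, q, pi1, pi2):
    """Weighted Jensen-Shannon divergence JS_pi(p, q) in nats.

    Assumes pi1, pi2 > 0 and pi1 + pi2 == 1.
    """
    m = pi1 * p + pi2 * q  # weighted mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return pi1 * kl(p, m) + pi2 * kl(q, m)


def agglomerative_ib(counts, n_synsets):
    """Greedily merge visual words into n_synsets clusters (hypothetical helper).

    counts[w, c] = occurrences of visual word w in training videos of class c.
    Assumes every word has a nonzero total count.
    """
    counts = np.asarray(counts, dtype=float)
    p_w = counts.sum(axis=1) / counts.sum()                   # p(w)
    p_c_given_w = counts / counts.sum(axis=1, keepdims=True)  # p(c|w)

    clusters = [[w] for w in range(len(p_w))]   # one cluster per word
    weights = list(p_w)                          # p(t) per cluster t
    conds = [row.copy() for row in p_c_given_w]  # p(c|t) per cluster t

    while len(clusters) > n_synsets:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = weights[i] + weights[j]
                # Information lost by merging t_i and t_j:
                # (p(t_i) + p(t_j)) * JS_pi(p(c|t_i), p(c|t_j))
                cost = w * js_divergence(conds[i], conds[j],
                                         weights[i] / w, weights[j] / w)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        w = weights[i] + weights[j]
        # The merged cluster's class distribution is the weighted average.
        conds[i] = (weights[i] * conds[i] + weights[j] * conds[j]) / w
        weights[i] = w
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], weights[j], conds[j]  # j > i, so index i is unaffected
    return clusters


def synset_histogram(word_hist, clusters):
    """Pool a bag-of-words histogram (NumPy array) into a bag-of-synsets histogram."""
    return np.array([word_hist[c].sum() for c in clusters])
```

For example, with `counts = [[10, 0], [8, 1], [0, 9], [1, 12]]` (words 0 and 1 occur mostly in class 0, words 2 and 3 mostly in class 1), asking for two synsets merges words 2 and 3 first and then words 0 and 1, giving `[[0, 1], [2, 3]]`. The exhaustive pairwise search is fine for illustration, but a realistic vocabulary of thousands of words would call for caching merge costs or a sequential IB variant.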

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pang, L., Cao, J., Guo, J., Lin, S., Song, Y. (2010). Bag of Spatio-temporal Synonym Sets for Human Action Recognition. In: Boll, S., Tian, Q., Zhang, L., Zhang, Z., Chen, YP.P. (eds) Advances in Multimedia Modeling. MMM 2010. Lecture Notes in Computer Science, vol 5916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11301-7_43

  • DOI: https://doi.org/10.1007/978-3-642-11301-7_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11300-0

  • Online ISBN: 978-3-642-11301-7

  • eBook Packages: Computer Science, Computer Science (R0)
