Mining Layered Grammar Rules for Action Recognition

Wang, Liang; Wang, Yizhou; Gao, Wen

doi:10.1007/s11263-010-0393-z

Published: 21 October 2010

Volume 93, pages 162–182, (2011)

International Journal of Computer Vision

Liang Wang^1,2, Yizhou Wang^3 & Wen Gao^3

Abstract

We propose a layered-grammar model to represent actions. Using this model, an action is represented by a set of grammar rules. The bottom layer of an action instance’s parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and “super rules” that account for the high-order compositional feature structures. The grammar rules are categorized into three classes according to three different ST-relations of their action components, namely the strong relation, the weak relation and the stochastic relation. These ST-relations characterize different action styles (degrees of stiffness), and they are pursued in terms of grammar rules for the purpose of action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step and a top-down ambiguous rule pruning step. An action instance is recognized based on the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that by incorporating the high-order feature statistics, the proposed method substantially improves recognition performance over bag-of-words models.
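
The notion of an emerging pattern used in the abstract can be illustrated with a small sketch. An emerging pattern is an itemset whose support is markedly higher in one class than in another, usually measured by the growth rate (support in the target class divided by support in the background class). The code below is a minimal illustration under stated assumptions and is not the authors' implementation: the function names, the thresholds, the brute-force enumeration of small itemsets, and the toy codeword data are all hypothetical, and the paper's ST-relations and layered rules are not modeled.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, target, background):
    """Support in the target class divided by support in the background class."""
    s_t = support(itemset, target)
    s_b = support(itemset, background)
    if s_b == 0.0:
        # Pattern never occurs in the background class.
        return float("inf") if s_t > 0.0 else 0.0
    return s_t / s_b


def mine_emerging_patterns(target, background, max_size=2,
                           min_support=0.5, min_growth=2.0):
    """Brute-force search over small itemsets; returns {itemset: growth_rate}."""
    items = sorted(set().union(*target)) if target else []
    patterns = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # Keep only patterns that are frequent in the target class.
            if support(candidate, target) < min_support:
                continue
            rate = growth_rate(candidate, target, background)
            if rate >= min_growth:
                patterns[candidate] = rate
    return patterns


# Hypothetical toy data: each set lists the codeword labels observed
# together in one local neighborhood of an action video.
wave = [frozenset({"a", "b"}), frozenset({"a", "b", "c"}),
        frozenset({"a", "c"}), frozenset({"a", "b"})]
walk = [frozenset({"a", "c"}), frozenset({"c", "d"}),
        frozenset({"b", "d"}), frozenset({"a", "d"})]

eps = mine_emerging_patterns(wave, walk)
for pattern, rate in sorted(eps.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), rate)
```

In this toy data the pair {a, b} occurs in three of the four "wave" samples and in none of the "walk" samples, so its growth rate is infinite, which makes it a strongly discriminative candidate. Practical EP miners avoid exhaustive enumeration; the brute-force loop here is only for readability.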

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
Liang Wang
Nat’l Engineering Lab for Video Technology, Peking University, Beijing, China
Liang Wang
Nat’l Engineering Lab for Video Technology and Key Lab. of Machine Perception (MoE), School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Yizhou Wang & Wen Gao

Corresponding author

Correspondence to Yizhou Wang.

About this article

Cite this article

Wang, L., Wang, Y. & Gao, W. Mining Layered Grammar Rules for Action Recognition. Int J Comput Vis 93, 162–182 (2011). https://doi.org/10.1007/s11263-010-0393-z

Received: 14 October 2009
Accepted: 24 September 2010
Published: 21 October 2010
Issue Date: June 2011
DOI: https://doi.org/10.1007/s11263-010-0393-z
