Abstract
Stochastic grammars have been used in many video analysis and event recognition applications as an efficient model for representing large-scale video activity. However, because traditional stochastic grammars cannot represent parallel temporal relations, previous work could not use them to model complex multi-agent activities whose sub-activities are related in parallel (for example, by the “during” relation). In this paper, we extend the traditional grammar by introducing Temporal Relation Events (TRE) to solve this problem. We also propose a corresponding grammar parser that adds complex temporal inference. To demonstrate the effectiveness of our methods, we build a system that recognizes the cooperative actions of two hands in a “telephone calling” activity. For the experiments, we also propose a simple method for modeling the explicit state duration probability distribution in the HMM detector, which allows primitive events to be detected accurately.
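To make the notion of a parallel temporal relation such as “during” concrete, the following is a minimal sketch that classifies the relation between two time intervals using the thirteen relations of Allen's interval logic. It is an illustration only, not the paper's TRE definition or parser: the `Interval` type, the `allen_relation` function, and the example event times are all assumptions introduced here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A detected event's time span (hypothetical type; assumes start < end)."""
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen interval relation of a with respect to b."""
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    # Intervals that share at least one endpoint.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    # Strict containment: the parallel case that a purely sequential
    # grammar cannot express.
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a.start < b.start else "overlapped-by"


# Hypothetical primitive events (frame numbers are made up for illustration):
hold_receiver = Interval(10, 60)   # one hand holds the receiver
dial_number = Interval(20, 35)     # the other hand dials

print(allen_relation(dial_number, hold_receiver))
```

In this sketch the dialing interval lies strictly inside the holding interval, so the two sub-activities run in parallel rather than one after the other, which is the kind of relation the abstract says a traditional stochastic grammar cannot model.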
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Z., Huang, K., Tan, T. (2006). Complex Activity Representation and Recognition by Extended Stochastic Grammar. In: Narayanan, P.J., Nayar, S.K., Shum, HY. (eds) Computer Vision – ACCV 2006. ACCV 2006. Lecture Notes in Computer Science, vol 3851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11612032_16
DOI: https://doi.org/10.1007/11612032_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31219-2
Online ISBN: 978-3-540-32433-1
eBook Packages: Computer Science, Computer Science (R0)