Abstract
Predicting people's intentions solely from their visible actions is a skill that only humans and animals currently perform. It requires segmenting items in the field of view, tracking moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects to a predefined scenario, assessing the selected scenario against the information retrieved, and finally adjusting the scenario to better fit the data. All of this is accomplished with great accuracy in a few seconds or less. Current computer algorithms have not reached this level of complexity under the accuracy and time constraints that humans and animals meet, but several research efforts are working towards it by developing new algorithms that solve parts of the problem. This survey paper lists several of these efforts, which rely mainly on image processing and the classification of a limited number of actions. It divides the activities into several groups and ends with a discussion of future needs.
Cite this article
Del Rose, M.S., Wagner, C.C. Survey on classifying human actions through visual sensors. Artif Intell Rev 37, 301–311 (2012). https://doi.org/10.1007/s10462-011-9232-z
DOI: https://doi.org/10.1007/s10462-011-9232-z