Skip to main content

Advertisement

Log in

Survey on classifying human actions through visual sensors

  • Published:
Artificial Intelligence Review Aims and scope Submit manuscript

Abstract

The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. This requires segmentation of items in the field of view, tracking of moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects into a predefined scenario, assessing the selected scenario with the information retrieve, and finally adjusting the scenario to better fit the data. This is all accomplished with great accuracy in less than a few seconds. The intelligence of current computer algorithms has not reached this level of complexity with the accuracy and time constraints that humans and animals have, but there are several research efforts that are working towards this by identifying new algorithms for solving parts of this problem. This survey paper lists several of these efforts that rely mainly on understanding the image processing and classification of a limited number of actions. It divides the activities up into several groups and ends with a discussion of future needs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Antonakaki P, Kosmopoulos D, Perantonis SJ (2009) Detecting abnormal human behaviour using multiple cameras. Signal Process J 89: 1723–1738

    Article  MATH  Google Scholar 

  • Babu RV, Anantharaman B, Ramakrishnan KR, Srinivasan SH (2002) Compressed domain action classification using HMM. Pattern Recognit Lett 23(10): 1203–1213. doi:10.1016/S01167.8655(02)00067-3

    Article  MATH  Google Scholar 

  • Batra D, Chen TH, Sukthankar R (2008) Space-time shapelets for action recognition. In: Proceedings from IEEE workshop on motion and video computing, pp 1–6. doi:10.1109/WMVC.2008.4544051

  • Baum L (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3: 1–8

    Google Scholar 

  • Ben-Arie J, Wang Z, Pandit P, Rajaram S (2002) Human activity recognition using multidimensional indexing. IEEE Trans Pattern Anal Mach Vis (PAMI) 24(8): 1091–1104. doi:10.1109/TPAMI.2002.1023805

    Article  Google Scholar 

  • Bilmes J (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden markov models. Technical report TR-97-021, University of Berkeley

  • Blackburn J, Ribeiro E (2007) Human motion recognition using isomap and dynamic time warping. Lect Notes Pattern Recognit 4814:285–298, Springer, Berlin. doi:10.1007/978.3.540.75703.0

    Google Scholar 

  • Bouchaffra D, Tan J (2007) Structural hidden markov models based on stochastic context-free grammars. Control Intell Syst 35(3): 211–216

    MATH  Google Scholar 

  • Brand M, Oliver N, Pentland A (1997) Coupled hidden markov models for complex action recognition. In: Proceedings from computer vision and pattern recognition conference (CVPR), pp 994–999

  • Bui H, Phung D, Venkatesh S (2004) Hierarchical hidden markov models with general state hierarchy. In: Proceedings of the nineteenth national conference of artificial ntelligence, pp 324–329

  • Campbell L, Becker D, Azarbayejani A (1996) Invariant features for 3-D Jester recognition. In: Proceedings from IEEE automatic face and gesture recognition (AFGR), pp 157–162

  • Chakraborty B, Rudovic O, Gonzalez J (2008) View invariant human body detection with extension to human action recognition using component-wise HMM of body parts. Lect Notes Comput Sci 5098: 208–217. doi:10.1007/978-3-540-70517-8_20

    Article  Google Scholar 

  • Chomat O, Crowley JL (2000) A probabilistic sensor for the perception of activities. In: Proceedings from IEEE international conference on automatic face and gesture recognition, pp 314–319

  • Colombo C, Comanducci D, Bimbo A (2007) Compact representation and probabilistic classification of human actions in videos. In: Proceedings from IEEE conference on advanced video and signal based surveillance, pp 342–346. doi:10.1109/AVSS.2007.4425334

  • Cristani M, Bicego M, Murino V (2007) Audio-visual event recognition in surveillance video sequences. IEEE Trans Multimedia 9(2): 257–267

    Article  Google Scholar 

  • DARPA mind’s eye broad agency announcement (2010) DARPA-BAA-10-53, (http://www.darpa.mil/tcto/docs/DARPA_ME_BAA-10-53_Mod1.pdf)

  • Del Rose M, Stein J (2006) Survivability on the ART robotic vehicle. In: Proceedings from the seventeenth ground vehicle survivability symposium

  • Del Rose M, Wagner C, Frederick P (2011) Evidence feed forward hidden markov model: a new type of hidden markov model. Int J Artif Intell Appl 2(1): 1–19.

    Google Scholar 

  • Dimitrijevic M, Lepetit V, Fua P (2006) Human body pose detection using Bayesian spatio-temporal templates. Comput Vis Image Underst 104(2): 127–139

    Article  Google Scholar 

  • Du Y, Chen F, Xu W (2007) Human interaction representation and recognition through motion decomposition. IEEE Signal Process Lett 14(12): 952–955

    Article  Google Scholar 

  • Fin S, Singer Y, Tishby N (1998) The hierarchical hidden markov model: analysis and application. Mach Learn 32: 41–62

    Article  Google Scholar 

  • Fisher Iris data set website (http://archive.ics.uci.edu/ml/datasets/Iris)

  • Galata A, Johnson N, Hogg D (2001) Learning variable length markov models of behaviour. Comput Vis Image Underst 81: 398–413

    Article  MATH  Google Scholar 

  • Gao J, Collins RT, Hauptmann AG, Wactlar HD (2004) Articulated motion modeling for activity analysis. In: Proceedings from international conference on image and video retrieval, pp 1–19

  • Gao X, Yang Y, Tao D, Li X (2009) Discriminative optical flow tensor for video semantic analysis. Comput Image Underst 113(3): 372–383

    Article  Google Scholar 

  • Gehrig D, Schulz T (2008) Selecting relevant features for human motion recognition. In: Proceedings from international conference on pattern recognition, pp 1–4. doi:10.1109/ICPR.2008.4761290

  • Ghayoori A, Hendessi F, Sheikh A (2006) Application of smooth ergodic hidden markov model in text to speech systems. Int J Signal Process 2(3): 151–157

    Google Scholar 

  • Gong S, Xiang T (2003) Recognition of roup activity using dynamic probabilistic networks. In: Proceedings from international conference in computer vision, pp 742–749

  • Han L, Liange W, Wu XX, Jia YD (2008) Human action recognition using discriminative models in the learned hierarchical manifold space. In: Proceedings from IEEE international conference on automatic face and gesture recognition, pp 1–6. doi:10.1109/AFGR.2008.4813416

  • Hassan R, Nath B (2005) Stock market forecasting sing hidden markov model: a new approach. In: Proceedings of the fifth international conference on intelligent systems design and application

  • Hassan R, Nath B, Kirley M (2006) A data clustering algorithm based on single hidden markov model. In: Proceedings of the international multi-conference on computer science and information technology, pp 57–66

  • Herrera A, Beck A, Bell D, Miller P, Wu Q, Yan W (2008) Behaviour analysis and prediction in image sequences using rough sets. In: Proceedings from international machine vision and image processing conference, pp 71–76. doi:10.1109/IMVIP.2008.24

  • Herzog DL, Kruger V (2009) Recognition and synthesis of human movements by parametric HMMs. Lect Notes Comput Sci 5064:148–168, Springer, Berlin. doi:10.1007/978-3-642-03061-1_8

    Google Scholar 

  • Herzog D, Kruger V, Grest D (2008) Parametric hidden markov models for recognition on synthesis of movements. In: Proceedings of the British machine vision conference

  • Ikizler N, Cinbis RG, Duygulu P (2008) Human action recognition with line and flow histograms. In: Proceedings from international conference on pattern recognition, pp 1–4. doi:10.1109/ICPR.2008.4671434

  • Ikizler N, Duygulu P (2007) Human action recognition using distribution of oriented rectangular patches. J Human Motion 271–284

  • Jang WS, Lee WK, Lee IK, Lee J (2008) Enriching a motion database by analogous combination of partial human motion. Visual Comput 24(4):271–280, Springer, Berlin.

    Google Scholar 

  • Jenkins OC, Gonzalez G, Loper M (2006) Dynamic motion vocabularies for kinematic tracking and activity recognition. In: Proceedings from computer vision and pattern recognition conference, pp. 147–156. doi:10.1109/CVPRW.2006.67

  • Kam AH, Ann TK, Lung EH, Yun YW, Wang JX (2004) Automated recognition of highly complex human behavior. In: Proceedings from international conference on pattern recognition, Vol. 4, pp 327–330. doi:10.1109/ICPR.2004.1333769

  • Kawanaka D, Okatani T, Deguchi K (2006) HHMM based recognition of human activity. Inst Electron, Inf Commun Engineers Trans, Oxford J E89-D(7): 2180–2185

    Google Scholar 

  • Kitani KM, Okabe T, Sato Y, Sugimoto A (2007) Recovering the basic structure of human activity from a video-based symbol string. In: Proceedings from IEEE workshop on motion and video computing, pp 1–9. doi:10.1109/WMVC.2007.34

  • Lee H, Kim JH (1999) An HMM based threshold model approach for gesture recognition. IEEE Trans Pattern Anal Mach Intell (PAMI) 21: 961–973

    Article  Google Scholar 

  • Li X, Parizeau M, Plamondon R (2000) Training hidden markov models with multiple observations—a combinatorial method. IEEE Trans Pattern Anal Mach Intell (PAMI) 22(4): 177–371

    Google Scholar 

  • Liu X, Chua CS (2006) Multi-agent activity recognition using observation decomposed hidden markov models. Image Vis Comput 24: 166–175

    Article  MATH  Google Scholar 

  • Liu JG, Yang Y, Shah M (2009) Learning semantic visual vocabularies using diffusion distance. In: Proceedings from computer vision and image processing conference, pp 461–468. doi:10.1109/CVPRW.2009.5206848

  • Masoud O, Papanikolopoulus NP (2003) A method for human action recognition. Image Vis Comput 21(8): 723–729

    Article  Google Scholar 

  • Mikolajczyk K, Uemura H (2008) Action recognition with motion-appearance vocabulary forest. In: Proceedings from computer vision and pattern recognition, pp 1–8. doi:10.1109/CVPR.2008.4587628

  • Mokhber A, Achard C, Milgram M (2008) Recognition of human behavior by space-time Silhouette Characterization. Pattern Recognit Lett 29: 81–89

    Article  Google Scholar 

  • Morellas V, Pavlidis I, Tsaimyartzis P (2003) DETER: detection of events for threat evaluation and recognition. Mach Vis Appl J 15(1): 29–45

    Article  Google Scholar 

  • Mori T, Segawa Y, Shimosaka M, Sato T (2004) Hierarchical recognition of daily human actions based on continuous hidden markov models. In: Proceedings from IEEE conference on automatic face and gesture recognition, pp 779–784. doi:10.1109/AFGR.2004.1301629

  • Murphy K (2002) Hidden semi-markov models. Technical report, MIT AI Lab

  • Natarajan P, Nevatia R (2007) Coupled hidden semi markov models for activity recognition. In: Proceedings of the IEEE workshop on motion and video computing

  • Ogale A, Karapurkar A, Aloimonos Y (2007) View-invariant modeling and recognition of human actions using grammars. Lect Notes Comput Sci 4358:115–126, Springer, Berlin. doi:10.1007/978-3-540-70932-9_9

  • Oikonomopoulus A, Pantic M, Patras I (2008) B-spline polynomial descriptors for human activity recognition. Computer vision and pattern recognition conference, pp 1–6. doi:10.1109/CVPR.2008.4563175

  • Oikonomopoulos A, Patras I, Pantic M (2006) Kernal-based recognition of human actions using spatiotemporal salient points. In: Proceedings from computer vision and pattern recognition conference, pp 151–161. doi:10.1109/CVPRW.2006.114

  • Oliver N, Horvitz E, Garg A (2002) Layered representations for human activity recognition. In: Proceedings from IEEE international conference on multimodal inferences (ICMI), pp 3–8

  • Oliver NM, Rosario B, Pentland AP (2000) A bayesian computer vision system for modeling human interaction. IEEE Trans Pattern Anal Mach Intell 22(8): 831–843

    Article  Google Scholar 

  • Parameswaran V, Chellappa R (2006) View invariance for human action recognition. Int J Comput Vis 66(1): 83–101

    Article  Google Scholar 

  • Perez O, Piccardi M, Garcia J, Patricio MA, Molina JM (2007) Comparison between genetic algorithms and the Baum-Welch algorithm in learning HMMs for human activity classification. Lect Notes Comput Sci 4448:399–406, Springer, Berlin

    Google Scholar 

  • Petrushin V (2007) Hidden markov models: fundamentals and application. EETimes online symposium for electrical engineers (OSEE), Oct 2007

  • Rabiner L (1989) A tutorial on hidden markov models and selected applications in speech recognition. In: Proceedings of the IEEE, Vol 7, pp 257–286

  • Rahman M, Nakamura K, Ishikawa S (2002) Recognizing human behavior using universal eigenspace. In: Proceedings from international conference on pattern recognition, pp 295–298. doi:10.1109/ICPR.2002.1044694

  • Robertson N, Reid ID (2006) A general method for human activity recognition in video. Comput Vis Image Underst 104(2): 232–248

    Article  Google Scholar 

  • Rodriguez MD, Ahmed J, Shah M (2008) Action MACH a spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings from computer vision and image processing conference, pp 1–8. doi:10.1109/CVPR.2008.4587727

  • Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings from international conference on pattern recognition, Vol. 3, pp 32–36. doi:10.1109/ICPR.2004.1334462

  • Shah M (2003) Understanding human behavior from motion imagery. Mach Vis Appl 14(4):210–214, Springer, Berlin. doi:10.1007/s00138.0003-0124-3

  • Shi QF, Wang L, Cheng L, Smola A (2008) Discriminative human action segmentation and recognition using semi-Markov Model. In: Proceedings from computer vision and pattern recognition conference, pp 1–8. doi:10.1109/CVPR.2008.4587557

  • Siebel NT, Maybank SJ (2004) The ADVISOR visual surveillance system. In: Proceedings from applications of computer vision, pp 103–111

  • Starner T, Weaver J, Pentland A (1998) Real time American sign language recognition using desk and wearable computer based video. IEEE Trans Pattern Anal Mach Intell (PAMI) 20: 1371–1375

    Article  Google Scholar 

  • Stern H, Kartoun U, Shmilovici A (2001) A prototype fuzzy system for surveillance picture understanding. In: Proceedings from visual imaging and image processing conference, pp 624–629

  • Thurau C, Hlavac V (2007) n-Grams of action primitives for recognizing human behavior. Lect Notes Comput Sci 4673:93–100, Springer, Berlin

    Google Scholar 

  • Truyen TT, Phung DQ, Venkatesh S, Bui HH (2006) AdaBoost.MRF: boosted Markov random forests and application to multilevel activity recognition. In: Proceedings from computer vision and pattern recognition conference, pp 1686–1693. doi:10.1109/CVPR.2006.49

  • Walter M, Psarrou A, Gong S (2001) Data driven gesture model acquisition using minimum description length. In: Proceedings from British machine vision conference, pp 673–683

  • Wang Y, Huang KQ, Tan TN (2007) Group activity recognition based on ARMA shape sequence modeling. In: Proceedings from international conference on image processing, Vol. 3, pp. 209–212. doi:10.1109/ICIP.2007.4379283

  • Wang Y, Mori G (2009) Human action recognition by semilatent topic models. IEEE Trans Pattern Anal Mach Intell (PAMI) 31(10): 1762–1774

    Article  Google Scholar 

  • Weinland D, Ronfard R, Boyer E (2005) Motion history volumes for free viewpoint action recognition. In: Proceedings from IEEE international workshop on modeling people and human interaction

  • Wilson A, Bobick A (1999) Parametric hidden markov models for gesture recognition. IEEE Trans Pattern Anal Mach Intell (PAMI) 21: 884–899

    Article  Google Scholar 

  • Xiang T, Gong S (2004) Activity based video content trajectory representation and segmentations. In: Proceedings from British machine vision conference, pp 177–186

  • Xiang T, Gong S (2006) Incremental visual behaviour modelling. In: Proceedings from European conference on computer vision, pp 65–72

  • Yamamoto M, Mitom H, Fujiwara F, Sato T (2006) Bayesian classification of task-oriented actions based on stochastic context free grammar. In: Proceedings from IEEE international conference on automatic face and gesture recognition, pp 317–322. doi:10.1109/FGR.2006.28

  • Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time sequential images using hidden markov models. In: Proceedings from IEEE computer vision and pattern recognition (CVPR), pp 379–385

  • Yang JY, Wang JS, Chen YP (2008) Using acceleration measurements for activity recognition: an effective learning algorithm for constructing neural classifiers. Pattern Recognit Lett 29: 2213–2220

    Article  Google Scholar 

  • Yu C, Ballard D (2002) Learning to recognize human action sequences. In: Proceedings from international conference on development and earning, pp 28–33

  • Zhang D, Gatica-Perez D, Bengio S, McCowan I (2006) Modeling individual and group actions in meetings with layered HMMs. IEEE Trans Multimedia 8(3): 509–520

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael S. Del Rose.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Del Rose, M.S., Wagner, C.C. Survey on classifying human actions through visual sensors. Artif Intell Rev 37, 301–311 (2012). https://doi.org/10.1007/s10462-011-9232-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10462-011-9232-z

Keywords

Navigation