Recognizing human group action by layered model with multiple cues
Section snippets
Introduction and related work
Along with the widespread application of digital media, the amount of miscellaneous video data is growing rapidly. Consequently, the demand for analyzing, understanding and fully exploiting this video content is becoming ever more pressing. Human action analysis, an important and challenging task in video content analysis, has drawn growing attention from researchers worldwide for its great potential and promising applications in industry, entertainment, security and …
Layered model for human group action
As introduced in the previous section, human group actions have the following properties: (1) a group action involves a countable but varying number of participants and complex internal interactions, and (2) a group action exhibits visible individual movements and detailed patterns at different granularities. It is therefore challenging to represent human group actions properly. To interpret a group action correctly and clearly, we may need to recognize the internal individual actions, the …
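The snippet cuts off before the model's levels are enumerated, but the conclusion mentions three complementary semantic levels. As a rough, hypothetical sketch (the level names and fields below are our assumptions, not the paper's), the layered representation might be organized as:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class IndividualTrack:
    """Bottom layer: one participant's trajectory as (frame, x, y) samples."""
    person_id: int
    points: List[Tuple[int, float, float]] = field(default_factory=list)

@dataclass
class Interaction:
    """Middle layer: a relation among a subset of participants."""
    member_ids: List[int] = field(default_factory=list)

@dataclass
class GroupAction:
    """Top layer: the whole group action, aggregating the lower levels."""
    label: str = ""
    tracks: List[IndividualTrack] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)

# A toy two-person "walk together" instance (labels are illustrative only).
action = GroupAction(
    label="walk_together",
    tracks=[IndividualTrack(0, [(0, 1.0, 2.0)]),
            IndividualTrack(1, [(0, 1.5, 2.0)])],
    interactions=[Interaction(member_ids=[0, 1])],
)
```

The point of the sketch is only that each layer carries its own representation while the top layer aggregates the lower ones.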
Feature representation
To better capture the discriminative information in the proposed layered model, we adopt diverse features that take both motion and appearance into consideration. The motion features are based on the motion trajectories at each level. The primary trajectories of individual participants can be obtained by existing tracking methods as a preprocessing step. To ease the complexity of tracking, action videos can be divided into small fragments of dozens of frames, and the related …
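The snippet mentions dividing action videos into fragments of dozens of frames to ease tracking, but the exact fragment length is not given here. A minimal sketch of such a split, assuming a hypothetical fixed fragment length of 30 frames:

```python
def split_into_fragments(num_frames, fragment_len=30):
    """Split frame indices [0, num_frames) into consecutive fixed-length
    fragments; the last fragment may be shorter than fragment_len."""
    return [list(range(start, min(start + fragment_len, num_frames)))
            for start in range(0, num_frames, fragment_len)]

# A 70-frame clip becomes fragments of 30, 30 and 10 frames,
# each of which can then be tracked independently.
fragments = split_into_fragments(70)
```

Tracking within each short fragment, rather than across the whole clip, is what keeps the preprocessing tractable.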
Experiments
In contrast to human action recognition, few publicly available group action datasets exist at present. In this paper, we conduct experiments on two surveillance-style (real scenes, overhead viewpoint) group action datasets to verify the effectiveness of our approach and to illustrate the potential of related real-world applications.
For all experiments, we follow the same recognition routine. On the basis of our layered group action model, we first …
Conclusion
To analyze and recognize the activities of a group of people, we propose a unified framework with a layered model and multiple informative feature representations. Our layered model explicitly represents group actions at three complementary semantic levels. Unlike previous work, we consider both motion and appearance information to portray the characteristics of group action patterns. Gaussian processes are introduced to depict motion trajectories probabilistically and to handle the …
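The conclusion states that Gaussian processes are used to depict motion trajectories probabilistically; the paper's kernel choice and hyperparameters are not shown in these snippets. A minimal sketch of GP regression over one coordinate of a noisy trajectory, assuming a squared-exponential kernel with illustrative hyperparameters:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input sets."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(t_train, y_train, t_query, noise_var=1e-2):
    """Posterior mean of a zero-mean GP fitted to noisy trajectory samples."""
    K = rbf_kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
    K_star = rbf_kernel(t_query, t_train)
    return K_star @ np.linalg.solve(K, y_train)

# Noisy x-coordinate of one tracked person over 10 frames.
t = np.arange(10.0)
x = 0.5 * t + np.random.default_rng(0).normal(0.0, 0.1, size=10)
x_smooth = gp_posterior_mean(t, x, t)  # smoothed estimate at the same frames
```

The probabilistic view also yields a posterior covariance, which is what lets a GP handle noisy or partially missing trajectory observations.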
Acknowledgments
This work was supported in part by the National Basic Research Program of China (973 Program) under grant 2012CB316400, and in part by the National Natural Science Foundation of China under grants 61025011, 61133003, 61332016, 61003165, 61035001, 61303153 and 61128007. The work of Dr. Qi Tian was supported in part by ARO grant W911NF-12-1-0057, a Faculty Research Award from NEC Laboratories of America, and a 2012 UTSA START-R Research Award.
Cited by (71)
Cross-scale generative adversarial network for crowd density estimation from images
2020, Engineering Applications of Artificial Intelligence

Visual analysis of socio-cognitive crowd behaviors for surveillance: A survey and categorization of trends and methods
2019, Engineering Applications of Artificial Intelligence
Citation Excerpt: Similar to other works in the literature, this method is able to recognize one behavior at a time. Furthermore, a layered model for the description of crowd characteristics at different levels, as in Fig. 13, was proposed by Cheng et al. (2014), where each layer is presented with a uniform statistical representation. This model aimed at recognizing group actions at three semantic levels, where both motion and appearance characteristics of group action patterns were taken into consideration.

Perceiving the person and their interactions with the others for social robotics – A review
2019, Pattern Recognition Letters

Layered model for convenient designing of safety system upgrades in railways
2018, Safety Science

Modelling of interactions for the recognition of activities in groups of people
2018, Digital Signal Processing: A Review Journal
Citation Excerpt: Moreover, we consider combining all four feature sets to be used as the input for the SVM classifier. These results are compared against those of other approaches: Localised causalities [28], Group interaction zone [12], Multiple-layered model [11], Monte Carlo Tree Search [2], and the New Collective activities [13]. The relative localization and shape correspondence features provide better results than the movement inter-dependence among the moving regions in the case of NUS-HGA, while these results are worse in the case of the New Collective database.

Multiview human activity recognition using uniform rotation invariant local binary patterns
2023, Journal of Ambient Intelligence and Humanized Computing
Zhongwei Cheng received the B.S. degree in Software Engineering from Nankai University, China, in 2008. He is currently a Ph.D. candidate in the School of Computer and Control Engineering, University of Chinese Academy of Sciences. His research interests include computer vision, pattern recognition and machine learning. He has published technical papers in the area of video content understanding, human action recognition and behavior analysis. He is a reviewer for IEEE Transactions on Circuits and Systems for Video Technology.
Lei Qin received the B.S. and M.S. degrees in Mathematics from the Dalian University of Technology, Dalian, China, in 1999 and 2002, respectively, and the Ph.D. degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2008. He is currently an associate professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His research interests include image/video processing, computer vision, and pattern recognition. He has authored or coauthored over 30 technical papers in the area of computer vision. He is a reviewer for IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Transactions on Cybernetics. He has served as TPC member for various conferences, including ICPR, ICME, PSIVT, ICIMCS and PCM.
Qingming Huang (SM'08) received the B.S. degree in computer science and Ph.D. degree in Computer Engineering from Harbin Institute of Technology, Harbin, China, in 1988 and 1994, respectively. He is currently a Professor with the University of the Chinese Academy of Sciences (CAS), China, and an Adjunct Research Professor with the Institute of Computing Technology, CAS. His research areas include multimedia computing, image processing, computer vision, pattern recognition and machine learning. He has published more than 200 academic papers in prestigious international journals including IEEE Transactions on Multimedia, IEEE Transactions on CSVT, IEEE Transactions on Image Processing, etc., and top-level conferences such as ACM Multimedia, ICCV, CVPR and ECCV. He is the associate editor of Acta Automatica Sinica, and the reviewer of various international journals including IEEE Transactions on Multimedia, IEEE Transactions on CSVT, IEEE Transactions on Image Processing, etc. He has served as program chair, track chair and TPC member for various conferences, including ACM Multimedia, CVPR, ICCV, ICME and PSIVT.
Shuicheng Yan is currently an Associate Professor in the Department of Electrical and Computer Engineering at National University of Singapore, and the founding lead of the Learning and Vision Research Group (http://www.lv-nus.org). His research areas include computer vision, multimedia and machine learning, and he has authored/co-authored over 350 technical papers over a wide range of research topics, with an H-index of 44 on Google Scholar. He is an associate editor of IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT) and ACM Transactions on Intelligent Systems and Technology (ACM TIST), and has been serving as the guest editor of the special issues for TMM and CVIU. He received the Best Paper Awards from ACM MM'13 (best paper and best student paper), ACM MM'12 (demo), PCM'11, ACM MM'10, ICME'10 and ICIMCS'09, the winner prizes of the classification task in PASCAL VOC 2010–2012, the winner prize of the segmentation task in PASCAL VOC 2012, the honourable mention prize of the detection task in PASCAL VOC'10, 2010 TCSVT Best Associate Editor (BAE) Award, 2010 Young Faculty Research Award, 2011 Singapore Young Scientist Award, and 2012 NUS Young Researcher Award.
Qi Tian (M'96, SM'03) received the B.E. degree in Electronic Engineering from Tsinghua University, China, in 1992, the M.S. degree in Electrical and Computer Engineering from Drexel University in 1996 and the Ph.D. degree in Electrical and Computer Engineering from the University of Illinois, Urbana-Champaign in 2002. He is currently a Professor in the Department of Computer Science at the University of Texas at San Antonio (UTSA). He took a one-year faculty leave at Microsoft Research Asia (MSRA) during 2008–2009. His research interests include multimedia information retrieval and computer vision. He has published over 210 refereed journal and conference papers. His research projects were funded by NSF, ARO, DHS, SALSI, CIAS, and UTSA, and he also received faculty research awards from Google, NEC Laboratories of America, FXPAL, Akiira Media Systems, and HP Labs. He received the Best Paper Awards in MMM 2013 and ICIMCS 2012, the Top 10% Paper Award in MMSP 2011, the Best Student Paper in ICASSP 2006, and the Best Paper Candidate in PCM 2007. He received the 2010 ACM Service Award. He is the Guest Editor of IEEE Transactions on Multimedia, Journal of Computer Vision and Image Understanding, Pattern Recognition Letters, EURASIP Journal on Advances in Signal Processing, and Journal of Visual Communication and Image Representation, and is on the Editorial Board of IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Multimedia Systems Journal, Journal of Multimedia (JMM) and Machine Vision and Applications (MVA).