ABSTRACT
Multi-view human action recognition (MVHAR) is essential for many applications, including smart video surveillance in shopping malls, airports, railway stations and other public places, as well as ambient assisted living systems. Many current MVHAR methods rely on high-dimensional features or computationally complex processing, which makes it difficult for their recognition speed to meet the needs of real-time applications. To address this problem, a cell summary (CS) descriptor is constructed for each frame by dividing the frame into several cells, further dividing each cell into several radial bins, and then extracting spatiotemporal features from each radial bin. A video is then represented by a sequence of dimension-reduced CS descriptors. The CS descriptor is both discriminative and inexpensive to compute. A probabilistic classifier is learned for each view of each action category, and action classification is then carried out independently in each view. A decision fusion algorithm based on probability estimation is proposed to make the final decision. Experimental results on two publicly available multi-view human action datasets, MuHAVi-MAS-14 and IXMAS, show that the proposed approach outperforms state-of-the-art methods in recognition accuracy; moreover, it is computationally efficient enough for real-time applications.
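The descriptor construction summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each frame is assumed to be available as a binary silhouette mask, the "radial bins" are assumed to be equal angular sectors around each cell centre, the two per-bin spatiotemporal features (silhouette occupancy and frame-to-frame change) are illustrative choices, and PCA stands in for the unspecified dimension-reduction step. The names `cell_summary`, `fit_pca`, `grid` and `n_bins` are hypothetical.

```python
import numpy as np

def cell_summary(prev_mask, mask, grid=(2, 2), n_bins=8):
    """Illustrative per-frame CS-style descriptor (assumed features, not the paper's).

    prev_mask, mask: (H, W) boolean silhouettes of consecutive frames.
    The frame is split into grid[0] x grid[1] cells and each cell into n_bins
    angular sectors ("radial bins") around the cell centre. For every bin we
    store the fraction of silhouette pixels and the fraction of pixels whose
    silhouette value changed since the previous frame.
    """
    h, w = mask.shape
    rows, cols = grid
    changed = mask != prev_mask
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            yy, xx = np.mgrid[y0:y1, x0:x1]
            cy, cx = (y0 + y1 - 1) / 2, (x0 + x1 - 1) / 2
            # Angle of each pixel around the cell centre, mapped to a bin index.
            angle = np.arctan2(yy - cy, xx - cx)
            bins = ((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
            cell, cell_changed = mask[y0:y1, x0:x1], changed[y0:y1, x0:x1]
            for b in range(n_bins):
                sel = bins == b
                if sel.any():
                    feats += [cell[sel].mean(), cell_changed[sel].mean()]
                else:
                    feats += [0.0, 0.0]
    return np.asarray(feats, dtype=float)

def fit_pca(train_descs, k):
    """Return (mean, components) of a k-dimensional PCA fitted via SVD."""
    mean = train_descs.mean(axis=0)
    _, _, vt = np.linalg.svd(train_descs - mean, full_matrices=False)
    return mean, vt[:k]

# Toy usage: 5 random binary "silhouettes" -> 4 descriptors (one per frame pair).
rng = np.random.default_rng(0)
frames = rng.random((5, 32, 24)) > 0.5
descs = np.stack([cell_summary(frames[t - 1], frames[t]) for t in range(1, len(frames))])
mean, comps = fit_pca(descs, k=3)
reduced = (descs - mean) @ comps.T
print(descs.shape, reduced.shape)  # (4, 64) (4, 3)
```

With a 2 x 2 grid and 8 bins, each frame yields 2 x 2 x 8 x 2 = 64 values, and the video becomes a short sequence of low-dimensional vectors. In practice the PCA basis would be fitted on training descriptors and reused for test videos, so that all sequences share one projection.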
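The abstract does not spell out the fusion rule, so the sketch below substitutes a common baseline: the weighted product rule, which multiplies per-view class posteriors (equivalently, sums their logarithms) under the assumption that views are conditionally independent. It is a stand-in for illustration, not the paper's probability-estimation-based algorithm; `fuse_views` and its optional view `weights` are hypothetical.

```python
import numpy as np

def fuse_views(view_probs, weights=None, eps=1e-12):
    """Weighted product-rule fusion of per-view class posteriors (stand-in rule).

    view_probs: (V, C) array-like; row v is view v's probability estimate over C actions.
    weights:    optional (V,) non-negative view weights (hypothetical parameter).
    Returns the fused (C,) posterior and the index of the winning action.
    """
    p = np.asarray(view_probs, dtype=float)
    w = np.ones(p.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    # Summing weighted log-probabilities equals taking a weighted product of probabilities.
    log_post = (w[:, None] * np.log(p + eps)).sum(axis=0)
    log_post -= log_post.max()  # shift for numerical stability before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    return post, int(np.argmax(post))

# Toy usage: three cameras, three action classes.
probs = [[0.6, 0.3, 0.1],   # camera 1
         [0.2, 0.5, 0.3],   # camera 2 alone would pick action 1
         [0.6, 0.3, 0.1]]   # camera 3
post, label = fuse_views(probs)
print(label)  # 0
```

Here camera 2 alone would choose action 1, but the fused posterior (about 0.6, 0.375, 0.025) favors action 0 because the other two views assign it consistently high probability. Setting a view's weight to 0 removes it from the decision, which is one way an occluded or unreliable camera could be down-weighted.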
Index Terms
- Multi-view Human Action Recognition by Cell Summary Descriptor and Decision Fusion