ABSTRACT
Current research mainly focuses on single-view and multi-view human action recognition, which can hardly satisfy the requirements of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of suitable databases is a further barrier. In this paper, we collect a new large-scale RGB-D action database for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The database includes action samples captured in 8 fixed viewpoints and varying-view sequences which cover the entire 360° range of view angles. In total, 118 persons are invited to perform 40 action categories, and 25,600 video samples are collected. Our database involves more participants, more viewpoints and a larger number of samples. More importantly, it is the first database containing entire 360° varying-view sequences. The database provides sufficient data for cross-view and arbitrary-view action analysis. Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experimental results show that the VS-CNN achieves superior performance.
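As a minimal illustration of how a CNN can consume the skeleton sequences this database provides, the sketch below maps a sequence of 3D joint coordinates onto an image-like tensor, a common preprocessing step in skeleton-based CNN methods. The shapes, normalization, and function name are assumptions for illustration only, not the paper's exact VS-CNN pipeline.

```python
import numpy as np

def skeleton_to_tensor(seq: np.ndarray) -> np.ndarray:
    """Map a (T, J, 3) joint-coordinate sequence to a (3, T, J) 'image'.

    T = number of frames, J = number of joints (e.g. 25 for Kinect v2).
    Each coordinate channel (x, y, z) is min-max normalized to [0, 1],
    a common step before feeding skeleton data to a 2D CNN.
    NOTE: this is a hypothetical sketch, not the paper's VS-CNN.
    """
    t, j, c = seq.shape
    assert c == 3, "expect (x, y, z) joint coordinates"
    out = np.empty((3, t, j), dtype=np.float32)
    for ch in range(3):
        plane = seq[:, :, ch]
        lo, hi = plane.min(), plane.max()
        out[ch] = (plane - lo) / (hi - lo + 1e-8)  # avoid divide-by-zero
    return out

# Toy usage: 30 frames of 25 joints with random coordinates.
rng = np.random.default_rng(0)
sample = rng.normal(size=(30, 25, 3))
tensor = skeleton_to_tensor(sample)
print(tensor.shape)  # (3, 30, 25)
```

The resulting 3-channel tensor can then be fed to any standard image-classification CNN backbone.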
Index Terms
- A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition