DOI: 10.1145/3240508.3240675

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

Published: 15 October 2018

ABSTRACT

Current research mainly focuses on single-view and multi-view human action recognition, which can hardly satisfy the requirements of human-robot interaction (HRI) applications that must recognize actions from arbitrary views. The lack of suitable databases is a further barrier. In this paper, we collect a new large-scale RGB-D action database for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The database includes action samples captured from 8 fixed viewpoints as well as varying-view sequences that cover the entire 360° of view angles. In total, 118 persons were invited to perform 40 action categories, and 25,600 video samples were collected. Our database involves more participants, more viewpoints and a larger number of samples than existing databases. More importantly, it is the first database containing entire 360° varying-view sequences. The database provides sufficient data for cross-view and arbitrary-view action analysis. In addition, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experimental results show that the VS-CNN achieves superior performance.
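The abstract describes skeleton sequences as one modality for CNN-based recognition. As a hypothetical illustration only (the paper's actual VS-CNN architecture and preprocessing are not given in this abstract), a common way to feed a skeleton sequence to a 2D CNN is to resample it to a fixed temporal length and treat the joint coordinates as image channels. The function name, the 25-joint layout, and the target length of 64 below are all assumptions for the sketch:

```python
import numpy as np

def skeleton_to_tensor(sequence, target_len=64):
    """Hypothetical preprocessing sketch (not the paper's VS-CNN).

    sequence: array of shape (T, J, 3) -- T frames, J joints, (x, y, z).
    Returns an image-like tensor of shape (3, target_len, J), with the
    coordinate axis treated as channels, like RGB in an image.
    """
    seq = np.asarray(sequence, dtype=np.float32)
    t, _, _ = seq.shape
    # Resample the time axis to a fixed length via nearest-neighbor indexing.
    idx = np.linspace(0, t - 1, target_len).round().astype(int)
    seq = seq[idx]                      # (target_len, J, 3)
    return seq.transpose(2, 0, 1)       # (3, target_len, J)

# Example: a 100-frame sequence with a Kinect-style 25-joint skeleton.
dummy = np.zeros((100, 25, 3))
print(skeleton_to_tensor(dummy).shape)  # (3, 64, 25)
```

The resulting fixed-size tensor can then be consumed by any standard 2D convolutional backbone; variable-length sequences all map to the same input shape.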


Published in

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508

Copyright © 2018 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


            Qualifiers

            • research-article

            Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions, 28%. Overall acceptance rate: 995 of 4,171 submissions, 24%.

