poster

Human action recognition and retrieval using sole depth information

Authors:

Hong-Ming Chen

MM '12: Proceedings of the 20th ACM international conference on Multimedia

Pages 1053 - 1056

https://doi.org/10.1145/2393347.2396381

Published: 29 October 2012


Abstract

Observing the widespread use of Kinect-like depth cameras, in this work we investigate the problem of using depth data alone for human action recognition and retrieval in videos. We propose simple depth descriptors that require no learning optimization yet achieve promising performance, comparable to that of the leading methods based on color images and videos, and that can be applied effectively in real-time applications. Because depth cameras rely on infrared sensing, the proposed approach is especially useful under poor lighting conditions, e.g. surveillance environments without sufficient lighting. We also introduce a large Depth-included Human Action video dataset, DHA, which contains 357 videos of performed human actions belonging to 17 categories. To the best of our knowledge, DHA is one of the largest depth-included video datasets of human actions.

Index Terms

Human action recognition and retrieval using sole depth information
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

MM '12: Proceedings of the 20th ACM international conference on Multimedia

October 2012

1584 pages

ISBN:9781450310895

DOI:10.1145/2393347

General Chairs: Noboru Babaguchi (Osaka University, Japan), Kiyoharu Aizawa (The University of Tokyo, Japan), John Smith (IBM, USA)

Program Chairs: Shin'ichi Satoh (National Institute of Informatics, Japan), Thomas Plagemann (University of Oslo, Norway), Xian-Sheng Hua (Microsoft, USA), Rong Yan (Facebook, USA)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

MM '12

Sponsor:

SIGMM

MM '12: ACM Multimedia Conference

October 29 - November 2, 2012

Nara, Japan

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 67

Total Downloads: 684

Downloads (last 12 months): 19

Downloads (last 6 weeks): 2

Reflects downloads up to 07 Mar 2025
