DOI: 10.1145/3323873.3325013
short-paper

Benchmarking Search and Annotation in Continuous Human Skeleton Sequences

Published: 05 June 2019

Abstract

Motion capture data are digital representations of human movements in the form of 3D trajectories of multiple body joints. Similarity-based processing and deep learning have already proved effective for understanding captured motions, especially for classifying pre-segmented actions. In real-world scenarios, however, motion data are typically captured as long continuous sequences without explicit knowledge of their semantic partitioning. To make such unsegmented data accessible and reusable, as many applications require, they need to be analyzed, searched, annotated, and mined automatically. Yet there is currently a lack of datasets and benchmarks for testing and comparing techniques developed for processing continuous motion data. In this paper, we introduce LSMB19, a new large-scale dataset consisting of two 3D skeleton sequences with a total length of 54.5 hours. We also define a benchmark for two important multimedia retrieval operations: subsequence search and annotation. Finally, we demonstrate the usability of the benchmark by establishing baseline results for both operations.
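The abstract names subsequence search and annotation as the two benchmark operations but does not describe how either is computed. The following is a minimal, hypothetical sketch of a brute-force sliding-window approach, not the authors' baseline method. It assumes that every frame lists the same joints in the same order, compares a query only against stream windows of identical length (no time warping or speed normalization), and uses the mean per-joint Euclidean distance as the dissimilarity. All function names (frame_distance, window_distance, subsequence_search, annotate) and the toy data are illustrative assumptions.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Joint = Tuple[float, float, float]  # 3D position of one body joint
Frame = Sequence[Joint]             # one pose: all joints, in a fixed order
Motion = Sequence[Frame]            # a skeleton sequence (query or long stream)


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean Euclidean distance between corresponding joints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def window_distance(query: Motion, stream: Motion, start: int) -> float:
    """Mean frame distance between query and the equally long stream window at start."""
    return sum(
        frame_distance(q, stream[start + i]) for i, q in enumerate(query)
    ) / len(query)


def subsequence_search(
    query: Motion, stream: Motion, k: int = 3, step: int = 1
) -> List[Tuple[int, float]]:
    """Return the k most similar windows as (start_frame, distance), best first."""
    last_start = len(stream) - len(query)
    candidates = (
        (window_distance(query, stream, s), s) for s in range(0, last_start + 1, step)
    )
    # Ties on distance are broken by the smaller start frame.
    return [(s, d) for d, s in heapq.nsmallest(k, candidates)]


def annotate(
    stream: Motion,
    templates: Dict[str, Motion],
    window: int,
    step: int,
    threshold: float,
) -> List[Tuple[int, Optional[str]]]:
    """Label each stream window with the label of its nearest template.

    A window is labelled None when no template lies closer than threshold.
    """
    for label, tmpl in templates.items():
        if len(tmpl) != window:
            raise ValueError(f"template {label!r} must span exactly {window} frames")
    result: List[Tuple[int, Optional[str]]] = []
    for s in range(0, len(stream) - window + 1, step):
        best_label: Optional[str] = None
        best_dist = threshold
        for label, tmpl in templates.items():
            d = window_distance(tmpl, stream, s)
            if d < best_dist:
                best_label, best_dist = label, d
        result.append((s, best_label))
    return result


# Toy example: 10 frames of a hypothetical 2-joint skeleton drifting along x.
stream = [[(float(i), 0.0, 0.0), (float(i), 1.0, 0.0)] for i in range(10)]
print(subsequence_search(stream[4:7], stream, k=3))
# -> [(4, 0.0), (3, 1.0), (5, 1.0)]
```

With step=1, neighboring windows overlap heavily, so the top hits tend to cluster around a single match; a practical search would likely suppress overlapping results. The brute-force scan also costs time proportional to stream length times query length times joint count, which suggests why more efficient techniques matter for a dataset of 54.5 hours.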

Cited By

  • (2021) Activity Recognition with Combination of Deeply Learned Visual Attention and Pose Estimation. Applied Sciences 11:9 (4153). DOI: 10.3390/app11094153. Online publication date: 1-May-2021
  • (2021) BABEL: Bodies, Action and Behavior with English Labels. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 722-731. DOI: 10.1109/CVPR46437.2021.00078. Online publication date: Jun-2021

Information

Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN:9781450367653
DOI:10.1145/3323873
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. action detection
  2. benchmark
  3. continuous 3D skeleton sequence
  4. mining
  5. motion capture dataset
  6. stream-based processing
  7. subsequence search

Qualifiers

  • Short-paper

Funding Sources

  • Grantová Agentura České Republiky

Conference

ICMR '19

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025