short-paper

Benchmarking Search and Annotation in Continuous Human Skeleton Sequences

Authors:

Jan Sedmidubsky,

Pavel Zezula

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval

Pages 38 - 42

https://doi.org/10.1145/3323873.3325013

Published: 05 June 2019

Abstract

Motion capture data are digital representations of human movements in the form of 3D trajectories of multiple body joints. To understand the captured motions, similarity-based processing and deep learning have already proved effective, especially in classifying pre-segmented actions. However, in real-world scenarios, motion data are typically captured as long continuous sequences, without explicit knowledge of their semantic partitioning. To make such unsegmented data accessible and reusable, as many applications require, they need to be analyzed, searched, annotated, and mined automatically. Yet there is currently an absence of datasets and benchmarks for testing and comparing techniques that process continuous motion data. In this paper, we introduce LSMB19, a new large-scale dataset consisting of two 3D skeleton sequences with a total length of 54.5 hours. We also define a benchmark for two important multimedia retrieval operations: subsequence search and annotation. Additionally, we demonstrate the usability of the benchmark by establishing baseline results for these operations.
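
To make the two benchmark operations concrete, the following is a minimal sketch of subsequence search and annotation over a continuous skeleton sequence. It is not the baseline method of the paper: the sliding-window strategy, the frame-wise Euclidean distance, the fixed threshold, and all function names are assumptions chosen for illustration, and the toy data use a single joint rather than a full skeleton.

```python
import math

# Assumed representation: a pose is a list of (x, y, z) joint coordinates,
# and a sequence is a list of poses (one pose per frame).

def pose_distance(a, b):
    """Sum of Euclidean distances between corresponding joints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def segment_distance(query, segment):
    """Mean frame-wise pose distance between two equally long segments."""
    return sum(pose_distance(a, b) for a, b in zip(query, segment)) / len(query)

def subsequence_search(sequence, query, k=1, step=1):
    """Return the k best (distance, start_frame) matches of query in sequence."""
    n = len(query)
    hits = [
        (segment_distance(query, sequence[s:s + n]), s)
        for s in range(0, len(sequence) - n + 1, step)
    ]
    return sorted(hits)[:k]

def annotate(sequence, labeled_queries, threshold, step=1):
    """Label every window whose distance to a labeled example is <= threshold.

    Returns sorted (start_frame, end_frame_exclusive, label) triples.
    """
    annotations = []
    for label, query in labeled_queries:
        n = len(query)
        for s in range(0, len(sequence) - n + 1, step):
            if segment_distance(query, sequence[s:s + n]) <= threshold:
                annotations.append((s, s + n, label))
    return sorted(annotations)

# Toy data (hypothetical): one joint moving along the x axis.
sequence = [[(x, 0.0, 0.0)] for x in [0, 0, 0, 1, 2, 3, 0, 0]]
query = [[(x, 0.0, 0.0)] for x in [1, 2, 3]]

best = subsequence_search(sequence, query, k=1)
labels = annotate(sequence, [("raise", query)], threshold=0.5)
print(best)    # [(0.0, 3)]
print(labels)  # [(3, 6, 'raise')]
```

A fixed-length window is the simplest choice and ignores variation in execution speed; on sequences as long as those in LSMB19, an exhaustive scan like this one would likely need an index or a coarser step to be practical.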

Cited By

Kim J, Lee D (2021). Activity Recognition with Combination of Deeply Learned Visual Attention and Pose Estimation. Applied Sciences 11:9 (4153). Online publication date: 1-May-2021.
https://doi.org/10.3390/app11094153
Punnakkal A, Chandrasekaran A, Athanasiou N, Quiros-Ramirez A, Black M (2021). BABEL: Bodies, Action and Behavior with English Labels. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 722-731. Online publication date: Jun-2021.
https://doi.org/10.1109/CVPR46437.2021.00078

Index Terms

1. Information systems
  1. Information retrieval

Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval

June 2019

427 pages

ISBN:9781450367653

DOI:10.1145/3323873

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada)
Alberto Del Bimbo (University of Florence, Italy)
Zhongfei Zhang (Binghamton University, State University of New York, USA)

Program Chairs:
Alexander Hauptmann (Carnegie Mellon University, USA)
K. Selcuk Candan (Arizona State University, USA)
Marco Bertini (University of Florence, Italy)
Lexing Xie (Australian National University, Australia)
Xiao-Yong Wei (Sichuan University, China)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Grantová Agentura České Republiky

Conference

ICMR '19: International Conference on Multimedia Retrieval

June 10 - 13, 2019

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
