Abstract
Human motion capture (mocap) data, which record movement from markers attached to specific joints, have gradually become the most popular solution for animation production. However, raw motion data are often corrupted by joint occlusion, marker shedding, and limited equipment precision, which severely restricts their use in real-world applications. Since human motion is essentially sequential data, recent methods resort to variants of the long short-term memory network (LSTM) to solve related problems, but most of them tend to produce visually unreasonable results. This is mainly because these methods can hardly capture long-term dependencies and cannot explicitly exploit relevant context. To address these issues, we propose a deep bidirectional attention network that can not only capture long-term dependencies but also adaptively extract relevant information at each time step. Moreover, by embedding an attention mechanism in the bidirectional LSTM structure at both the encoding and decoding stages, the proposed model can decide where to borrow information and use it to recover corrupted frames effectively. Extensive experiments on the CMU database demonstrate that the proposed model consistently outperforms other state-of-the-art methods in terms of recovery accuracy and visualization.
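The core idea above, letting the decoder decide at each time step where to borrow information from the bidirectional encoder states, can be sketched as Bahdanau-style additive attention. The snippet below is a minimal NumPy illustration with hypothetical weight names and toy dimensions, not the paper's actual implementation: a score is computed between the current decoder state and every bidirectional hidden state, softmax-normalized into weights, and used to form a context vector.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(H, s, W_h, W_s, v):
    """Bahdanau-style additive attention over bidirectional encoder states.

    H   : (T, 2d) encoder hidden states (forward and backward concatenated)
    s   : (d,)    current decoder state
    W_h, W_s, v : learned projection parameters (hypothetical names)
    Returns the context vector (weighted sum of encoder states) and weights.
    """
    # alignment score of the decoder state with each frame's hidden state
    scores = np.array([v @ np.tanh(W_h @ h + W_s @ s) for h in H])  # (T,)
    alpha = softmax(scores)   # attention weights: where to borrow information
    context = alpha @ H       # (2d,) context vector used to recover the frame
    return context, alpha

# toy dimensions: T=5 time steps, d=4 hidden units
rng = np.random.default_rng(0)
T, d = 5, 4
H = rng.standard_normal((T, 2 * d))
s = rng.standard_normal(d)
W_h = rng.standard_normal((d, 2 * d))
W_s = rng.standard_normal((d, d))
v = rng.standard_normal(d)

context, alpha = additive_attention(H, s, W_h, W_s, v)
```

Because the weights are recomputed at every decoding step, each corrupted frame can attend to a different part of the sequence, which is what allows relevant context to be exploited adaptively rather than relying on a single fixed summary vector.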
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61772272) and the Project of Science and Technology of Jiangsu Province of China under Grant BE2017031.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Cui, Q., Sun, H., Li, Y. et al. Efficient human motion recovery using bidirectional attention network. Neural Comput & Applic 32, 10127–10142 (2020). https://doi.org/10.1007/s00521-019-04543-9