Abstract
Human motion capture (mocap) data, which record movement from markers attached to specific joints, have gradually become the most popular solution for animation production. However, raw motion data are often corrupted by joint occlusion, marker shedding, and limited equipment precision, which severely restricts their use in real-world applications. Since human motion is essentially sequential data, recent methods resort to variants of the long short-term memory network (LSTM) to solve related problems, but most of them tend to produce visually unreasonable results. This is mainly because these methods can hardly capture long-term dependencies and cannot explicitly exploit relevant context. To address these issues, we propose a deep bidirectional attention network that can not only capture long-term dependencies but also adaptively extract relevant information at each time step. Moreover, by embedding an attention mechanism in the bidirectional LSTM structure at both the encoding and decoding stages, the proposed model can decide where to borrow information and use it to recover corrupted frames effectively. Extensive experiments on the CMU database demonstrate that the proposed model consistently outperforms other state-of-the-art methods in terms of recovery accuracy and visualization.
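The core idea above, letting the decoder decide at each time step where to borrow information from the bidirectional encoder states, can be sketched as Bahdanau-style additive attention. The snippet below is a minimal NumPy illustration with hypothetical weight names and toy dimensions, not the paper's actual implementation: a score is computed between the current decoder state and every bidirectional hidden state, softmax-normalized into weights, and used to form a context vector.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(H, s, W_h, W_s, v):
    """Bahdanau-style additive attention over bidirectional encoder states.

    H   : (T, 2d) encoder hidden states (forward and backward concatenated)
    s   : (d,)    current decoder state
    W_h, W_s, v : learned projection parameters (hypothetical names)
    Returns the context vector (weighted sum of encoder states) and weights.
    """
    # alignment score of the decoder state with each frame's hidden state
    scores = np.array([v @ np.tanh(W_h @ h + W_s @ s) for h in H])  # (T,)
    alpha = softmax(scores)   # attention weights: where to borrow information
    context = alpha @ H       # (2d,) context vector used to recover the frame
    return context, alpha

# toy dimensions: T=5 time steps, d=4 hidden units
rng = np.random.default_rng(0)
T, d = 5, 4
H = rng.standard_normal((T, 2 * d))
s = rng.standard_normal(d)
W_h = rng.standard_normal((d, 2 * d))
W_s = rng.standard_normal((d, d))
v = rng.standard_normal(d)

context, alpha = additive_attention(H, s, W_h, W_s, v)
```

Because the weights are recomputed at every decoding step, each corrupted frame can attend to a different part of the sequence, which is what allows relevant context to be exploited adaptively rather than relying on a single fixed summary vector.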
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61772272) and the Project of Science and Technology of Jiangsu Province of China under Grant BE2017031.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Cui, Q., Sun, H., Li, Y. et al. Efficient human motion recovery using bidirectional attention network. Neural Comput & Applic 32, 10127–10142 (2020). https://doi.org/10.1007/s00521-019-04543-9