Efficient human motion recovery using bidirectional attention network

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Human motion capture (mocap) data, which record movement from markers attached to specific joints, have gradually become the most popular solution for animation production. However, raw motion data are often corrupted by joint occlusion, marker shedding, and limited equipment precision, which severely limits their usefulness in real-world applications. Since human motion is essentially sequential data, recent methods resort to variants of the long short-term memory (LSTM) network to solve related problems, but most of them produce visually implausible results. This is mainly because these methods struggle to capture long-term dependencies and cannot explicitly exploit relevant context. To address these issues, we propose a deep bidirectional attention network that can not only capture long-term dependencies but also adaptively extract relevant information at each time step. Moreover, by embedding an attention mechanism in the bidirectional LSTM structure at both the encoding and decoding stages, the proposed model can decide where to borrow information and use it to recover corrupted frames effectively. Extensive experiments on the CMU mocap database demonstrate that the proposed model consistently outperforms other state-of-the-art methods in terms of recovery accuracy and visual quality.
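The core mechanism the abstract describes, deciding "where to borrow information" by attending over bidirectional encoder states, can be illustrated with a minimal NumPy sketch of an additive (Bahdanau-style) attention step. All names, weight matrices (`Wh`, `Ws`, `v`), and dimensions below are illustrative assumptions, not the paper's actual parameters; the bidirectional LSTM that would produce the encoder states `H` is likewise omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(H, s, Wh, Ws, v):
    """One attention step over bidirectional encoder states.

    H  : (T, 2d) concatenated forward/backward hidden states
    s  : (m,)    current decoder state
    Wh, Ws, v    : learned projection parameters (here random)
    Returns the context vector and the attention weights.
    """
    scores = np.tanh(H @ Wh + s @ Ws) @ v   # (T,) alignment scores
    alpha = softmax(scores)                 # weights over the T time steps
    context = alpha @ H                     # (2d,) weighted sum of states
    return context, alpha

# Toy dimensions: 8 frames, 2*8-dim bidirectional states,
# 12-dim decoder state, 10-dim attention space.
rng = np.random.default_rng(0)
T, d2, m, a = 8, 16, 12, 10
H = rng.standard_normal((T, d2))
s = rng.standard_normal(m)
Wh = rng.standard_normal((d2, a))
Ws = rng.standard_normal((m, a))
v = rng.standard_normal(a)

context, alpha = additive_attention(H, s, Wh, Ws, v)
```

The weights `alpha` form a distribution over time steps, so uncorrupted frames that are most relevant to the frame being reconstructed receive the largest share of the context vector.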

Notes

  1. http://mocap.cs.cmu.edu/.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61772272) and the Project of Science and Technology of Jiangsu Province of China under Grant BE2017031.

Author information

Corresponding author

Correspondence to Huaijiang Sun.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cui, Q., Sun, H., Li, Y. et al. Efficient human motion recovery using bidirectional attention network. Neural Comput & Applic 32, 10127–10142 (2020). https://doi.org/10.1007/s00521-019-04543-9
