Abstract
Human motion prediction is important in human-computer interaction, personnel tracking, autonomous driving, and other fields. However, it is affected by uncertainties such as motion speed and amplitude, so the first predicted frame is often discontinuous and the time over which predictions remain accurate is short. This paper proposes a method that combines a sequence-to-sequence (seq2seq) structure with an attention mechanism to address these problems in current methods. We refer to the proposed structure as the At-seq2seq model, a sequence-to-sequence model based on the GRU (Gated Recurrent Unit). We add an attention mechanism to the decoder of the seq2seq model to further encode the output of the encoder into a vector sequence containing multiple subsets, so that the decoder can select the most relevant part of the sequence for decoding and prediction. The At-seq2seq model was validated on the Human3.6M dataset. The experimental results show that the proposed model not only reduces the error of short-term motion prediction but also significantly extends the time over which predictions remain accurate.
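As a rough illustration of the idea that the decoder selects the most relevant part of the encoder output, the following minimal sketch computes attention weights and a context vector for a single decoding step. It assumes dot-product scoring, which the abstract does not specify, and the names `softmax`, `attend`, `decoder_state`, and `encoder_outputs` are hypothetical. It is not the At-seq2seq implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return (weights, context) for one decoding step.

    Assumption: dot-product scoring between the decoder state and each
    encoder output. The scoring function used by the paper is not stated
    in the abstract.
    """
    # One relevance score per encoder time step.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three encoder outputs (one per observed frame), hidden size 2.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_outputs)
```

In this toy case the second and third encoder outputs align best with the decoder state, so they receive the largest weights and dominate the context vector that would be fed to the GRU decoder.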
Acknowledgments
Thanks are due to the National Natural Science Foundation of China under grant nos. 61773105 and 61374147 and the Fundamental Research Funds for the Central Universities under grant no. N182008004 for supporting this research work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Sang, HF., Chen, ZZ. & He, DK. Human Motion prediction based on attention mechanism. Multimed Tools Appl 79, 5529–5544 (2020). https://doi.org/10.1007/s11042-019-08269-7
DOI: https://doi.org/10.1007/s11042-019-08269-7