Abstract
Predicting human motion from previously observed motion is a challenging problem in computer vision and graphics. Existing work mostly addresses it with discriminative models and reports results only for the in-distribution case, where training and testing data follow the same distribution. It does not discuss domain shift, where training and testing data follow different distributions (out of distribution, OoD), which is the reality when such models are used in practice. Recent research addressed domain shift by augmenting the discriminative model with a generative model and obtained better results. In the present investigation, we propose regularizing this augmented network by inserting extra linear layers into the network encoder to minimize the rank of the latent space, and we train the entire network end-to-end. This regularization strengthens the model so that it deals effectively with domain shift scenarios, in which training and testing data come from different distributions. We tested our model on the benchmark datasets CMU Motion Capture (CMU MoCap) and Human3.6M (H3.6M) and found that it outperforms state-of-the-art methods on 14 OoD actions of H3.6M and 7 OoD actions of CMU MoCap in terms of the Euclidean distance between predicted and ground-truth joint angle values. On H3.6M, our average errors over the 14 OoD actions for short-term prediction (80, 160, 320, 400 ms) are 0.34, 0.6, 0.96, and 1.07. On CMU MoCap, our average errors over the 7 OoD actions for short-term and long-term prediction (80, 160, 320, 400, 1000 ms) are 0.28, 0.45, 0.77, 0.89, and 1.46. All of these results improve on the other state-of-the-art results.
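The evaluation metric named in the abstract, the Euclidean distance between predicted and ground-truth joint angle values at fixed prediction horizons, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy data, and the 40 ms frame spacing (25 frames per second) are assumptions made for illustration only.

```python
import math

# Assumption: sequences are sampled at 25 fps, so one frame spans 40 ms.
FRAME_MS = 40


def euclidean_error(pred_frame, true_frame):
    """Euclidean distance between two joint-angle vectors of equal length."""
    if len(pred_frame) != len(true_frame):
        raise ValueError("frames must have the same number of angle values")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_frame, true_frame)))


def error_at_horizons(pred_seqs, true_seqs, horizons_ms=(80, 160, 320, 400)):
    """Average the per-frame error over all sequences at each horizon."""
    results = {}
    for h in horizons_ms:
        idx = h // FRAME_MS - 1  # 80 ms maps to the second predicted frame
        errs = [
            euclidean_error(p[idx], t[idx])
            for p, t in zip(pred_seqs, true_seqs)
        ]
        results[h] = sum(errs) / len(errs)
    return results


# Toy data (hypothetical): two sequences, ten frames each, two angles per frame.
pred = [[[3.0, 4.0]] * 10, [[0.0, 0.0]] * 10]
truth = [[[0.0, 0.0]] * 10, [[0.0, 0.0]] * 10]
print(error_at_horizons(pred, truth))
# prints {80: 2.5, 160: 2.5, 320: 2.5, 400: 2.5}
```

In the toy data the first sequence is off by a distance of 5 at every frame and the second is exact, so the average at each horizon is 2.5. Averaging such per-horizon errors over actions would give one number per horizon, which is the form of the figures reported in the abstract.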
Data Availability Statement
No associated data is available
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare.
About this article
Cite this article
Yadav, G.K., Abdel-Nasser, M., Rashwan, H.A. et al. Implicit regularization of a deep augmented neural network model for human motion prediction. Appl Intell 53, 18027–18040 (2023). https://doi.org/10.1007/s10489-022-04419-x
DOI: https://doi.org/10.1007/s10489-022-04419-x