Abstract
Action recognition and positional tracking are critical problems in many Virtual Reality (VR) applications. In this paper, a novel feature representation method is proposed to recognize actions from sensor signals. Features are extracted by jointly learning a Convolutional Auto-Encoder (CAE) and a representation of motion bases via clustering, called the Sequence of Cluster Centroids (SoCC). The learned features are then used to train the action recognition classifier. We have also collected a new dataset of limb actions recorded as sensor signals. In addition, a novel action tracking method is proposed for the VR environment; it extends the sensor signals from three Degrees of Freedom (DoF) of rotation to six DoF of position plus rotation. Experimental results demonstrate that the CAE-SoCC feature is effective for action recognition and yields accurate prediction of position displacement.
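The SoCC idea described above can be illustrated with a minimal sketch: cluster the latent vectors of signal windows into a small set of "motion bases," then represent each window by its nearest centroid. Here the latent features are random stand-ins for CAE encodings, and the plain k-means routine and the dimensions (200 windows, 8-D latents, 4 clusters) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for CAE latent features: each row is the encoding of one
# short window of sensor signal. In the paper these would come from the
# trained Convolutional Auto-Encoder; here they are random for illustration.
latent = rng.normal(size=(200, 8))

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; the resulting centroids act as the 'motion bases'."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every latent vector to its nearest centroid
        dist = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

centroids = kmeans(latent, k=4)

# SoCC representation: replace each window's latent vector with its
# nearest motion basis, yielding a fixed-vocabulary sequence on which
# an action classifier can be trained.
dist = np.linalg.norm(latent[:, None] - centroids[None], axis=-1)
socc = centroids[dist.argmin(axis=1)]
print(socc.shape)  # (200, 8)
```

The design choice of quantizing latents to a small centroid vocabulary makes the downstream sequence classifier robust to per-window noise, at the cost of discarding within-cluster variation.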
Acknowledgements
This work was supported by the Advanced Institute of Manufacturing with High-tech Innovations and the Center for Innovative Research on Aging Society (CIRAS) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. It was also supported by the Ministry of Science and Technology under grant 109-2218-E-194-009.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, HT., Liu, YP., Chang, YK. et al. Action recognition and tracking via deep representation extraction and motion bases learning. Multimed Tools Appl 81, 11845–11864 (2022). https://doi.org/10.1007/s11042-021-11888-8