
Human activity recognition based on multi-modal fusion

  • Regular Paper
  • Published in: CCF Transactions on Pervasive Computing and Interaction

Abstract

Human activity recognition (HAR) methods have developed rapidly in recent years. However, most existing methods rely on a single input modality and suffer from limited accuracy and robustness. In this paper, we present a novel multi-modal HAR architecture that fuses RGB visual data with Inertial Measurement Unit (IMU) data. For the RGB modality, a speed-weighted star RGB representation is proposed to aggregate temporal information, and a convolutional network is employed to extract features. For the IMU modality, the fast Fourier transform and a multi-layer perceptron are employed to extract dynamical features. For feature fusion, a global soft attention layer is designed to adjust weights according to the concatenated features, and L-softmax with soft voting is adopted to classify activities. The proposed method is evaluated on the UP-Fall dataset, achieving F1-scores of 0.92 on the 11-class classification task and 1.00 on the binary fall/non-fall classification task.
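
To make the pipeline concrete, the following is a minimal PyTorch sketch of the IMU feature branch and the attention-based fusion, written from the abstract alone. All layer sizes, the sigmoid-gate form of the soft attention, and the plain linear classification head are illustrative assumptions; the paper additionally trains its head with the large-margin (L-)softmax loss and applies soft voting, which this sketch omits. The star RGB branch is represented only by a pre-extracted CNN feature vector.

import torch
import torch.nn as nn

class IMUBranch(nn.Module):
    """FFT magnitude spectrum of a raw IMU window, followed by an MLP."""
    def __init__(self, window_len=128, channels=6, feat_dim=256):
        super().__init__()
        in_dim = (window_len // 2 + 1) * channels  # rFFT bins per channel
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim), nn.ReLU(),
        )

    def forward(self, x):                        # x: (batch, channels, window_len)
        spec = torch.fft.rfft(x, dim=-1).abs()   # frequency-domain (dynamical) features
        return self.mlp(spec.flatten(1))

class AttentionFusion(nn.Module):
    """Soft attention gate computed from the concatenated RGB and IMU features."""
    def __init__(self, rgb_dim=512, imu_dim=256, num_classes=11):
        super().__init__()
        fused_dim = rgb_dim + imu_dim
        self.attn = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())
        # Plain linear head as a stand-in; the paper's head is trained with
        # the L-softmax loss and combined with soft voting at inference.
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, rgb_feat, imu_feat):
        z = torch.cat([rgb_feat, imu_feat], dim=1)
        return self.classifier(self.attn(z) * z)  # element-wise re-weighting

# Example usage with random tensors (all shapes are assumptions):
rgb_feat = torch.randn(4, 512)   # CNN features of the speed-weighted star RGB image
imu = torch.randn(4, 6, 128)     # windows of 3-axis accelerometer + gyroscope data
logits = AttentionFusion()(rgb_feat, IMUBranch()(imu))

The multiplicative gate lets the network down-weight whichever modality is less informative for a given sample, which is one plausible reading of "adjusting the weights according to the concatenated features".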


Data availability

No data was used for the research described in the article.

Funding

This work was supported by the National Key Research and Development Plan under Grant 2020YFB2104400.

Author information


Corresponding author

Correspondence to Jian He.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, C., Zu, T., Hou, Y. et al. Human activity recognition based on multi-modal fusion. CCF Trans. Pervasive Comp. Interact. 5, 321–332 (2023). https://doi.org/10.1007/s42486-023-00132-x
