
Human activity recognition based on multi-modal fusion

  • Regular Paper
  • Published in: CCF Transactions on Pervasive Computing and Interaction

Abstract

Human activity recognition (HAR) methods have developed rapidly in recent years. However, most existing methods rely on a single input modality and suffer from limited accuracy and robustness. In this paper, we present a novel multi-modal HAR architecture that fuses RGB visual data with Inertial Measurement Unit (IMU) data. For the RGB modality, a speed-weighted star RGB representation is proposed to aggregate temporal information, and a convolutional network is employed to extract features. For the IMU modality, the fast Fourier transform and a multi-layer perceptron are employed to extract dynamical features. For feature fusion, a global soft attention layer is designed to adjust weights according to the concatenated features, and L-softmax with soft voting is adopted to classify activities. The proposed method is evaluated on the UP-Fall dataset, achieving F1-scores of 0.92 on the 11-class classification task and 1.00 on the binary fall/non-fall classification task.
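
To make the pipeline concrete, the following is a minimal PyTorch sketch of the IMU feature branch and the attention-based fusion, written from the abstract alone. All layer sizes, the sigmoid-gate form of the soft attention, and the plain linear classification head are illustrative assumptions; the paper additionally trains its head with the large-margin (L-)softmax loss and applies soft voting, which this sketch omits. The star RGB branch is represented only by a pre-extracted CNN feature vector.

import torch
import torch.nn as nn

class IMUBranch(nn.Module):
    """FFT magnitude spectrum of a raw IMU window, followed by an MLP."""
    def __init__(self, window_len=128, channels=6, feat_dim=256):
        super().__init__()
        in_dim = (window_len // 2 + 1) * channels  # rFFT bins per channel
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim), nn.ReLU(),
        )

    def forward(self, x):                        # x: (batch, channels, window_len)
        spec = torch.fft.rfft(x, dim=-1).abs()   # frequency-domain (dynamical) features
        return self.mlp(spec.flatten(1))

class AttentionFusion(nn.Module):
    """Soft attention gate computed from the concatenated RGB and IMU features."""
    def __init__(self, rgb_dim=512, imu_dim=256, num_classes=11):
        super().__init__()
        fused_dim = rgb_dim + imu_dim
        self.attn = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())
        # Plain linear head as a stand-in; the paper's head is trained with
        # the L-softmax loss and combined with soft voting at inference.
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, rgb_feat, imu_feat):
        z = torch.cat([rgb_feat, imu_feat], dim=1)
        return self.classifier(self.attn(z) * z)  # element-wise re-weighting

# Example usage with random tensors (all shapes are assumptions):
rgb_feat = torch.randn(4, 512)   # CNN features of the speed-weighted star RGB image
imu = torch.randn(4, 6, 128)     # windows of 3-axis accelerometer + gyroscope data
logits = AttentionFusion()(rgb_feat, IMUBranch()(imu))

The multiplicative gate lets the network down-weight whichever modality is less informative for a given sample, which is one plausible reading of "adjusting the weights according to the concatenated features".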


Data availability

No data was used for the research described in the article.

Funding

This work was supported by the National Key Research and Development Plan under Grant 2020YFB2104400.

Author information


Corresponding author

Correspondence to Jian He.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, C., Zu, T., Hou, Y. et al. Human activity recognition based on multi-modal fusion. CCF Trans. Pervasive Comp. Interact. 5, 321–332 (2023). https://doi.org/10.1007/s42486-023-00132-x
