DOI: 10.1145/3503161.3551598
research-article

A Transformer Based Approach for Activity Detection

Published: 10 October 2022

Abstract

Non-invasive physiological sensors allow user-specific data to be collected in realistic environments. In this paper, we use physiological data to investigate the effectiveness of Convolutional Neural Network (CNN) based feature embeddings combined with a Transformer architecture for human activity recognition. A 1D-CNN representation is used for the heart rate, and a 2D-CNN is applied to the short-time Fourier transform of the accelerometer data. After fusion, the combined feature is fed into a transformer. The experiments are performed on the harAGE dataset. The findings indicate the discriminative ability of feature fusion with a transformer-based architecture: the method outperforms the harAGE baseline by an absolute 3.7%.
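To make the accelerometer preprocessing step concrete, the following is a minimal sketch of a short-time Fourier transform that turns a one-axis trace into a time-frequency magnitude grid, the kind of 2D input a 2D-CNN could consume. It is not the authors' implementation: the function name `stft_magnitude`, the frame length, hop size, Hann window, sample rate, and the synthetic sine trace are all assumptions chosen for illustration.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=32, hop=16):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    # Hann window tapers each frame to reduce spectral leakage at the edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of a single bin; O(N^2) per frame, fine for a sketch.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical one-axis accelerometer trace: a 4 Hz oscillation.
SAMPLE_RATE = 32  # Hz, assumed for illustration only
trace = [math.sin(2 * math.pi * 4 * t / SAMPLE_RATE) for t in range(128)]

spectrogram = stft_magnitude(trace)
# Index of the strongest frequency bin in every frame.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spectrogram]
```

With a 32-sample frame at the assumed 32 Hz rate, each bin spans 1 Hz, so the energy of the synthetic trace concentrates around bin 4. In practice a library routine such as an FFT-based STFT would replace the direct DFT loop.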

Supplementary Material

MOV File (MM_22.mov)
In this paper, we propose a CNN-Transformer fusion model for detecting human activity from commercial smartwatch data. The proposed method extracts relevant features from the accelerometer and heart-rate sensors using CNNs, and the transformer model receives input from these modality-specific CNNs. The experiments are performed on the harAGE dataset and the method outperforms the harAGE baseline by an absolute 3.7%.
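The hand-off from the modality-specific CNNs to the transformer can be illustrated with a small standard-library sketch. The description above does not state how the two feature streams are fused, so concatenation per time step is an assumption here, as are the helper names `fuse` and `self_attention`, the toy embedding values, and the use of identity projections. A real transformer layer would add learned query/key/value projections, multiple heads, positional information, and feed-forward sublayers.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(hr_feats, acc_feats):
    """Assumed fusion: concatenate the two embeddings at each time step."""
    return [h + a for h, a in zip(hr_feats, acc_feats)]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


# Toy stand-ins for CNN outputs: 3 time steps per modality (hypothetical values).
hr_feats = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]                  # 1D-CNN branch
acc_feats = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.4]]  # 2D-CNN branch

fused = fuse(hr_feats, acc_feats)   # 3 tokens, each of dimension 5
attended = self_attention(fused)    # same shape as fused
```

Because the attention weights for each token sum to one, every output vector is a convex combination of the fused tokens, which is what lets each time step draw on context from the others.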

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cnn
  2. human activity recognition
  3. transformers

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) MARS: A Multiview Contrastive Approach to Human Activity Recognition From Accelerometer Sensor. IEEE Sensors Letters 8:3 (1-4). DOI: 10.1109/LSENS.2024.3357941. Online publication date: Mar-2024
  • (2024) A Systematic Review of Human Activity Recognition Based on Mobile Devices: Overview, Progress and Trends. IEEE Communications Surveys & Tutorials 26:2 (890-929). DOI: 10.1109/COMST.2024.3357591. Online publication date: 2024
  • (2023) Conformer-Based Human Activity Recognition Using Inertial Measurement Units. Sensors 23:17 (7357). DOI: 10.3390/s23177357. Online publication date: 23-Aug-2023
  • (2023) BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions. Proceedings of the 31st ACM International Conference on Multimedia (9536-9540). DOI: 10.1145/3581783.3612860. Online publication date: 26-Oct-2023
  • (2023) MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings. Proceedings of the 31st ACM International Conference on Multimedia (9526-9530). DOI: 10.1145/3581783.3612858. Online publication date: 26-Oct-2023
  • (2023) Learning hierarchical time series data augmentation invariances via contrastive supervision for human activity recognition. Knowledge-Based Systems 276:C. DOI: 10.1016/j.knosys.2023.110789. Online publication date: 27-Sep-2023
