DOI: 10.1145/3338533.3366592

Multimodal Attribute and Feature Embedding for Activity Recognition

Published: 10 January 2020

Abstract

Human Activity Recognition (HAR) automatically recognizes daily-life and work activities from digital records and is of great value to the medical and health fields. Egocentric video and body acceleration data describe human activity patterns from complementary perspectives, laying a foundation for activity recognition based on multimodal behavior data. However, the low-level signal structures of the two modalities differ greatly, which makes the mapping to high-level activities complicated; moreover, labeling activities in multimodal behavior data is expensive, so annotated data are scarce, limiting technical progress in this field. In this paper, we propose MAFE, an activity recognition model based on multimodal attribute and feature embedding. Before activity recognition, middle-level attribute features are extracted from the low-level signals of each modality. This reduces the complexity of mapping low-level signals to high-level activities, and it allows abundant middle-level attribute annotations to be exploited, reducing the dependence on activity labels. Experiments on the Stanford-ECM dataset verify the effectiveness of the proposed MAFE method.
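
The abstract outlines a two-stage design: modality-specific encoders map low-level egocentric video and acceleration features into a shared middle-level attribute space supervised by attribute labels, and an activity classifier then operates on the fused attribute embeddings. The paper page provides no code, so the PyTorch sketch below is only an illustration of that idea under stated assumptions; the module names, feature dimensions, attribute/activity counts, and concatenation-based fusion are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of a multimodal attribute-and-feature embedding model.
# Layer sizes, attribute/activity counts, and concatenation-based fusion are
# assumptions for illustration; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeEncoder(nn.Module):
    """Maps one modality's low-level features to middle-level attribute scores."""
    def __init__(self, in_dim, hidden_dim, num_attributes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_attributes),
        )

    def forward(self, x):
        return self.net(x)  # attribute logits, shape (batch, num_attributes)

class MAFESketch(nn.Module):
    """Two modality branches -> shared attribute space -> activity classifier."""
    def __init__(self, video_dim=2048, accel_dim=128,
                 num_attributes=32, num_activities=24):
        super().__init__()
        self.video_branch = AttributeEncoder(video_dim, 512, num_attributes)
        self.accel_branch = AttributeEncoder(accel_dim, 64, num_attributes)
        # The activity head consumes the concatenated attribute embeddings.
        self.activity_head = nn.Linear(2 * num_attributes, num_activities)

    def forward(self, video_feat, accel_feat):
        video_attr = self.video_branch(video_feat)
        accel_attr = self.accel_branch(accel_feat)
        fused = torch.cat([video_attr, accel_attr], dim=-1)
        return self.activity_head(fused), video_attr, accel_attr

# Toy usage: attribute labels supervise both branches, activity labels the head,
# so attribute annotations can come from a larger pool than activity annotations.
model = MAFESketch()
video_feat = torch.randn(8, 2048)   # e.g. pooled CNN features for a video clip
accel_feat = torch.randn(8, 128)    # e.g. statistics of an acceleration window
attr_labels = torch.randint(0, 2, (8, 32)).float()
activity_labels = torch.randint(0, 24, (8,))

activity_logits, video_attr, accel_attr = model(video_feat, accel_feat)
loss = (F.cross_entropy(activity_logits, activity_labels)
        + F.binary_cross_entropy_with_logits(video_attr, attr_labels)
        + F.binary_cross_entropy_with_logits(accel_attr, attr_labels))
loss.backward()
```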


Published In

MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
December 2019
403 pages
ISBN:9781450368414
DOI:10.1145/3338533

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2020

Author Tags

  1. acceleration and heart rate
  2. activity recognition
  3. attribute feature embedding
  4. egocentric video
  5. multimodal
  6. neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the National Natural Science Foundation of China
  • the National Key R&D Program of China
  • the Beijing Municipal Science & Technology Commission

Conference

MMAsia '19: ACM Multimedia Asia
December 15-18, 2019
Beijing, China

Acceptance Rates

MMAsia '19 paper acceptance rate: 59 of 204 submissions (29%)
Overall acceptance rate: 59 of 204 submissions (29%)

Cited By

  • (2022) Image-Signal Correlation Network for Textile Fiber Identification. Proceedings of the 30th ACM International Conference on Multimedia, 3848-3856. DOI: 10.1145/3503161.3548310. Online publication date: 10-Oct-2022.
  • (2021) Health Status Prediction with Local-Global Heterogeneous Behavior Graph. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 4, 1-21. DOI: 10.1145/3457893. Online publication date: 12-Nov-2021.
  • (2021) Storyboard relational model for group activity recognition. Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3444685.3446255. Online publication date: 3-May-2021.
  • (2020) A hierarchical parallel fusion framework for egocentric ADL recognition based on discernment frame partitioning and belief coarsening. Journal of Ambient Intelligence and Humanized Computing 12, 2, 1693-1715. DOI: 10.1007/s12652-020-02241-2. Online publication date: 9-Jul-2020.
