DOI: 10.1145/3338533.3366592

Multimodal Attribute and Feature Embedding for Activity Recognition

Published: 10 January 2020

Abstract

Human Activity Recognition (HAR) automatically recognizes daily-life and work activities from digital records and is of great value to the medical and health fields. Egocentric video and body acceleration data describe human activity patterns from complementary perspectives, laying a foundation for activity recognition based on multimodal behavior data. However, the low-level signal structures of the two modalities differ greatly, which makes the mapping to high-level activities complicated; moreover, labeling activities in multimodal behavior data is expensive, so annotated data are scarce, limiting technical progress in this field. In this paper, we propose MAFE, an activity recognition model based on multimodal attribute and feature embedding. Before activity recognition, middle-level attribute features are extracted from the low-level signals of each modality. This reduces the complexity of mapping low-level signals to high-level activities, and it allows abundant middle-level attribute annotations to be exploited, reducing the dependence on activity labels. Experiments on the Stanford-ECM dataset verify the effectiveness of the proposed MAFE method.
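
The abstract outlines a two-stage design: modality-specific encoders map low-level egocentric video and acceleration features into a shared middle-level attribute space supervised by attribute labels, and an activity classifier then operates on the fused attribute embeddings. The paper page provides no code, so the PyTorch sketch below is only an illustration of that idea under stated assumptions; the module names, feature dimensions, attribute/activity counts, and concatenation-based fusion are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of a multimodal attribute-and-feature embedding model.
# Layer sizes, attribute/activity counts, and concatenation-based fusion are
# assumptions for illustration; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeEncoder(nn.Module):
    """Maps one modality's low-level features to middle-level attribute scores."""
    def __init__(self, in_dim, hidden_dim, num_attributes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_attributes),
        )

    def forward(self, x):
        return self.net(x)  # attribute logits, shape (batch, num_attributes)

class MAFESketch(nn.Module):
    """Two modality branches -> shared attribute space -> activity classifier."""
    def __init__(self, video_dim=2048, accel_dim=128,
                 num_attributes=32, num_activities=24):
        super().__init__()
        self.video_branch = AttributeEncoder(video_dim, 512, num_attributes)
        self.accel_branch = AttributeEncoder(accel_dim, 64, num_attributes)
        # The activity head consumes the concatenated attribute embeddings.
        self.activity_head = nn.Linear(2 * num_attributes, num_activities)

    def forward(self, video_feat, accel_feat):
        video_attr = self.video_branch(video_feat)
        accel_attr = self.accel_branch(accel_feat)
        fused = torch.cat([video_attr, accel_attr], dim=-1)
        return self.activity_head(fused), video_attr, accel_attr

# Toy usage: attribute labels supervise both branches, activity labels the head,
# so attribute annotations can come from a larger pool than activity annotations.
model = MAFESketch()
video_feat = torch.randn(8, 2048)   # e.g. pooled CNN features for a video clip
accel_feat = torch.randn(8, 128)    # e.g. statistics of an acceleration window
attr_labels = torch.randint(0, 2, (8, 32)).float()
activity_labels = torch.randint(0, 24, (8,))

activity_logits, video_attr, accel_attr = model(video_feat, accel_feat)
loss = (F.cross_entropy(activity_logits, activity_labels)
        + F.binary_cross_entropy_with_logits(video_attr, attr_labels)
        + F.binary_cross_entropy_with_logits(accel_attr, attr_labels))
loss.backward()
```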


Published In

MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
December 2019
403 pages
ISBN:9781450368414
DOI:10.1145/3338533

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2020

Author Tags

  1. acceleration and heart rate
  2. activity recognition
  3. attribute feature embedding
  4. egocentric video
  5. multimodal
  6. neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the National Natural Science Foundation of China
  • the National Key R&D Program of China
  • the Beijing Municipal Science & Technology Commission

Conference

MMAsia '19: ACM Multimedia Asia
December 15-18, 2019
Beijing, China

Acceptance Rates

MMAsia '19 paper acceptance rate: 59 of 204 submissions (29%)
Overall acceptance rate: 59 of 204 submissions (29%)

Cited By

  • (2022) Image-Signal Correlation Network for Textile Fiber Identification. Proceedings of the 30th ACM International Conference on Multimedia, 3848-3856. DOI: 10.1145/3503161.3548310. Online publication date: 10-Oct-2022.
  • (2021) Health Status Prediction with Local-Global Heterogeneous Behavior Graph. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 4, 1-21. DOI: 10.1145/3457893. Online publication date: 12-Nov-2021.
  • (2021) Storyboard relational model for group activity recognition. Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3444685.3446255. Online publication date: 3-May-2021.
  • (2020) A hierarchical parallel fusion framework for egocentric ADL recognition based on discernment frame partitioning and belief coarsening. Journal of Ambient Intelligence and Humanized Computing 12, 2, 1693-1715. DOI: 10.1007/s12652-020-02241-2. Online publication date: 9-Jul-2020.
