DOI: 10.1145/3394171.3413801
Research article

Multi-Person Action Recognition in Microwave Sensors

Published: 12 October 2020

Abstract

The recent use of surveillance cameras for video understanding has raised concerns about privacy intrusion, motivating the research community to seek alternatives to cameras for emerging multimedia applications. Toward this goal, a few researchers have explored Wi-Fi or Bluetooth sensors for action recognition. However, the practicality of these sensors is limited by their frequency band and by the deployment inconvenience of their separate transmitter/receiver architecture. Motivated by the same goal of reducing privacy concerns, this paper introduces a recent microwave sensor for multi-person action recognition. The sensor operates in the 77 GHz–80 GHz band and integrates both transmitter and receiver in a single device, so it can be deployed easily for action recognition. Despite these advantages, two main challenges remain. One is the difficulty of labelling the invisible signal data with the actions embedded in it. The other is the difficulty of cancelling environmental noise to achieve highly accurate action recognition. To address these challenges, we propose a novel learning framework with original loss functions designed around weakly-supervised multi-label learning and an attention mechanism to improve the accuracy of action recognition. We build a new microwave sensor data set and conduct comprehensive experiments to evaluate both the recognition accuracy of the proposed framework and the effect of the parameters in each component. The results show that our framework outperforms state-of-the-art methods by up to 14% in terms of mAP.
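The abstract names weakly-supervised multi-label learning as one pillar of the framework, and the author tags mention multi-label pair-wise ranking. As a purely illustrative sketch, not the paper's actual loss, a minimal multi-label pairwise ranking hinge loss could look like the following; the function name, margin value, and score/label shapes are assumptions for the example:

```python
import numpy as np

def multilabel_pairwise_ranking_loss(scores, labels, margin=1.0):
    """Hinge-style pairwise ranking loss for multi-label outputs.

    scores: (C,) predicted scores for C action classes.
    labels: (C,) binary indicators (1 = action present).
    Penalizes every (positive, negative) label pair whose score
    gap is smaller than `margin`.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        return 0.0
    # All pairwise gaps s_p - s_n at once, via broadcasting.
    gaps = pos[:, None] - neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - gaps)))

# A prediction whose positive labels all out-score the negatives
# by more than the margin incurs zero loss:
scores = np.array([3.0, 0.2, 2.5, -1.0])
labels = np.array([1, 0, 1, 0])
print(multilabel_pairwise_ranking_loss(scores, labels))  # → 0.0
```

This pairwise formulation fits the weakly-supervised setting because it needs only which actions occur in a clip, not which person or frame they belong to.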

Supplementary Material

MP4 File (3394171.3413801.mp4)
The recent use of surveillance cameras raises concerns about privacy intrusion, motivating researchers to seek alternatives to cameras for emerging multimedia applications. Toward this goal, this paper introduces a recent microwave sensor for multi-person action recognition. The sensor integrates both transmitter and receiver, making it easy to deploy. However, two main challenges remain in practice: labelling the invisible signal data with the actions embedded in it, and cancelling environmental noise to achieve highly accurate action recognition. To address these challenges, we propose a novel learning framework with original loss functions designed around weakly-supervised multi-label learning and an attention mechanism. Comprehensive experiments show that our framework outperforms the state of the art by up to 14% in mAP.
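The other pillar named above is an attention mechanism used to suppress environmental noise. As an illustrative sketch only (not the paper's architecture), one common form is softmax-weighted temporal pooling over signal frames; the scoring vector `w` and the feature shapes are hypothetical:

```python
import numpy as np

def attention_pool(features, w):
    """Softmax attention pooling over T frames of D-dim features.

    features: (T, D) per-frame signal features.
    w: (D,) learned scoring vector (assumed, for illustration).
    Frames scoring higher under w dominate the pooled summary,
    down-weighting noisy frames.
    """
    scores = features @ w                 # (T,) relevance per frame
    a = np.exp(scores - scores.max())     # stable softmax
    a /= a.sum()                          # attention weights sum to 1
    return a @ features                   # (D,) attended summary
```

With uniform frame scores this reduces to mean pooling; as one frame's score grows, the output approaches that frame's features.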


Cited By

  • (2024) XRF55. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(1), 1–34. DOI: 10.1145/3643543. Published online: 6 March 2024.
  • (2022) Human Detection and Biometric Authentication with Ambient Sensors. Biomedical Sensing and Analysis, 55–98. DOI: 10.1007/978-3-030-99383-2_2. Published online: 20 July 2022.


Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. attention reinforcement
  2. microwave sensor
  3. multi-label pair-wise ranking
  4. multi-person action recognition

Qualifiers

  • Research-article

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


