Abstract
The Internet of Things (IoT) brings opportunities for wireless sensing, and device-free action recognition has become a hot topic for recognizing human activities. Existing works try to fuse WiFi and the traditional vision modality in a straightforward way to improve performance. To overcome problems such as privacy invasion and computational burden, we design an end-to-end cross-modal learning architecture, termed teacher-student network (TS-Net), for device-free action recognition. Unlike previous methods, which use both modalities throughout the entire process, our model uses only WiFi features during the testing phase, with no video information involved. More specifically, we construct a cross-modal supervision scheme in which the visual knowledge and robustness of the teacher videos are transferred to the synchronously collected student wireless signals. Experiments show that TS-Net can efficiently identify human actions at multiple locations without the environmental constraints of indoor illumination and occlusion.
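The abstract does not state the training objective, so the following is only a minimal sketch of how a teacher-student cross-modal scheme of this kind is commonly set up, assuming a generic knowledge-distillation loss rather than the authors' actual formulation. The function names (`distillation_loss`, `predict`), the `temperature` and `alpha` hyperparameters, and the toy logits are all hypothetical. The teacher logits stand in for the output of a video network and the student logits for the output of a WiFi network on synchronously collected samples; at test time only the student is used.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature, stabilised by subtracting the row max."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Assumed objective: alpha * cross-entropy + (1 - alpha) * T^2 * KL(teacher || student).

    student_logits: (N, C) outputs of the WiFi (student) branch
    teacher_logits: (N, C) outputs of the video (teacher) branch, same samples
    labels:         (N,) integer action labels
    temperature, alpha: hypothetical hyperparameters, not taken from the paper
    """
    eps = 1e-12
    labels = np.asarray(labels)
    n = labels.shape[0]

    # Supervised term: the student must still predict the ground-truth action.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + eps))

    # Transfer term: the student matches the teacher's softened distribution.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1))

    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl

def predict(student_logits):
    """Test-time inference uses the WiFi student only; no video input is needed."""
    return np.argmax(np.asarray(student_logits), axis=1)

# Toy example: 2 samples, 3 action classes (made-up numbers).
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])
student = np.array([[1.0, 0.8, -0.5], [0.0, 1.5, 0.4]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

In this sketch the video branch contributes only through `teacher_logits` during training, which is what lets the deployed model run on wireless signals alone.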
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61932013, 61803212, 61972201, and 61972210; the Natural Science Foundation of Jiangsu Province under Grant Nos. BK20180744 and BK20190068; the China Postdoctoral Science Foundation under Grant Nos. 2019M651920 and 2020T130315; and NUPTSF Grant No. NY218117.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sheng, B., Gui, L., Xiao, F. (2021). TS-Net: Device-Free Action Recognition with Cross-Modal Learning. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_32
DOI: https://doi.org/10.1007/978-3-030-85928-2_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85927-5
Online ISBN: 978-3-030-85928-2
eBook Packages: Computer Science, Computer Science (R0)