Elsevier

Computer Communications

Volume 197, 1 January 2023, Pages 87-95
CSI-based location-independent Human Activity Recognition with parallel convolutional networks

https://doi.org/10.1016/j.comcom.2022.10.027

Abstract

Human Activity Recognition (HAR) based on Wi-Fi has broad application prospects in human–computer interaction. Because Wi-Fi signals are sensitive to environmental changes, the features of the same category of human activity differ significantly across locations. Existing Wi-Fi-based HAR systems need to re-collect samples or retrain models to recognize the same activity at new locations, which reduces their practicability in human–computer interaction. To address this challenge, this paper proposes a CSI-based location-independent HAR system built on parallel convolutional networks (CSI-PCNH). CSI-PCNH enhances the inter-class difference by extracting the inter-class features of different activity samples. In addition, CSI-PCNH improves the generalization ability of activity recognition at any location by extracting the intra-class features of the same category of activity at different locations. To obtain the inter-class and intra-class features of activity samples, we design a parallel convolutional network model composed of a 3DCNN combined with a Channel Attention Mechanism (CAM) and a 2DCNN with LSTM, which extract the global and local spatial–temporal features of the activity samples. Experimental results show that, in an 8 m × 7 m indoor area, the proposed HAR system trained on activity samples from 12 known locations reaches an average recognition accuracy of 91.7% for 6 categories of activities at 10 other locations.

Introduction

Recently, the application of Human Activity Recognition (HAR) in smart homes, safety monitoring and health monitoring has attracted increasing attention [1]. HAR methods are mainly wearable-sensor based [2], [3], [4] or computer-vision based [5], [6]. Some sensor-based methods require the user to wear the device at all times, which is inconvenient. Some computer-vision methods require Line-of-Sight and adequate lighting, and can easily invade the user’s privacy [7]. Compared with these methods, HAR based on commercial Wi-Fi has attracted widespread attention because it does not invade the user’s privacy and is not limited by devices [8].

HAR based on Wi-Fi can be realized with the Received Signal Strength Indicator (RSSI) or Channel State Information (CSI). Compared with RSSI, CSI carries more fine-grained information and can distinguish multipath components. CSI-based human behavior perception technologies, such as daily behavior recognition [9], [10], [11], [12], [13], [14], fall detection [15], [16], breath detection [17], [18], sleep status monitoring [19] and user identification [20], [21], have been studied in depth.

Although the above methods have achieved impressive results, they struggle to reach the same performance on activity samples collected at non-training locations; that is, they face the challenge of location generalization. This is because raw CSI data contains not only information about the activity but also information about the environment. A change of location changes the sensing environment, which alters the reflection, diffraction and refraction of the Wi-Fi signal and leads to different data distributions for CSI data of the same category of activity collected at different locations [22]. An obvious solution is to collect activity samples from as many locations as possible in order to learn the features of activities at different locations. In practice, however, collecting samples at that many locations costs time and effort, which harms the user experience. Some researchers have tried metric learning [23], virtual sample generation [24], [25] and environment-independent feature extraction [22], [26], [27], [28] to address the challenge of location-independent HAR based on Wi-Fi.

Although the above methods significantly improve location generalization by reducing the influence of environmental factors and extracting the intra-class features of samples of the same activity, they do not fully utilize both the inter-class and intra-class features. By observing a large number of activity samples, we find that different human activities also contain similar feature segments. As shown in Fig. 1, segments A and B are similar segments in the CSI signals of “clap” and “chest enlargement” at the same location. Although the two activities share similar segments during execution, the temporal order of these segments differs, so once combined with time information they can serve as an important inter-class feature of activity samples. In general, the spatial features express changes in signal amplitude, and the temporal features reflect the time sequence of the signal. If this inter-class information contained in the temporal and spatial features of different activity samples is ignored, recognition accuracy decreases. Therefore, to better extract the inter-class features of different activity samples and enhance the inter-class difference, time information needs to be fully captured.

Meanwhile, samples of the same activity at different locations also contain similar segments. As shown in Fig. 2, segment C is the similar segment in the CSI signals of “wave” at two different locations. Extracting the spatial–temporal features of such similar parts yields the intra-class features, which improves the generalization ability of location-independent HAR.

Considering the roles of inter-class and intra-class features and the importance of fully capturing time information for location-independent HAR, and building on the advantages of existing HAR methods, we propose a location-independent human activity recognition system (CSI-PCNH). The system extracts and combines the global and local spatial–temporal features of the activity samples with a parallel convolutional network model. The global spatial–temporal features are the spatial and temporal features extracted simultaneously by 3D convolution. Because 3D convolution also models the depth dimension, it prevents the loss of temporal information and preserves the temporal correlation between spatial features. The global spatial–temporal features also better reflect the overall trend of the activity signal, and therefore more accurately retain the temporal order of similar segments in different activity samples. In this way, the important inter-class features are captured more fully, which enhances the inter-class difference and benefits activity recognition. The local spatial–temporal features comprise the spatial features at each time step of the activity sample data and their temporal relationship. By extracting more detailed spatial features at each time step, similar intra-class features can be obtained more efficiently from the similar segments in samples of the same activity at different locations. Combining the global and local spatial–temporal features facilitates the recognition of activities at new locations.

To reduce the impact of environmental factors and facilitate the extraction of the inter-class and intra-class features of activity samples, we combine CSI amplitude data with DS data to form the activity sample data, which strengthens the mapping between signals and activities. To obtain the inter-class and intra-class features, we design a parallel convolutional network model. The model uses a 3DCNN combined with a Channel Attention Mechanism (CAM) to extract the global spatial–temporal features and perform adaptive feature refinement, emphasizing the features that matter most for classification. Meanwhile, the model uses a 2DCNN with LSTM to extract the local spatial–temporal features of activity samples. The CSI-PCNH system obtains the inter-class and intra-class features by combining the global and local spatial–temporal features, which improves the generalization ability of activity recognition at any location.
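The idea of adaptive feature refinement through channel attention can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the text above does not specify the form of CAM, so the sketch assumes a simplified squeeze-and-gate scheme (per-channel mean followed by a sigmoid), and the names `channel_attention`, `feature_maps` and `weights` are hypothetical.

```python
import math

def channel_attention(feature_maps, weights):
    """Reweight channels with an assumed squeeze-and-gate scheme (not the paper's exact CAM)."""
    # Squeeze: summarize each channel by its mean value.
    squeezed = [sum(channel) / len(channel) for channel in feature_maps]
    # Gate: map each summary to a score in (0, 1); `weights` stands in for learned parameters.
    gates = [1.0 / (1.0 + math.exp(-w * s)) for w, s in zip(weights, squeezed)]
    # Scale: multiply every value in a channel by that channel's gate.
    refined = [[g * v for v in channel] for g, channel in zip(gates, feature_maps)]
    return gates, refined

# Hypothetical feature maps with two channels of three values each.
feature_maps = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
weights = [1.0, 1.0]
gates, refined = channel_attention(feature_maps, weights)
```

In this toy input the second channel has the larger mean response, so it receives the larger gate and is emphasized relative to the first channel.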

To verify the effectiveness of the CSI-PCNH system, we conduct experiments on a self-collected human activity dataset. We use the activity samples collected at 12 known locations as the training set and the activity samples at 10 other locations as the test set. The experimental results show that the average recognition accuracy for 6 categories of human activities is 91.7%. The main contributions of this paper are as follows:

  • We propose a location-independent activity recognition system based on CSI, which realizes HAR at any new location without requiring activity samples from that location. We combine the CSI data of 2 receivers with their DS data to form the activity sample data, which facilitates the extraction of the inter-class and intra-class features of activity samples.

  • We design a parallel convolutional network model that uses a 3DCNN combined with CAM and a 2DCNN with LSTM to obtain the inter-class and intra-class features of activity samples, improving both the recognition accuracy for different activities and the generalization ability of activity recognition at any location.
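The evaluation protocol described above (training on 12 known locations, testing on 10 other locations) amounts to splitting the dataset by location rather than by sample. The following is a minimal sketch of such a split; the sample layout `(location_id, label, features)`, the location IDs and the function name `split_by_location` are assumptions made for illustration, not details from the paper.

```python
def split_by_location(samples, train_locations):
    """Split (location_id, label, features) samples so test locations never appear in training."""
    train, test = [], []
    for sample in samples:
        location_id = sample[0]
        if location_id in train_locations:
            train.append(sample)
        else:
            test.append(sample)
    return train, test

# Hypothetical toy data: 22 locations, 6 activity labels, one placeholder sample per pair.
samples = [(loc, label, None) for loc in range(22) for label in range(6)]
train_locations = set(range(12))  # assumed: IDs 0..11 are the 12 known locations
train, test = split_by_location(samples, train_locations)
```

Splitting this way keeps every test location unseen during training, which is what makes the reported accuracy a measure of location generalization.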

The rest of this paper is organized as follows. Section 2 introduces related work on location-independent HAR. Section 3 introduces the basic knowledge of CSI and DS. Section 4 gives an overview of the system structure and describes the system in detail. Section 5 evaluates the performance of the system. Section 6 summarizes the paper and outlines future work.

Related work

To realize location-independent HAR based on Wi-Fi, researchers have tried metric learning, virtual sample generation and environment-independent feature extraction.

Ding et al. [23] proposed WiLiMetaSensing, a HAR system based on CNN-LSTM feature representation and metric learning. The WiLiMetaSensing system learns and memorizes the features of activity samples at different locations, realizing the recognition of activities at new locations with a few activity samples at

CSI

CSI is used as an estimate of the channel state in Orthogonal Frequency Division Multiplexing (OFDM). CSI describes the channel characteristics of the communication link between the transmitter and the receiver, and represents changes such as time delay, amplitude attenuation and phase offset that the signal undergoes during transmission. For each subcarrier of each channel, CSI is given by Eq. (1) [11]:

$$h = |h|\,e^{j\phi} \tag{1}$$

where $|h|$ and $\phi$ represent the amplitude and the phase of CSI, respectively.
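The polar form in Eq. (1) can be checked numerically. The sketch below decomposes a few complex CSI values into amplitude and phase and rebuilds them; the sample values and the helper name `amplitude_and_phase` are hypothetical and are not taken from the paper's dataset.

```python
import cmath
from typing import Tuple

def amplitude_and_phase(h: complex) -> Tuple[float, float]:
    """Return (|h|, phi) such that h = |h| * exp(j * phi)."""
    return abs(h), cmath.phase(h)

# Hypothetical CSI values for three subcarriers.
csi = [complex(3.0, 4.0), complex(0.0, 2.0), complex(-1.0, 0.0)]
decomposed = [amplitude_and_phase(h) for h in csi]

# Rebuilding each value from its amplitude and phase recovers the original.
rebuilt = [cmath.rect(amplitude, phase) for amplitude, phase in decomposed]
```

Here `cmath.phase` returns the phase in radians in the range [-π, π], so phases extracted this way wrap around at ±π.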

In

System structure

The CSI-PCNH system consists of two phases: offline training and online testing, as shown in Fig. 4. STEM is the Spatial–Temporal feature Extraction Model in the CSI-PCNH system. During the training phase, the training sample data is labeled and then used to train the STEM model. In the test phase, the trained STEM model is used to classify the test data to obtain the activity recognition results.

STEM structure

STEM consists of a global spatial–temporal feature extraction module, a local spatial–temporal

Experimental setup

We conduct a series of experiments on a self-collected human activity dataset to comprehensively evaluate the performance of the CSI-PCNH system. The evaluation runs on an Intel Core i7-7700K CPU with 16 GB RAM and an NVIDIA GeForce GTX 1080 GPU.

The human activity dataset includes 6 categories of human activities (as shown in Fig. 7) performed in the leisure hall. The floor plan of the leisure hall is shown in Fig. 8. The area of the leisure hall is 8 m × 7 m, and there

Conclusion

In this paper, we design a location-independent human activity recognition system based on CSI, which can realize the HAR at any new location without providing activity samples at the new location. We use the CSI data collected by 2 receivers and its DS data to constitute the activity sample data, and use 3DCNN with CAM and 2DCNN with LSTM to design the parallel convolutional network model to extract the inter-class features and intra-class features from activity samples, in order to improve

CRediT authorship contribution statement

Yong Zhang: Methodology, Funding acquisition, Supervision, Validation, Writing – review & editing. Yuqing Yin: Investigation, Methodology, Data curation, Software, Formal analysis, Writing – original draft. Yujie Wang: Supervision, Validation, Writing – review & editing. Jiaqiu Ai: Investigation, Resources. Dingchao Wu: Investigation, Software.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Yong Zhang reports article publishing charges was provided by Anhui Provincial Department of Science and Technology. Yujie Wang reports administrative support was provided by National Natural Science Foundation of China. Lu Jia reports article publishing charges was provided by Ministry of Education of the People’s Republic of China.

Yong Zhang was born in Anhui, China, in 1973. He received his doctoral degree in signal and information processing from the University of Science and Technology of China in 2007 and is an associate professor. His research interests include intelligent information processing, hand gesture recognition and indoor positioning.

References (32)

  • Khan P. et al., Differential channel-state-information-based human activity recognition in IoT networks, IEEE Internet Things J. (2020)
  • Yang J. et al., Efficientfi: Towards large-scale lightweight wifi sensing via csi compression, IEEE Internet Things J. (2022)
  • H. Lee, C.R. Ahn, N. Choi, Exploiting multiple receivers for csi-based activity classification using a hybrid CNN-LSTM...
  • Sheng B. et al., Deep spatial–temporal model based cross-scene action recognition using commodity WiFi, IEEE Internet Things J. (2020)
  • Wang Y. et al., Wifall: Device-free fall detection by wireless networks, IEEE Trans. Mob. Comput. (2016)
  • Wang H. et al., RT-fall: A real-time and contactless fall detection system with commodity WiFi devices, IEEE Trans. Mob. Comput. (2016)

    Yuqing Yin was born in Heilongjiang, China, in 1998. She received her bachelor’s degree in electronic information science and technology from Hefei University of Technology in 2020. She is currently studying for a master’s degree in electronic information at Hefei University of Technology. Her research interests include human activity recognition and indoor positioning.

    Yujie Wang was born in Shandong, China, in 1980. She received the Ph.D. degree in circuits and systems from University of Science and Technology of China in 2011. She is currently an associate professor of Hefei University of Technology. Her research interests include intelligent information processing, indoor positioning, and audio signal processing.

    Jiaqiu Ai (M’17) received his BS degree in Electronics & Information from the Beijing Information Science and Technology University in 2007, and his Ph.D. degree in Information & Communication from University of Chinese Academy of Sciences in 2012. He is currently a professor at Hefei University of Technology (HFUT), Hefei, China. He is the author of more than 50 journal papers. He is the editorial member of Journal of Remote Sensing, Computer Engineering, and Current Chinese Sciences. He is one of the Hefei Leading Talents, and a Young Academic Talent of HFUT. He is the best reviewer for Journal of Remote Sensing, and Journal of Radar. His current research interests include SAR image processing, radar target detection and radar system design.

    Dingchao Wu was born in Hunan, China, in 1999. He received his B.S. from Hefei University of Technology in 2021. He is currently pursuing the M.Sc. degree at Hefei University of Technology. His research interests include activity recognition.

    This work was supported by Anhui Provincial Natural Science Foundation, China [No. 2008085MF214], the National Natural Science Foundation of China under Grant [No. 61801162] and the Fundamental Research Funds for the Central Universities, China [No. JZ2021HGTB0080].
