1 Introduction

The rapidly growing demand for location-based services (LBS) has attracted considerable attention to accurate indoor localization over the years. Owing to the widespread deployment of Wi-Fi devices, Device-free Passive Wireless Localization (DFPWL) has become a research hotspot for indoor localization in recent years. It locates a target that neither carries any equipment nor actively participates in the localization process, by exploiting the target's influence on the surrounding wireless signals (Zhou et al. 2015; Savazzi et al. 2016). Many researchers use the easily acquired Received Signal Strength Indicator (RSSI) to achieve device-free indoor localization (Youssef et al. 2007; Lee and Moon 2018; Ciabattoni et al. 2019; Fu et al. 2019). However, the RSSI is seriously affected by multipath effects in indoor environments, so the deployment cost, detection granularity and localization accuracy of such systems cannot reach a satisfactory level.

In recent years, researchers have been able to obtain Channel State Information (CSI) by modifying commercial network card drivers (Halperin et al. 2011). CSI is fine-grained signal characteristic information: each CSI value represents the amplitude and phase of one subcarrier in an OFDM (Orthogonal Frequency Division Multiplexing) system. Compared with the single-valued RSSI, CSI carries richer and finer-grained environmental information, which can effectively describe the multipath propagation of indoor wireless signals and sensitively perceive multipath signal changes. Thus, CSI is more suitable for accurate localization than RSSI (Wu et al. 2012b; Abdel-Nasser et al. 2013).

There have been many works that utilize CSI to implement localization in a device-bound (Wu et al. 2012a, b; Hosen et al. 2015; Wang et al. 2017a, b; Chen et al. 2017; Han et al. 2018) or device-free (Abdel-Nasser et al. 2013; Xiao et al. 2013; Sabek and Youssef 2013; Li et al. 2016; Qian et al. 2016; Gao et al. 2017; Zhou et al. 2017, 2018; Shi et al. 2018; Wang et al. 2018) manner. They mainly exploit signal propagation models (Wu et al. 2012c; Hosen et al. 2015; Li et al. 2016; Qian et al. 2016; Han et al. 2018; Wang et al. 2018) or fingerprinting approaches (Wu et al. 2012b; Abdel-Nasser et al. 2013; Xiao et al. 2013; Sabek and Youssef 2013; Wang et al. 2016a, 2017a, b; Chen et al. 2017; Gao et al. 2017; Zhou et al. 2017, 2018; Shi et al. 2018). Because multipath effects make radio propagation unpredictable, establishing an accurate signal propagation model is difficult, so the CSI fingerprinting method is widely adopted (Shi et al. 2018). Most CSI-based device-free fingerprinting schemes use CSI measurements as fingerprints directly or after simple pre-processing, which does not fully exploit the CSI training samples and makes feature selection subjective (Wu et al. 2012b; Abdel-Nasser et al. 2013; Zhou et al. 2017). Consequently, these schemes must use multiple links to provide rich enough CSI information to distinguish different locations, which is not available in many scenarios. It has recently become a trend to apply deep learning to fingerprinting localization (Wang et al. 2016a, 2017a, b; Chen et al. 2017; Gao et al. 2017; Zhou et al. 2018), extracting more discriminative representations from CSI measurements and thereby avoiding manual feature selection. Although these works extract feature representations with good discrimination from CSI measurements, they ignore the time-varying characteristics of CSI.
These schemes assume that the distribution of training CSI samples is the same as that of test CSI samples, since both are collected in a similar indoor environment. However, according to our experimental observations, this assumption does not hold in a real, complex indoor environment. CSI is more temporally stable than RSSI, but when the time interval extends to one or more days, the CSI is likely to change significantly even if the indoor environment is kept unchanged.

Many studies attempt to cope with the instability of Wi-Fi-based localization approaches caused by changing environmental dynamics (Mager et al. 2015; Ohara et al. 2015), but most of them target RSS. To the best of our knowledge, there are few studies on this aspect of CSI. In our previous work (Rao and Li 2019), we proposed a carefully designed update scheme that uses an artificial neural network to update the fingerprint database. However, it can only obtain the nonlinear mapping between the current fingerprints and the previously built fingerprint database, and using this mapping to convert the fingerprint database may damage the discriminability of the fingerprint database's data structure. Therefore, there is a need for a stable and accurate CSI-based localization approach that copes with the time-varying characteristic of CSI caused by changing environmental dynamics. Transfer learning addresses precisely this problem, so we combine it with deep learning, which can extract more discriminative representations from CSI measurements, to build our system.

In this paper, we propose a novel transfer deep learning-based DFPWL system. Compared with other deep learning-based approaches, our system uses the CSI extracted from a single link to estimate the location of the target, neither requiring the target to wear any electronic equipment nor deploying many APs and monitoring devices. In addition, previous deep learning-based DFPWL approaches mostly ignored the instability of localization accuracy caused by the time-varying characteristics of CSI. To cope with it, a novel transfer deep learning (TDL) method combining deep neural networks and transfer learning is applied in our system, building on our previous work (Rao and Li 2019). The TDL method learns a new feature representation from CSI samples under which intra-class distances are minimized, inter-class distances are maximized, and the distribution difference between the fingerprint database and the test samples is minimized, simultaneously. Benefiting from this feature representation, we can use the CSI from a single link to obtain satisfactory localization accuracy with the KNN algorithm, while saving the calibration cost of re-collecting the fingerprint database. This is demonstrated in the subsequent evaluations and greatly expands the application scenarios of DFPWL technology.

The main contributions of this paper are as follows:

  1.

    Through experimental observation, we find that CSI changes significantly as the time interval increases, with a difference of about 5–6 dB per day;

  2.

    We propose a novel CSI-based device-free wireless localization solution, which maintains stable localization accuracy over time using only a single link and without repeated fingerprint acquisition;

  3.

    We apply a novel transfer deep learning (TDL) method that combines deep neural networks and transfer learning to obtain a new feature representation pursuing transferability and discrimination simultaneously;

  4.

    The performance of our system (i.e., localization accuracy) has been verified by comprehensive experiments. The results show that our system achieves a mean localization error of about 1.1 m when the CSI has not changed, and keeps it within 1.5 m in the face of CSI changes, which is more accurate and stable than the state of the art.

The rest of the paper is organized as follows. Related works and preliminary studies are introduced in Sects. 2 and 3, respectively. Section 4 presents the details of our proposed system. Then, Sect. 5 evaluates the performance of our system, and the conclusion is given in Sect. 6.

2 Related works

Indoor localization techniques have diversified in recent years. For brevity, we survey only the works closely related to our system, i.e., CSI-based device-free localization and feature-based transfer learning.

2.1 CSI-based device-free localization approaches

Various CSI-based device-free localization approaches have been proposed (Abdel-Nasser et al. 2013; Xiao et al. 2013; Sabek and Youssef 2013; Li et al. 2016; Qian et al. 2016; Gao et al. 2017; Zhou et al. 2017, 2018; Shi et al. 2018; Wang et al. 2018) in recent years. These approaches are mainly divided into two categories: model-based and fingerprint-based solutions.

Model-based solutions (Li et al. 2016; Qian et al. 2016; Wang et al. 2018) establish a mathematical relationship between CSI measurements and the location of the target through a statistical radio signal propagation model and use this relationship to obtain the estimated location from measured CSI samples. Therefore, these solutions do not need to laboriously build and maintain a fingerprint database. Wang et al. (2018) proposed LiFS, a low-manpower, device-free positioning system based on CSI. It selected the multipath-affected subcarriers from the CSI measurements and then modeled them as a set of power-fading equations to obtain the estimated location. Li et al. (2016) proposed MaTrack, which detected the subtle signals reflected from moving objects, separated them from those of stationary objects to recognize the object's angle, and tracked the moving object without prior training. Widar, proposed by Qian et al. (2016), established a model between the CSI changes and the location and moving state (i.e., speed and direction) of the target, and then used the model to simultaneously estimate the moving state and position of the target.

Fingerprint-based solutions (Abdel-Nasser et al. 2013; Xiao et al. 2013; Sabek and Youssef 2013; Gao et al. 2017; Zhou et al. 2017, 2018; Shi et al. 2018) require constructing an offline fingerprint database in advance, against which the collected test CSI samples are compared to obtain the estimated location of the target. MonoPHY, proposed by Abdel-Nasser et al. (2013), used a maximum likelihood algorithm for clustering-based CSI fingerprinting over a single link to locate the target. Building on MonoPHY, MonoStream (Sabek and Youssef 2013) extracted features that capture the small changes in CSI measurements caused by people standing at different locations and then applied an object recognition algorithm to estimate the location of the target. Gao et al. (2017) used radio image processing to characterize the effects of human behavior on CSI and proposed a device-free localization and activity recognition approach. It extracted color and texture features from the radio image converted from the amplitude and phase information of CSI and learned optimized deep features from them using a CNN. Finally, a machine learning algorithm was applied to estimate the location and activity of the target. Zhou et al. (2017) proposed to model the relationship between CSI measurements and the target's location by SVM regression, and then obtained the estimated location by feeding the measured CSI sample into the model. Xiao et al. (2013) presented Pilot at the ICDCS conference, which implemented lightweight CSI-based passive human movement detection; the detected CSI samples were then matched against the fingerprint database using a probabilistic algorithm to obtain the estimated locations of potential targets. In addition, a data fusion block was applied in Pilot to solve the multi-target localization problem. Shi et al. (2018) proposed to determine the most probable location of the target by comparing CSI test samples against the fingerprint database with a Bayes classifier, and improved performance by reducing the dimension of CSI samples with PCA. Furthermore, Bayesian and Kalman filters were used to continuously forecast and track the location of the moving target. Zhou et al. (2018) proposed a CSI-based device-free localization approach using Deep Neural Networks (DNN), which models the dependence between the target's location and the CSI measurements through a deep neural network; measured CSI samples are then fed into it to obtain the estimated location.

2.2 Feature-based transfer learning

Feature-based methods are the most commonly used approach to transfer learning. They focus on knowledge connections at the inter-domain feature level and utilize a shared feature representation to realize the cross-domain transfer of knowledge.

The transfer subspace learning (TSL) method proposed by Si et al. (2009) reduced the distribution difference between domains by adding the Bregman divergence (Banerjee et al. 2005) to traditional dimensionality reduction algorithms, realizing the cross-domain transfer of knowledge and improving classifier accuracy on the target task. The Geodesic Flow Kernel (GFK) algorithm proposed by Gong et al. (2012), based on kernel techniques, embedded the datasets into a Grassmann manifold and constructed a geodesic flow to realize knowledge transfer between domains. The Transfer Component Analysis (TCA) algorithm proposed by Pan et al. (2010) combined PCA with the Maximum Mean Discrepancy (MMD) (Borgwardt et al. 2006) to realize shared feature extraction and cross-domain transfer of knowledge. To further improve the performance of TCA, Long et al. (2013) designed a Joint Distribution Adaptation (JDA) algorithm based on a label iterative refinement mechanism for the case where the target domain has no labeled data. During label iterative refinement, JDA simultaneously narrows the differences in conditional and marginal distributions between domains through MMD, and combines this with PCA to reduce dimensionality and obtain the shared feature subspace, thus realizing the cross-domain transfer of knowledge. Farajidavar et al. (2014) designed the Adaptive Transductive Transfer Machine (ATT) method, which uses unlabeled target-domain samples to tune the method's parameters and obtain the optimal shared feature subspace, and applies local transformations to the source-domain samples to narrow the marginal distribution differences between domains; finally, label converters are used to narrow the conditional distribution differences between domains and improve the efficiency of cross-domain knowledge transfer. The Generalized Unsupervised Manifold Alignment (GUMA) algorithm proposed by Cui et al. (2014) made full use of the original structural information of the data, assuming that datasets of the same subject have similar data structures; it established inter-domain relationships by aligning the manifold structures between domains to achieve cross-domain transfer of knowledge. Tuia et al. (2014) further improved classification accuracy and robustness to nonlinear data by aligning kernelized manifold structures. To preserve the optimal projection transformation of each domain's subspace, Fernando et al. (2013) proposed a subspace alignment algorithm, which uses the traditional feature extraction algorithm PCA to extract features from the source and target domains respectively, and then applies a subspace coordinate transformation to realize the cross-domain transfer of knowledge.

3 Background and motivation

3.1 Channel state information (CSI)

In 2009, the IEEE 802.11n standard supporting MIMO and Orthogonal Frequency Division Multiplexing (OFDM) technologies was released. This standard provides a physical layer information interface between the transmitting and receiving devices, i.e., Channel State Information (CSI). The CSI reflects the amplitude and phase of all subcarriers transmitted between the transmitting and receiving devices. Compared with RSS, which is a superimposed value of wireless signals from multiple paths, CSI offers better stability, less susceptibility to multipath effects, and finer granularity. CSI can be easily acquired through the CSI Tools (Halperin et al. 2011) developed by Halperin et al. for the Intel 5300 wireless network card, and used to achieve more accurate indoor localization. CSI Tools is an open-source tool for acquiring CSI and supporting simple calculations; in addition to a CSI acquisition interface, it provides the number of transmit (TX) and receive (RX) antennas, received signal strength, and a noise estimate.

Let \( \overrightarrow {X} \) and \( \overrightarrow {Y} \) denote the transmitted and received signal vectors. We have

$$ \overrightarrow {Y} = H \cdot \overrightarrow {X} + \overrightarrow {N} $$
(1)

where vector \( \overrightarrow {N} \) is the additive white Gaussian noise, and H represents the channel gain vector, which is the CSI matrix we need.

Then, letting the number of transmit antennas be \( N_{T} \) and the number of receive antennas be \( N_{R} \), the CSI matrix H is an \( N_{T} \times N_{R} \times 30 \) matrix:

$$ H = (h_{ijk} )_{{N_{T} \times N_{R} \times 30}} $$
(2)

where 30 is the number of subcarriers extracted by CSI Tools from one stream (Halperin et al. 2011).

Each value \( h_{ijk} \) represents the CSI of the kth subcarrier of the stream formed by TX i and RX j, expressed as:

$$ h_{ijk} = |h_{ijk} |e^{{j\angle h_{ijk} }} ,i \in [1,N_{T} ],j \in [1,N_{R} ],k \in [1,30] $$
(3)

where \( |h_{ijk} | \) is the amplitude response of the kth subcarrier, and \( \angle h_{ijk} \) is the phase response of the kth subcarrier.
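As a concrete illustration (not the authors' code), Eq. (3) simply decomposes each complex CSI entry into its magnitude and angle. A minimal NumPy sketch with synthetic values:

```python
import numpy as np

# Synthetic CSI matrix H of shape N_T x N_R x 30 (Eq. 2); random complex
# numbers stand in for real measurements.
N_T, N_R, N_SC = 2, 3, 30
rng = np.random.default_rng(0)
H = rng.standard_normal((N_T, N_R, N_SC)) + 1j * rng.standard_normal((N_T, N_R, N_SC))

amplitude = np.abs(H)    # |h_ijk|, the amplitude response of each subcarrier
phase = np.angle(H)      # angle of h_ijk, the phase response in radians

# Recombining amplitude and phase per Eq. (3) recovers the original CSI entries.
H_reconstructed = amplitude * np.exp(1j * phase)
```

Only the amplitude array is used by the system described below, since the raw phase is too noisy without correction (Sect. 3.2).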

3.2 Time-varying characteristic of CSI

In this section, we examine the challenges of using CSI for indoor localization in a real and complex indoor environment. As mentioned before, CSI contains amplitude and phase information, but as described in (Zhou et al. 2014; Qian et al. 2014; Wu et al. 2015; Xie et al. 2019), phase information has significant random noise, making it impossible to use for indoor localization without phase correction and noise filtering. Therefore, we only use CSI amplitude information to design our localization system.

Through experimental observations, we find that the CSI amplitude information has a time-varying characteristic. For brevity, the other characteristics of CSI exploited for localization in previous works, which also apply to our system, are not described and verified in detail here; for example, the CSI extracted from the three antennas of the NIC differ (Sabek and Youssef 2013), the CSI measurements of continuously received packets are more stable than RSS values when the target is at a fixed location (Wang et al. 2017a), and the CSI changes significantly when the target is at different locations or in the silent case, i.e., with no target in the area of interest (Sabek and Youssef 2013).

Figure 1 plots the CSI amplitude values for one stream over five consecutive days. Figure 1a–e plot the amplitude of the subcarriers over time/packets with no target in the area of interest on each day, under the condition that the indoor environment does not change substantially. It is easy to see that the CSI amplitude exhibits good stability across continuous packets but varies noticeably between days. We call this the time-varying characteristic of CSI. This characteristic means that only frequent, periodic collection of CSI fingerprints can ensure the positioning accuracy, yet most previous CSI-based localization solutions ignore it. This is the motivation of our system.

Fig. 1

CSI amplitude values for one stream in five consecutive days

4 System design

4.1 System architecture

Unlike traditional fingerprinting localization systems, our system just utilizes a single link to achieve acceptable localization accuracy, and mainly includes three stages: the offline training stage, the localization preparation stage and the online localization stage. The structure of our system is shown in Fig. 2.

Fig. 2

System architecture

The CSI amplitude measurements may contain abnormal samples affected by multipath and environmental noise. These abnormal samples have a great impact on localization performance and should be filtered out before localization. To cope with them, we adopt the CFDP algorithm proposed in our previous work (Rao and Li 2019), which treats as abnormal any sample that has a lower local density \( \rho \) than others and a large distance from the cluster center \( x^{cen} \) with the highest local density, where \( \rho \) is the number of samples closer to sample \( x_{i} \) than the cutoff distance \( d_{c} \).
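The following is a minimal density-based filtering sketch in the spirit of the CFDP criterion described above, not the exact algorithm of Rao and Li (2019); the cutoff distance `d_c` and the density threshold `min_density` are illustrative parameters:

```python
import numpy as np

def filter_abnormal(samples, d_c, min_density):
    """Drop samples whose local density rho (number of neighbours within
    cutoff distance d_c) falls below min_density."""
    dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    rho = (dists < d_c).sum(axis=1) - 1   # exclude the sample itself
    keep = rho >= min_density
    return samples[keep], keep

# Synthetic data: a tight cluster of CSI-like vectors plus one far outlier.
rng = np.random.default_rng(1)
cluster = rng.normal(0.0, 0.1, size=(50, 30))
outlier = np.full((1, 30), 10.0)
data = np.vstack([cluster, outlier])

filtered, keep = filter_abnormal(data, d_c=2.0, min_density=5)
# The outlier has no close neighbours, so it is flagged and removed.
```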

The offline training stage aims to extract CSI amplitude samples and construct a CSI fingerprint database. We first pick out K positions in the monitored area and mark them as training points. Then, we situate the target at each training point k to collect T CSI samples from all L = ntx * nrx TX-RX streams and, after filtering, label them with training point k. Finally, the filtered CSI samples collected at the K training points are combined to form the CSI fingerprint database \( {\mathbb{C}}_{database} \). Specifically, \( C_{k}^{t} (l) = \{ h_{1} ,h_{2} , \ldots ,h_{30} \} \) represents the t-th CSI sample collected from the l-th stream when the target is at training point k. Moreover, we also collect T CSI samples when there is no target in the monitored area and, after filtering, record them in the fingerprint database labeled as training point 0. Hence, the fingerprint database can be represented as \( {\mathbb{C}}_{database} = \{ C_{0} ,C_{1} , \ldots ,C_{k} , \ldots ,C_{K} \} \), where \( C_{k} \) is the set of CSI samples from all L streams at training point k:

$$ C_{k} = \left\{ {\begin{array}{*{20}c} {C_{k}^{1} (1)} & {C_{k}^{1} (2)} & \cdots & {C_{k}^{1} (L)} \\ {C_{k}^{2} (1)} & {C_{k}^{2} (2)} & \cdots & {C_{k}^{2} (L)} \\ \vdots & \vdots & \ddots & \vdots \\ {C_{k}^{T} (1)} & {C_{k}^{T} (2)} & \cdots & {C_{k}^{T} (L)} \\ \end{array} } \right\} $$
(4)
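The database construction above can be sketched as follows; the dimensions (K = 4 training points, T = 10 samples per point, L = 3 streams) and the random amplitudes are synthetic stand-ins for filtered CSI measurements:

```python
import numpy as np

# Fingerprint database C_database (Eq. 4): K training points plus the
# no-target point 0, T samples per point, L streams of 30 subcarriers.
K, T, L = 4, 10, 3
rng = np.random.default_rng(2)

# C_k is a T x (L*30) matrix: row t concatenates C_k^t(1), ..., C_k^t(L).
database = {k: rng.standard_normal((T, L * 30)) for k in range(K + 1)}

# Flatten into (samples, labels) pairs for later training.
X = np.vstack([database[k] for k in range(K + 1)])
y = np.repeat(np.arange(K + 1), T)
```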

During the localization preparation stage, a small CSI test training set is constructed first. We pick out M training points from all K training points and situate the target at each training point m to collect T CSI samples from all L = ntx * nrx TX-RX streams, labeled with training point m. Then we combine the filtered CSI samples collected at the M training points to form the test training set \( {\mathbb{C}}_{testtrainingset} \). It should be noted that training point 0, i.e., the case where the monitored area has no target, must be included. Concretely, the test training set can be represented as \( {\mathbb{C}}_{testtrainingset} = \{ C_{0} ,C_{m} , \ldots ,C_{M} \} \), where \( C_{m} \) is the set of CSI samples from all L streams at training point m. The CSI samples in the fingerprint database and in the test training set are then input into the proposed TDL method to learn the new feature representation. Finally, we transform all the training samples in the fingerprint database into the learned feature representation space to generate a new fingerprint database.

The online localization stage aims to obtain the estimated location of the target. We collect a test CSI sample \( c_{test} \) when a target is in the monitored area, transform it into the learned feature representation space, and finally obtain the estimated position by comparing it with the new fingerprint database through the KNN algorithm.
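A minimal sketch of this matching step, assuming both the test sample and the fingerprints have already been transformed into the learned feature space; all data and dimensions below are synthetic:

```python
import numpy as np

def knn_locate(test_feat, fingerprints, labels, k=5):
    """Majority vote over the k nearest fingerprints in feature space."""
    dists = np.linalg.norm(fingerprints - test_feat, axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Synthetic feature-space fingerprints for 3 well-separated training points.
rng = np.random.default_rng(3)
fingerprints = np.vstack([rng.normal(c, 0.1, size=(10, 8)) for c in (0.0, 5.0, 10.0)])
labels = np.repeat([0, 1, 2], 10)

test_sample = rng.normal(5.0, 0.1, size=8)   # drawn near training point 1
estimated = knn_locate(test_sample, fingerprints, labels, k=5)
```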

4.2 Transfer deep learning method

Combining the ideas of deep learning and transfer learning (Hu et al. 2015), we propose a transfer deep learning (TDL) method to obtain the localization fingerprints. It aims to learn the discriminative distance network by mixing the fingerprint database samples and the test training set samples and obtain the feature representation that takes into account both separability and transferability to be used as fingerprints.

4.2.1 Notation

We denote \( X_{S} = \{ (x_{si} ,y_{si} )|i = 1,2, \ldots ,N_{s} \} \) as the source domain training set, i.e., the fingerprint database \( {\mathbb{C}}_{database} \). It contains \( N_{s} = T*(K + 1) \) CSI samples; \( x_{si} \) is the i-th CSI sample in \( X_{S} \) from all L streams, which is a \( d = 30*L \) dimensional vector, and \( y_{si} \in \{ 0,1,2, \ldots ,K\} \) is the label of \( x_{si} \). Similarly, \( X_{T} = \{ (x_{ti} ,y_{ti} )|i = 1,2, \ldots ,N_{t} \} \) denotes the target domain training samples, i.e., the test training set \( {\mathbb{C}}_{testtrainingset} \) containing \( N_{t} = T*(M + 1) \) CSI samples, and \( y_{ti} \) is the label of \( x_{ti} \). Then, we mix the CSI samples from \( X_{S} \) and \( X_{T} \) according to the label correspondence to obtain the merged labeled training set \( X_{ST} = \{ (x_{i} ,y_{i} )|i = 1,2, \ldots ,N\} ,N = N_{s} + N_{t} \).

4.2.2 Intra-class and inter-class distance

A deep neural network is designed to learn the feature representation we need; its structure is shown in Fig. 3. Each sample \( x \) in the merged labeled training set \( X_{ST} \) is fed into the deep neural network, and its feature representation is computed through a multi-layer nonlinear transformation: the output of the hidden layer is \( h^{(1)} \) and the output of the top layer is \( h^{(2)} \). \( W^{(s)} \) and \( b^{(s)} \), \( 1 \le s \le 2 \), are the network parameters to be learned.

Fig. 3

The network architecture

Assume there are \( p^{(s)} \) units in the s-th layer, s = 1,2. Thus, the output of this layer for sample \( x \) is:

$$ f^{(s)} (x) = h^{(s)} = \varphi (W^{(s)} h^{(s - 1)} + b^{(s)} ) \in {\mathbb{R}}^{{p^{(s)} }} $$
(5)

where \( W^{(s)} \in {\mathbb{R}}^{{p^{(s)} \times p^{(s - 1)} }} \) is the weight matrix of this layer, \( b^{(s)} \in {\mathbb{R}}^{{p^{(s)} }} \) is the bias of this layer, and \( \varphi \) is a nonlinear activation function that realizes the nonlinearization between the input and output of neurons (such as tanh or sigmoid).
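Eq. (5) is an ordinary two-layer feed-forward pass. A small NumPy sketch with illustrative layer sizes (the actual sizes are a design choice of the network in Fig. 3):

```python
import numpy as np

def forward(x, params, phi=np.tanh):
    """Return [h^(1), h^(2)] per Eq. (5): h^(s) = phi(W^(s) h^(s-1) + b^(s))."""
    h, outputs = x, []
    for W, b in params:
        h = phi(W @ h + b)
        outputs.append(h)
    return outputs

rng = np.random.default_rng(4)
d, p1, p2 = 90, 64, 32   # input dim 30*L (L = 3); hidden/top sizes are illustrative
params = [
    (rng.standard_normal((p1, d)) * 0.1, np.zeros(p1)),   # W^(1), b^(1)
    (rng.standard_normal((p2, p1)) * 0.1, np.zeros(p2)),  # W^(2), b^(2)
]
h1, h2 = forward(rng.standard_normal(d), params)
```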

At the s-th layer, each pair of samples \( x_{i} \) and \( x_{j} \) can be transformed to representations \( f^{(s)} (x_{i} ) \) and \( f^{(s)} (x_{j} ) \), so the distance between \( x_{i} \) and \( x_{j} \) at this layer is defined as:

$$ d_{{f^{(s)} }}^{2} (x_{i} ,x_{j} ) = ||f^{(s)} (x_{i} ) - f^{(s)} (x_{j} )||_{2}^{2} $$
(6)

Then, we define \( S_{w}^{(s)} \) as the intra-class distance and \( S_{b}^{(s)} \) as the inter-class distance, as shown below:

$$ S_{w}^{(s)} = \frac{1}{{Nk_{1} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {P_{ij} d_{{f^{(s)} }}^{2} (x_{i} ,x_{j} )} } $$
(7)
$$ S_{b}^{(s)} = \frac{1}{{Nk_{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {Q_{ij} d_{{f^{(s)} }}^{2} (x_{i} ,x_{j} )} } $$
(8)

where \( P_{ij} \) is set to 1 if \( x_{j} \) is one of the k1 intra-class nearest neighbors of \( x_{i} \), and 0 otherwise; \( Q_{ij} \) is set to 1 if \( x_{j} \) is one of the k2 inter-class nearest neighbors of \( x_{i} \), and 0 otherwise.
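The two distances can be computed directly from the definitions of P and Q. A NumPy sketch on synthetic, well-separated features, where the inter-class distance should dominate:

```python
import numpy as np

def class_distances(F, y, k1, k2):
    """Compute S_w (Eq. 7) and S_b (Eq. 8) from feature matrix F and labels y.
    P[i,j] = 1 iff x_j is among the k1 intra-class nearest neighbours of x_i;
    Q[i,j] = 1 iff x_j is among the k2 inter-class nearest neighbours of x_i."""
    N = len(F)
    D = np.linalg.norm(F[:, None] - F[None, :], axis=-1) ** 2  # squared distances
    P, Q = np.zeros((N, N)), np.zeros((N, N))
    for i in range(N):
        same = np.where((y == y[i]) & (np.arange(N) != i))[0]
        diff = np.where(y != y[i])[0]
        P[i, same[np.argsort(D[i, same])[:k1]]] = 1
        Q[i, diff[np.argsort(D[i, diff])[:k2]]] = 1
    S_w = (P * D).sum() / (N * k1)
    S_b = (Q * D).sum() / (N * k2)
    return S_w, S_b

rng = np.random.default_rng(5)
F = np.vstack([rng.normal(0, 0.1, (10, 4)), rng.normal(3, 0.1, (10, 4))])
y = np.repeat([0, 1], 10)
S_w, S_b = class_distances(F, y, k1=3, k2=3)
```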

4.2.3 Domain difference measure

In our system, the fingerprint database \( {\mathbb{C}}_{database} \) and the test training set \( {\mathbb{C}}_{testtrainingset} \) are the training sets from the source and target domains, respectively. As described in Sect. 3.2, their distributions in the CSI amplitude space usually differ when the measurement time interval is large. Therefore, we utilize the Maximum Mean Discrepancy (MMD) (Borgwardt et al. 2006) to measure the difference between them. As mentioned before, \( X_{T} \) contains samples of M + 1 training points. We extract the samples of these M + 1 training points from \( X_{S} \) to construct a new training set \( X^{*} \). Then, the difference between domains \( X_{S} \) and \( X_{T} \) at the s-th layer can be expressed as follows:

$$ D_{ts}^{(s)} (X_{T} ,X_{S} ) = ||\frac{1}{{N_{t} }}\sum\limits_{i = 1}^{{N_{t} }} {f^{(s)} (x_{*i} )} - \frac{1}{{N_{t} }}\sum\limits_{i = 1}^{{N_{t} }} {f^{(s)} (x_{ti} )} ||_{2}^{2} $$
(9)

where \( N_{t} \) represents the number of samples in \( X_{T} \) and \( X^{*} \), equal to T*(M + 1); \( x_{*i} \) is the i-th sample in \( X^{*} \) and \( x_{ti} \) is the i-th sample in \( X_{T} \).
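Eq. (9) is the squared distance between the feature-space means of the two sample sets. A minimal sketch on synthetic features:

```python
import numpy as np

def mmd_squared(F_star, F_t):
    """Squared MMD of Eq. (9): squared distance between the mean
    transformed representations of X* and X_T."""
    return np.sum((F_star.mean(axis=0) - F_t.mean(axis=0)) ** 2)

rng = np.random.default_rng(6)
F_same = rng.normal(0.0, 1.0, size=(200, 16))      # stand-in for f(x_*i)
F_shifted = rng.normal(0.5, 1.0, size=(200, 16))   # stand-in for f(x_ti), mean-shifted

d_same = mmd_squared(F_same, F_same)       # identical sets: exactly zero
d_shift = mmd_squared(F_same, F_shifted)   # mean shift: clearly positive
```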

4.2.4 Method specific process

Now, we can formulate the TDL method as the following optimization problem at the top output layer by combining (7), (8) and (9):

$$ \mathop {\hbox{min} J}\limits_{{f^{(2)} }} = S_{w}^{(2)} - \alpha S_{b}^{(2)} + \beta D_{ts}^{(2)} (X_{T} ,X_{S} ) + \gamma \sum\limits_{s = 1}^{2} {(||W^{(s)} ||_{F}^{2} + ||b^{(s)} ||_{2}^{2} )} $$
(10)

where \( \alpha (\alpha > 0) \) balances the importance of the intra-class and inter-class distances; \( \beta (\beta > 0) \) weights the domain-difference term; \( \gamma (\gamma > 0) \) is a tunable regularization parameter; and \( ||W^{(s)} ||_{F} \) is the Frobenius norm of \( W^{(s)} \).
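Assembling Eq. (10) from its four terms is straightforward; the numeric values below are arbitrary placeholders for the quantities of Eqs. (7)-(9):

```python
import numpy as np

def objective(S_w, S_b, D_ts, params, alpha, beta, gamma):
    """Total loss J of Eq. (10): intra-class compactness minus weighted
    inter-class separation, plus weighted domain difference and L2 penalty."""
    reg = sum(np.sum(W ** 2) + np.sum(b ** 2) for W, b in params)
    return S_w - alpha * S_b + beta * D_ts + gamma * reg

# Placeholder parameters and term values, chosen only to exercise the formula.
params = [(np.ones((2, 2)), np.zeros(2)), (np.ones((1, 2)), np.zeros(1))]
J = objective(S_w=1.0, S_b=4.0, D_ts=0.5, params=params,
              alpha=0.5, beta=1.0, gamma=0.01)
# reg = 4 + 2 = 6, so J = 1.0 - 2.0 + 0.5 + 0.06 = -0.44
```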

The gradients of \( J \) in (10) are computed as follows:

$$ \begin{aligned} \frac{\partial J}{{\partial W^{(s)} }} &= \frac{2}{{Nk_{1} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {P_{ij} (L_{ij}^{(s)} (h_{i}^{(s - 1)} )^{T} + L_{ji}^{(s)} (h_{j}^{(s - 1)} )^{T} )} } \hfill \\ &\quad - \frac{2\alpha }{{Nk_{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {Q_{ij} (L_{ij}^{(s)} (h_{i}^{(s - 1)} )^{T} + L_{ji}^{(s)} (h_{j}^{(s - 1)} )^{T} )} } \hfill \\ &\quad + \frac{2\beta }{{N_{t} }}(\sum\limits_{i = 1}^{{N_{t} }} {L_{ti}^{(s)} (h_{ti}^{(s - 1)} )^{T} + } \sum\limits_{i = 1}^{{N_{t} }} {L_{*i}^{(s)} (h_{*i}^{(s - 1)} )^{T} } ) \hfill \\ &\quad + 2\gamma W^{(s)} ,s = 1,2 \hfill \\ \end{aligned}$$
(11)
$$ \begin{aligned} \frac{\partial J}{{\partial b^{(s)} }} &= \frac{2}{{Nk_{1} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {P_{ij} (L_{ij}^{(s)} + L_{ji}^{(s)} )} } - \frac{2\alpha }{{Nk_{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {Q_{ij} (L_{ij}^{(s)} + L_{ji}^{(s)} )} } \hfill \\ &\quad + \frac{2\beta }{{N_{t} }}(\sum\limits_{i = 1}^{{N_{t} }} {L_{ti}^{(s)} + } \sum\limits_{i = 1}^{{N_{t} }} {L_{*i}^{(s)} } ) + 2\gamma b^{(s)} ,s = 1,2 \hfill \\ \end{aligned}$$
(12)

and the back-propagation updating equations are as follows:

$$\begin{aligned} & L_{ij}^{(2)} = (h_{i}^{(2)} - h_{j}^{(2)} ) \odot \varphi^{\prime} (z_{i}^{(2)} ),\quad L_{ji}^{(2)} = (h_{j}^{(2)} - h_{i}^{(2)} ) \odot \varphi^{\prime} (z_{j}^{(2)} ) \hfill \\ & L_{ij}^{(1)} = ((W^{(2)} )^{T} L_{ij}^{(2)} ) \odot \varphi^{\prime} (z_{i}^{(1)} ),\quad L_{ji}^{(1)} = ((W^{(2)} )^{T} L_{ji}^{(2)} ) \odot \varphi^{\prime} (z_{j}^{(1)} ) \hfill \\ & L_{ti}^{(2)} = \frac{1}{{N_{t} }}\Big(\sum\limits_{j = 1}^{{N_{t} }} {h_{tj}^{(2)} } - \sum\limits_{j = 1}^{{N_{t} }} {h_{*j}^{(2)} } \Big) \odot \varphi^{\prime} (z_{ti}^{(2)} ),\quad L_{*i}^{(2)} = \frac{1}{{N_{t} }}\Big(\sum\limits_{j = 1}^{{N_{t} }} {h_{*j}^{(2)} } - \sum\limits_{j = 1}^{{N_{t} }} {h_{tj}^{(2)} } \Big) \odot \varphi^{\prime} (z_{*i}^{(2)} ) \hfill \\ & L_{ti}^{(1)} = ((W^{(2)} )^{T} L_{ti}^{(2)} ) \odot \varphi^{\prime} (z_{ti}^{(1)} ),\quad L_{*i}^{(1)} = ((W^{(2)} )^{T} L_{*i}^{(2)} ) \odot \varphi^{\prime} (z_{*i}^{(1)} ) \hfill \\ & z_{i}^{(s)} = W^{(s)} h_{i}^{(s - 1)} + b^{(s)} ,\quad s = 1,2 \hfill \\ \end{aligned}$$

where the operation \( \odot \) denotes element-wise multiplication.

The \( W^{(s)} \) and \( b^{(s)} \) can be updated as follows, \( \lambda \) is the learning rate:

$$ W^{(s)} = W^{(s)} - \lambda \frac{\partial J}{{\partial W^{(s)} }} $$
(13)
$$ b^{(s)} = b^{(s)} - \lambda \frac{\partial J}{{\partial b^{(s)} }} $$
(14)
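Eqs. (13) and (14) are plain gradient-descent updates. The sketch below applies them to a toy quadratic loss, whose analytic gradients stand in for the full expressions of Eqs. (11) and (12), to show the updates converging:

```python
import numpy as np

# Toy loss J = sum((W - W_opt)^2) + sum((b - b_opt)^2); its gradients replace
# the full dJ/dW^(s), dJ/db^(s) of Eqs. (11)-(12) purely for illustration.
W_opt, b_opt = np.array([[1.0, 2.0]]), np.array([0.5])
W, b = np.zeros((1, 2)), np.zeros(1)
lr = 0.1  # learning rate lambda

for _ in range(200):
    grad_W = 2 * (W - W_opt)   # stand-in for dJ/dW^(s)
    grad_b = 2 * (b - b_opt)   # stand-in for dJ/db^(s)
    W = W - lr * grad_W        # Eq. (13)
    b = b - lr * grad_b        # Eq. (14)
```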

The detailed calculation flow of the method is shown in Algorithm 1.


5 Experiment validation

We verify the feasibility and effectiveness of our system through comprehensive experiments in this section. We begin with an overview of the experimental methodology, then analyze the localization performance of our system and compare it with other DFPWL systems in an actual indoor environment, and end the section with an evaluation of the localization performance under different system parameters.

5.1 Experiment methodology

5.1.1 Experiment environment

We conducted experiments in a typical indoor environment, a conference room (Room 514, Laboratory Building, Xidian University). The room is \( 12 \times 10\;{\text{m}}^{2} \) in size, with a large conference table in the center surrounded by many tables, chairs, and filing cabinets, resulting in many non-line-of-sight (NLOS) paths and a complex wireless propagation environment (Rao and Li 2019). The AP is an IEEE 802.11n-compliant TP-LINK WR841N router operating in the 2.4 GHz unlicensed band. A laptop running Ubuntu 14.04 and equipped with an Intel Wireless Link 5300 NIC (IWL5300) serves as the Monitoring Point. We placed the AP and the laptop on diagonally opposite tables in the room, about 15.6 m apart. The selection of training points and test points in the monitored area and the layout of the experimental scene are consistent with our previous work (Rao and Li 2019).

5.1.2 Data sets settings

We run the CSI Tools to collect a CSI measurement of 30 subcarriers per packet from each antenna of the IWL5300 NIC. Specifically, when the IWL5300 NIC receives a packet, the CSI Tools obtains a raw CSI measurement and records it to a log file; each measurement includes 90 subcarrier values from the three antennas. We then wrote a MATLAB program to read the CSI measurements from the log file through the interface provided by the CSI Tools. Moreover, we used the Anaconda and TensorFlow libraries (Abadi et al. 2016) to develop and debug the neural network in the TDL method. Because the amount of data and the size of the network are both small, we did not use a GPU to accelerate learning. All of the above operations were performed on a Dell XPS-15 laptop with an Intel i5-8300H CPU (2.3 GHz) and 8 GB RAM.

During the training stage, we collect CSI samples to construct the fingerprint database. Specifically, we selected 56 training points in the indoor area of interest (the monitored area), with a distance of 1 m between adjacent training points. We then recorded 1000 CSI samples (approximately 100 s) from the TX-RX link while the target stood at each training point. During the localization preparation stage, we collect only 1000 CSI samples with no target in the monitored area, i.e., at training point 0, to form the test training set. As the detailed analysis in Sect. 5.5 shows, this is the most appropriate choice: the system achieves satisfactory localization accuracy without any manual intervention. During the test stage, the target stands at an unknown location in the monitored area, and we record CSI samples for 5 s (50 samples) at each location.

After five consecutive days of data collection, we obtain five training data sets D1, D2, D3, D4, D5 and five test data sets T1, T2, T3, T4, T5, one pair per day.
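For concreteness, the data described above can be organized as arrays of the following shapes (a hypothetical layout, assuming the 90 subcarrier values per sample stated earlier; the variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fingerprint database for one day: 56 training points x 1000 samples x 90 subcarrier values
train_db = rng.random((56, 1000, 90))
point_ids = np.repeat(np.arange(56), 1000)   # label (training-point index) per sample
X_train = train_db.reshape(-1, 90)           # flattened to (56000, 90) for the learner

# Test training set: 1000 samples recorded with no target present (training point 0)
X_T = rng.random((1000, 90))

# One test location: 5 s of CSI at roughly 10 packets/s -> 50 samples
X_test = rng.random((50, 90))
```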

5.1.3 Performance metric

We use the mean distance error between the estimated position of the target and the actual one as the performance metric. Assume that the estimated position of the target obtained from test sample \( i \) is \( (x_{i}^{e} ,y_{i}^{e} ) \) and the actual position is \( (x_{i} ,y_{i} ) \). The mean distance error over \( N_{test} \) test samples is:

$$ \mu_{test} = \sum\limits_{i = 1}^{{N_{test} }} {\sqrt {(x_{i}^{e} - x_{i} )^{2} + (y_{i}^{e} - y_{i} )^{2} } } /N_{test} $$
(15)
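A direct implementation of Eq. (15) might look as follows (a sketch; `mean_distance_error` is our illustrative name, not from the paper):

```python
import numpy as np

def mean_distance_error(est, actual):
    """Eq. (15): average Euclidean distance between estimated and actual positions."""
    est = np.asarray(est, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.linalg.norm(est - actual, axis=1)))

# Two test samples: one perfect estimate, one off by 3 m -> (0 + 3) / 2 = 1.5 m
err = mean_distance_error([(1.0, 1.0), (2.0, 4.0)], [(1.0, 1.0), (2.0, 1.0)])
```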

5.1.4 Experimental parameters

The experimental environment, data collection, and performance metric have been determined, but several system parameters still affect performance. In the proposed TDL method, we designed a three-layer deep network to obtain new feature representations, whose numbers of neurons were set to [90–180–90] from bottom to top to keep the input and output dimensions consistent, and the commonly used tanh was adopted as the nonlinear activation function. Following (Samui et al. 2017), we empirically initialize the weight matrices \( W^{(s)} ,s = 1,2 \) as diagonal matrices, the biases \( b^{(s)} ,s = 1,2 \) as zero vectors, and the learning rate \( \lambda \) as 0.1. For the hyperparameters in the TDL method, we set \( \alpha \), \( \beta \), \( k_{1} \), and \( k_{2} \) to 0.1, 10, 5, and 10, respectively. Table 1 lists the other detailed experimental parameters.

Table 1 Experimental parameters
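Under the stated settings, the initialization can be sketched as follows (an assumption on our part: a rectangular `np.eye` is one plausible reading of "diagonal" initialization for non-square weight matrices):

```python
import numpy as np

layer_dims = [90, 180, 90]   # input - hidden - output, as in the paper

# Diagonal weight matrices and zero bias vectors, per the stated initialization
W = [np.eye(layer_dims[s + 1], layer_dims[s]) for s in range(2)]
b = [np.zeros(layer_dims[s + 1]) for s in range(2)]

# Stated hyperparameters: learning rate, alpha, beta, k1, k2
hyper = {"lam": 0.1, "alpha": 0.1, "beta": 10, "k1": 5, "k2": 10}
```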

5.2 Localization performance

In this section, we use the CSI measurements from a single link to evaluate the localization performance of our system and to make a fair comparison with other systems that also use such measurements, including MonoPHY (Abdel-Nasser et al. 2013), DNNWL (Zhou et al. 2018), LiFS (Wang et al. 2018), Pilot (Xiao et al. 2013), and SVRDFL (Zhou et al. 2017).

First, we designed an experiment in which the CSI samples of both training points and test points were collected each day for five consecutive days. The CSI samples of the training points were used as training data, and the CSI samples of the test points collected on the same day were used as test data; both were applied to the systems above to obtain the final localization results and compare the localization errors. We call this experiment Test 1; in it, the test training set of the TDL method is an empty set. The mean localization errors and standard deviations of each system on each day in Test 1 are shown in Table 2.

Table 2 The mean localization errors and standard deviations of each system in Test 1

In the conference room scenario with rich multipath and shadowing effects, our system achieves a mean localization error of about 1.1 m on each day while using only a single link, whereas the mean localization errors for MonoPHY, DNNWL, Pilot, LiFS, and SVRDFL on each day are about 1.8 m, 1.5 m, 2.6 m, 2.8 m, and 1.6 m, respectively. Our system thus has both the smallest mean localization error and the smallest standard deviation, outperforming the other systems by about 39%, 27%, 58%, 61%, and 31%, respectively.

Figure 4 plots the cumulative distribution function (CDF) of the localization errors for each system on each day in Test 1. For our system, over 60% of the test samples have an error of no more than 1.5 m in all experiments across the five consecutive days, while the corresponding proportion for the other systems is 49% or less. Furthermore, approximately 100% of the test samples of our system have an error under 2 m, while the percentages of test samples with an error below 2 m are about 62%, 71%, 33%, 41%, and 67% for MonoPHY, DNNWL, Pilot, LiFS, and SVRDFL, respectively.

Fig. 4 CDF of the localization errors of each system on each day in Test 1

Our system yields the best performance among these systems, which we attribute to its ability to extract more stable and discriminative feature representations than the original CSI amplitude features through the proposed TDL method. Since DNNWL and SVRDFL use a DNN and SVR, respectively, to finely model the relationship between position and pre-processed CSI amplitude fingerprints, they achieve the second- and third-best performance. LiFS is a model-based positioning system, and Pilot directly uses CSI amplitude fingerprints to estimate the target position through a probabilistic algorithm; in our scenario, where only single-link CSI information is available, both perform poorly due to insufficient information.

Second, we designed another experiment in which the CSI samples of the training points were collected on the first day and the CSI samples of the test points were collected over the next 4 days. The CSI samples of the training points were used as training data, and the CSI samples of the test points collected on the other days were used as test data; both were applied to the systems above to obtain the final localization results and compare the localization errors. We call this experiment Test 2; the mean localization errors and standard deviations of each system on each day in Test 2 are presented in Table 3.

Table 3 The mean localization errors and standard deviations of each system in Test 2

From Table 3, we can see that the mean localization errors of our system are all within 1.5 m, and it again achieves the best performance among these systems, benefiting from the more transferable feature representations obtained by the TDL method. In addition, in contrast to Test 1, MonoPHY outperforms DNNWL and SVRDFL in Test 2. The mean localization error of MonoPHY increases from 1.88 to 2.13 m across the days, while those of DNNWL and SVRDFL are larger, increasing from 2.09 to 2.36 m and from 1.903 to 2.114 m, respectively. This is because DNNWL and SVRDFL use a DNN and SVR, respectively, to finely model the relationship between position and pre-processed CSI amplitude fingerprints. When the training and test data follow the same distribution (collected on the same day), DNNWL and SVRDFL can build more sophisticated models and achieve better localization performance than MonoPHY. However, when the training and test data do not follow the same distribution (collected on different days), the over-elaborate models cause faster performance degradation for DNNWL and SVRDFL. Furthermore, the localization error of Pilot increases slightly compared with Test 1 due to the distribution difference between training and test data, while that of the model-based LiFS is basically unchanged. As in Test 1, both again perform poorly due to insufficient information.

Figure 5 plots the CDF of the localization errors of each system on each day in Test 2. Although the time-varying characteristics of CSI make the distribution of training data and test data different, our system can still achieve 1.4–1.5 m mean localization errors. The localization errors of our system are within 1.5 m with the probability of about 60%, while that for the other systems is about 30% or less. Furthermore, approximately 100% of the test samples for our system have an error under 2.5 m, while the percentage of test samples having a smaller error than 2.5 m is about 80%, 68%, 26%, 38%, and 76%, for MonoPHY, DNNWL, Pilot, LiFS, and SVRDFL, respectively.

Fig. 5 CDF of localization errors of each system on each day in Test 2

The experimental results of Test 1 and Test 2 show that our system not only achieves better localization performance than MonoPHY, DNNWL, Pilot, LiFS, and SVRDFL but also achieves satisfactory localization accuracy without re-collecting the fingerprint database when it becomes outdated after a long time interval.

5.3 Effect of different antennas

The IWL5300 NIC has three antennas, and different antenna combinations can affect the performance of our system. We therefore investigated this impact by designing systems that use different antenna combinations: first, a three-antenna system that uses the 90 CSI values of all three antennas as input data; second, dual-antenna systems that combine two antennas randomly selected from the three and use their 60 CSI values as input data; third, single-antenna systems that select one of the three antennas and use only its 30 CSI values as input data. Meanwhile, the number of neurons in the input and output layers of the deep neural network in the TDL method is set to the number of CSI values used as input, i.e., 30 for the single-antenna systems, 60 for the dual-antenna systems, and 90 for the three-antenna system. All other system parameters remain unchanged, and the experiment is carried out while keeping the indoor environment of the conference room as constant as possible. For brevity, we only report results using the CSI samples of training points and test points collected on the same day; the results using samples collected on different days are generally similar. The mean localization errors of the different antenna versions on each day are listed in Table 4.

Table 4 Mean localization errors of the different antenna-version systems on each day

As shown in Table 4, the mean localization error of the three-antenna system is about 1.1 m on each day, far more accurate than the other versions, whose mean localization errors are about 1.7 m (dual-antenna) and 2.2 m (single-antenna). As the number of available CSI values increases from 30 to 90, each sample provides more detailed environmental features, so the three-antenna system achieves better positioning accuracy than the other versions. Our system therefore adopts three antennas.

5.4 Impact of the neurons in the TDL method

To study the influence of the deep neural network structure in the TDL method on system performance, we designed four deep neural networks with different numbers of neurons, whose structures are [90–180–90], [90–90–90], [90–180–180], and [90–180–360], respectively.

First, we used the CSI samples of training points and test points collected on the same day as training data and test data to evaluate the impact of deep neural network structure on the localization accuracy of our system. The mean localization errors and standard deviations of the systems with different deep neural network structures on each day are listed in Table 5.

Table 5 Mean localization errors and standard deviations of the systems with different deep neural network structures on each day in the case of using the CSI samples of training points and test points collected on the same day as training data and test data

It can be seen from Table 5 that the system using the deep neural network with the neuron structure [90–180–90] achieves the best localization performance, with a mean localization error of about 1.1 m on each day, while the mean localization errors of the systems using the other neuron structures are about 1.8 m, 1.6 m, and 1.4 m, respectively.

Second, we used the CSI samples of the training points collected on the first day as training data and used the CSI samples of the test points collected on other days as test data to evaluate the impact of deep neural network structure on the localization accuracy of our system. The mean localization errors and standard deviations of the systems with different deep neural network structures on each day are presented in Table 6.

Table 6 Mean localization errors and standard deviations of the systems with different deep neural network structures on each day in the case of using the CSI samples of the training points collected on the first day as training data and the CSI samples of the test points collected on other days as test data

From Table 6, we can see that, when the CSI samples collected on the first day are used as training data and those collected on the other days as test data, the system using the deep neural network with the neuron structure [90–180–90] again achieves the best localization performance. Its mean localization errors are all within 1.5 m on each day, while those of the systems using the other neuron structures are about 2.0 m, 1.9 m, and 1.6 m, respectively.

In summary, the deep neural network with the neuron structure of [90–180–90] is the optimal choice for our system.

5.5 Impact of the number of training points in the test training set \( X_{T} \)

Since our system uses the TDL method to obtain the feature representation from a test training set \( X_{T} \) consisting of currently acquired CSI samples, the number of training points in \( X_{T} \) can greatly affect system performance. To evaluate this impact, we designed a specific experiment: the CSI samples of the training points collected on the first day were used as training data, and the CSI samples of the test points collected on the fifth day, which differ most from the first day, were used as test data. We collected CSI samples from different numbers of training points to form test training sets \( X_{T}^{1} ,X_{T}^{2} ,X_{T}^{3} ,X_{T}^{4} \). Considering that increasing the number of training points in the test training set requires considerable labor, \( X_{T}^{1} ,X_{T}^{2} ,X_{T}^{3} ,X_{T}^{4} \) are composed of the CSI samples of training point 0 (i.e., samples collected without the target in the monitored area), of 5 training points, of 10 training points, and of 15 training points, respectively. The mean localization errors of this experiment are listed in Table 7. Notice that as the number of training points in the test training set increases, the mean localization error does not decrease significantly and remains at about 1.5 m, while the labor cost required to collect the test training set increases significantly.

Table 7 Mean localization errors for our system with different numbers of training points in the test training set in the case of using the CSI samples of the training points collected on the first day as training data and the CSI samples of the test points collected on the fifth days as test data

Figure 6 shows the CDF of the localization errors of our system with different numbers of training points in the test training set. The CDFs are relatively close to one another. Considering the extra labor cost of adding training points to the test training set, it is therefore most appropriate to use the test training set consisting of the CSI samples collected without a target in the monitored area: this not only achieves a satisfactory mean localization error but also eliminates the need for manual intervention during the localization preparation stage.

Fig. 6 CDF of localization errors for our system with different numbers of training points in the test training set, using the CSI samples of the training points collected on the first day as training data and the CSI samples of the test points collected on the fifth day as test data

5.6 Impact of the classification algorithm

Since our system finally uses a classification algorithm to estimate the location of the target, different classification algorithms will affect its localization performance. To study this impact, we designed a specific experiment in which three classifiers (KNN with k = 3, SVM with C = 1 and a polynomial kernel, and a Naive Bayesian Classifier (NBC)) perform the final classification task to obtain the localization result.
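The KNN classifier eventually adopted in our system can be sketched in a few lines; this is a generic k = 3 majority-vote implementation, not the authors' code, with toy 2-D fingerprints standing in for the learned feature representations:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x as the majority label among its k nearest fingerprints."""
    d = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x, float), axis=1)
    nearest = [y_train[i] for i in np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy fingerprints for two locations (labels 0 and 1)
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
y = [0, 0, 0, 1, 1, 1]
pred = knn_predict(X, y, [0.2, 0.2])   # nearest three fingerprints all carry label 0
```

The SVM and NBC baselines would typically come from a standard machine-learning library rather than being hand-written.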

First, we use the CSI samples of training points and test points collected on the same day as training data and test data to evaluate the impact of different classification algorithms. The mean localization errors and standard deviation of our system with different classification algorithm on each day are presented in Table 8.

Table 8 The mean localization errors and standard deviation of our system with different classification algorithms in case of using the CSI samples of training points and test points collected on the same day as training data and test data

From Table 8, we can see that the mean localization errors of the systems using the SVM and KNN algorithms are very close on each day, both about 1.1 m, and the system using the SVM algorithm reaches less than 1 m on the second and fourth days. Meanwhile, the system using the NBC algorithm has the largest mean localization error on each day, about 1.3 m.

Second, we use the CSI samples of the training points collected on the first day as training data and the CSI samples of the test points collected on other days as test data to evaluate the effect of different classification algorithms on our system performance. In this case, the mean localization errors and standard deviation of our system with different classification algorithm on each day are presented in Table 9.

Table 9 The mean localization errors and standard deviation of our system with different classification algorithms in the case of using the CSI samples of the training points collected on the first day as training data and the CSI samples of the test points collected on other days as test data

As shown in Table 9, in the case of using the CSI samples collected on the first day as training data and those collected on other days as test data, the system using the SVM algorithm achieves the best localization performance, with a mean localization error of about 1.35 m on each day. The mean localization errors of the systems using the other classification algorithms, KNN and NBC, are about 1.44 m and 1.69 m, respectively.

In summary, considering both localization accuracy and implementation complexity, we adopt the KNN algorithm as the classification algorithm in our system: it achieves fairly good localization accuracy and, unlike the SVM and NBC algorithms, requires no additional training step.

6 Conclusion

In this paper, we presented a novel transfer deep learning-based DFPWL system that uses the CSI amplitudes extracted from a single link to estimate the location of a target, requiring neither the target to carry any electronic equipment nor the deployment of many APs and monitoring devices. To overcome the loss of localization accuracy caused by the time-varying characteristics of CSI, which most previous works ignore, we proposed a transfer deep learning (TDL) method that combines a deep neural network with transfer learning to obtain a feature representation as the fingerprint, pursuing transferability and discriminability simultaneously, instead of using the CSI as the fingerprint directly as traditional fingerprinting localization systems do. The proposed system obtains the localization result by comparing the test data with the fingerprint database in the new feature representation space through the KNN algorithm, and it achieves satisfactory localization accuracy without the labor cost of repeated fingerprint collection. However, although our work addresses the CSI time-varying problem encountered in practical device-free passive fingerprinting localization scenarios, it can only locate a single target; we will explore multi-target device-free passive fingerprinting localization in future work.