1 INTRODUCTION

With the rapid development of information technology, demand for location-based services (LBS) keeps growing, and location-aware applications are becoming widespread. Wireless localization technology has become a topic of public interest, an application focus in industry, and a research hotspot in academia [1]. The airport terminal is a large, key venue for civil aviation services such as check-in, security check, and boarding. Terminal buildings are usually large, and the types, layouts, and locations of check-in islands, security check areas, restaurants, and public service points differ from airport to airport, which is inconvenient for passengers. Research on indoor localization methods for airport terminals is therefore particularly important [2].

In indoor environments, wireless local area network (WLAN) technology is widely deployed, simple to lay out, and inexpensive, which makes WLAN-based localization well suited to indoor use. Indoor localization methods based on WLAN include ranging models [3] and location fingerprint models. Among them, methods based on location fingerprint models have become the mainstream and a research hotspot because of their higher localization accuracy. Fingerprint localization methods based on Wi-Fi signals mainly use received signal strength (RSS) fingerprints [4, 5] or channel impulse response (CIR) fingerprints [6, 7]. Since sampling CIR fingerprints requires special equipment, most localization methods are now based on RSS. The accuracy of traditional fingerprint-based localization models depends largely on whether the offline data and the data collected online follow the same distribution. In a real indoor WLAN environment, the collected data often differs from the ideal case: the RSS signals collected offline from the various wireless access points (APs) are complex and changeable, which reduces the accuracy of indoor localization algorithms. In response to these problems, Bahl et al. [8] proposed using the K-Nearest Neighbors (KNN) algorithm for localization, which matches the data collected online against the fingerprint dataset and takes the result as the target location. A disadvantage of this algorithm is that it does not account for disturbed RSS data during localization. Because RSS data in a real indoor environment is complex and changeable, this method often has a large localization error. Zhang et al. [9] proposed a Wi-Fi localization algorithm using Kernel Discriminant Analysis (KDA)–KNN. This algorithm applies the KDA transform to train offline location fingerprints in the offline phase and uses KNN for position prediction in the online phase. Compared with KNN and its improved variants, it effectively improves localization accuracy, but because the KDA transform takes a long time, its localization efficiency is low. Li Hualiang et al. [10] proposed an indoor localization algorithm based on Kernel Principal Component Analysis (KPCA). The algorithm uses KPCA to train location fingerprint data in the offline phase and an improved KNN algorithm for location prediction in the online phase. Because KPCA is unsupervised, its localization performance is worse than that of KDA, and its localization efficiency is also low.

In this paper, an indoor localization algorithm based on RSS extended features and Spectral Regression KDA (SRKDA) [11] is proposed for airport terminals, which are characterized by large passenger flow and a complicated, changeable indoor environment. In the offline phase, a Least Squares–Support Vector Machine (LS–SVM) [12] regression model first estimates the distance between the mobile terminal and each AP, and these estimates are used to extend the RSS features; the SRKDA method then trains on the location fingerprint data, improving localization accuracy while reducing time complexity. In the online phase, a weighted KNN (WKNN) algorithm is used for position estimation.

2 PROBLEM DESCRIPTION

In studying localization in the specific scenario of a large airport terminal, we identified two characteristics of this scenario:

(1) Passenger traffic is heavy, and radio waves are attenuated when they pass through the human body. In areas with heavy passenger traffic, the transmitted signal therefore departs from ideal attenuation, which affects the localization results;

(2) Because the terminal environment is complex and changeable, both line-of-sight (LOS) and non-line-of-sight (NLOS) propagation conditions are common, and the NLOS environment is an important factor affecting localization accuracy.

2.1 Impact of Heavy Traffic on Localization

To study the influence of the human body on received signal strength, we ran two groups of experiments, measuring the received signal strength in a crowded environment and in a sparsely populated environment, and then analyzed the collected data. The results are shown in Fig. 1.

Fig. 1. Human body influence on received signal strength.

In an empty environment with no obstruction from buildings or pedestrians, the measured signal strength is simply that of each Wi-Fi signal after attenuation in air. Figure 1a is the frequency histogram of the RSS collected in a densely populated environment; Fig. 1b is the frequency histogram of the RSS collected in a sparsely populated environment. In Fig. 1a, the RSS data collected at the same location fluctuates significantly because of the crowd, while in Fig. 1b the RSS data is more concentrated. Fig. 1 thus shows that the human body has a significant impact on received signal strength. To study indoor localization in a large airport terminal, it is therefore first necessary to address the poor localization performance caused by heavy passenger flow. In fingerprint-based localization, processing the data with KDA can effectively mitigate the large RSS fluctuations caused by signals passing through the human body, and introducing the spectral regression (SR) framework on top of KDA addresses the low localization efficiency under high passenger flow in the terminal.

2.2 Impact of NLOS Propagation on Localization

The propagation conditions of wireless communication are usually divided into two environments: line-of-sight (LOS) and non-line-of-sight (NLOS). In an indoor environment, when the direct path between the AP and the mobile terminal is blocked by obstacles, the wireless signal can reach the mobile terminal only through reflection, scattering, diffraction, and similar mechanisms. Measured quantities such as arrival time and received signal strength then deviate from the values that the actual distance would produce, so a position estimated from these measurements differs from the real position. The NLOS environment therefore degrades indoor localization accuracy. The effect of NLOS propagation on localization is shown in Fig. 2.

Fig. 2. Schematic diagram of the effect of non-line-of-sight propagation on localization.

Fig. 2 shows that in an NLOS environment, the distance between the AP and the mobile terminal estimated from the data measured by the mobile terminal differs from the actual distance. NLOS transmission attenuates the wireless signal, so if a propagation model is used for distance estimation, the estimated distance is generally greater than the actual distance. NLOS propagation is identified as a feature of the large airport terminal localization scenario on the basis of the AP deployment locations and the spatial layout of the terminal. In a large airport terminal, APs are deployed above structures such as shops or counters, so the layout of shops and counters inevitably blocks signals and creates NLOS transmission environments. To address this problem, this paper extends the RSS features with estimated distances: the LS–SVM regression model first estimates the distance from the terminal to each access point, and the estimated distances and the RSS data are then used together as fingerprint data for position estimation.
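To illustrate why extra NLOS attenuation inflates a model-based range estimate, the following sketch inverts a log-distance path-loss model. The choice of this model, the function name `distance_from_rss`, and all parameter values (reference power at 1 m, path-loss exponent, a 6 dB obstruction loss) are illustrative assumptions and are not taken from the experiments in this paper.

```python
def distance_from_rss(rss_dbm, p0_dbm=-40.0, n=2.0, d0=1.0):
    """Invert the log-distance model RSS = P0 - 10*n*log10(d/d0).

    p0_dbm (RSS at the reference distance d0) and n (path-loss exponent)
    are assumed example values, not measurements from the paper.
    """
    return d0 * 10 ** ((p0_dbm - rss_dbm) / (10.0 * n))


los_rss = -60.0            # hypothetical LOS reading in dBm
nlos_rss = los_rss - 6.0   # same position, 6 dB extra loss from an obstacle

d_los = distance_from_rss(los_rss)    # about 10 m
d_nlos = distance_from_rss(nlos_rss)  # about 20 m: the range is overestimated
```

Under these assumed values, a 6 dB obstruction loss roughly doubles the estimated range, which is the kind of bias the distance-regression step is meant to reduce.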

3 PROPOSED METHOD

For the environment of large airport terminals, this paper builds on the fingerprint-based localization method, extends the fingerprint features with distance estimates, introduces the SR framework on top of KDA, and proposes an indoor localization algorithm based on feature extension and SRKDA–WKNN. The algorithm not only improves localization performance but is also more efficient than other algorithms in the fingerprint data processing stage, so it is well suited to large airport terminals, where passenger demand for localization is high and the localization requirements are strict.

As shown in Fig. 3, the algorithm is divided into an offline phase and an online phase. In the offline phase, fingerprint data is collected in the localization area together with the distance from the terminal to each AP. The LS–SVM regression model is used to estimate the distance from the terminal to each AP, the RSS data is combined with the estimated distances to form a location fingerprint dataset, and the SRKDA method then extracts features from this offline fingerprint dataset to generate a new feature dataset. In the online phase, the device to be located collects fingerprint data in the localization area, the same SRKDA processing is applied to this data, and the processed feature fingerprint is matched against the feature dataset. The WKNN algorithm is used as the matching algorithm to estimate the position.

Fig. 3. Schematic diagram of the proposed localization algorithm.

3.1 Terminal to AP Distance Estimation

Before performing distance estimation, features must first be extracted from the RSS data. The following features are selected, mainly for NLOS error suppression:

(1) Average value (\(\mu \)). The average value is the sum of all data in a group divided by the number of data in the group, and it can be obtained directly from the RSS data.

(2) Logarithmic mean (\(\hat {d}\)). Because RSS is linearly related to the logarithm of distance, the RSS can be converted to logarithmic space and its mean value taken as a feature.

(3) Standard deviation (\({{\sigma }_{\mu }}\)). The standard deviation is the square root of the arithmetic mean of the squared differences between each datum and the average, so it can be computed once the mean is known.

After feature extraction, the LS–SVM regression model is used for distance estimation, that is, the above features are used as input, and the estimated distance from the terminal to the AP is used as output. The process of distance estimation using the LS–SVM regression model is described below.

For \(m\) samples \(\left\{ {\left( {{{{\mathbf{x}}}_{i}},{{y}_{i}}} \right)} \right\}_{{i = 1}}^{m}\), \({{{\mathbf{x}}}_{i}}\) is the RSS feature vector of a given point and \({{y}_{i}}\) is the corresponding distance. The regression function \(y = f\left( {\mathbf{x}} \right)\) between the distance \(y\) and the feature \({\mathbf{x}}\) can be learned with the LS–SVM method. LS–SVM replaces the inequality constraints of the original SVM method with equality constraints, which makes the problem much easier to solve. LS–SVM is described by the following optimization problem:

$$\begin{array}{*{20}{c}} {\mathop {\min }\limits_{{\boldsymbol{\omega }},b,{\mathbf{e}}} \;J\left( {{\boldsymbol{\omega }},{\mathbf{e}}} \right) = \frac{1}{2}{{{\boldsymbol{\omega }}}^{T}}{\boldsymbol{\omega }} + \frac{1}{2}\gamma \sum\limits_{i = 1}^m {e_{i}^{2}} } \\ {{\text{s.t.}}\;{{y}_{i}} = {{{\boldsymbol{\omega }}}^{T}}\varphi \left( {{{{\mathbf{x}}}_{i}}} \right) + b + {{e}_{i}},\;i = 1, \ldots ,m,} \end{array}$$
(1)

where \({\boldsymbol{\omega }},b,{\mathbf{e}}\) are the variables to be optimized, \({{e}_{i}}\) is the error variable introduced for each sample, \(\gamma \) is the regularization parameter, and \(\varphi \left( {{{{\mathbf{x}}}_{i}}} \right)\) maps \({{{\mathbf{x}}}_{i}}\) to a higher-dimensional space. The Lagrange multiplier method is used to solve this optimization problem, which gives the Lagrangian function:

$$L\left( {{\boldsymbol{\omega }},b,{\mathbf{e}},{\boldsymbol{\alpha }}} \right) = J\left( {{\boldsymbol{\omega }},{\mathbf{e}}} \right) - \sum\limits_{i = 1}^m {{{\alpha }_{i}}\left\{ {{{{\boldsymbol{\omega }}}^{T}}\varphi \left( {{{{\mathbf{x}}}_{i}}} \right) + b + {{e}_{i}} - {{y}_{i}}} \right\}} ,$$
(2)

where \({{\alpha }_{i}}\) are the Lagrange multipliers corresponding to \({{{\mathbf{x}}}_{i}}\).

Taking the derivatives of \(L\) with respect to \({\boldsymbol{\mathbf{\omega }}},b,{\mathbf{e}},{\boldsymbol{\mathbf{\alpha }}}\) and setting them to 0 yields a linear system of equations in \({\boldsymbol{\mathbf{\alpha }}}\) and \(b\):

$$\left[ {\begin{array}{*{20}{c}} 0&{{{{\mathbf{1}}}^{{\text{T}}}}} \\ {\mathbf{1}}&{K + \frac{{\text{I}}}{\gamma }} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} b \\ {\boldsymbol{\mathbf{\alpha }}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} 0 \\ {\mathbf{y}} \end{array}} \right],$$
(3)

where \({\mathbf{1}} = {{[1,1, \ldots ,1]}^{T}}\), \({\mathbf{y}} = {{[{{y}_{1}},{{y}_{2}}, \ldots ,{{y}_{m}}]}^{T}}\), \({\text{I}}\) is the identity matrix, and \(K\) is the \(m \times m\) kernel matrix. The kernel function used in this algorithm is the Gaussian radial basis function:

$${{K}_{{ij}}} = k\left( {{{{\mathbf{x}}}_{i}},{{{\mathbf{x}}}_{j}}} \right) = \varphi {{\left( {{{{\mathbf{x}}}_{i}}} \right)}^{T}}\varphi \left( {{{{\mathbf{x}}}_{j}}} \right) = \exp \left[ { - \frac{{\left\| {{{{\mathbf{x}}}_{i}} - {{{\mathbf{x}}}_{j}}} \right\|_{2}^{2}}}{{2{{\sigma }^{2}}}}} \right],~$$
(4)

where \(\sigma \) is the kernel parameter. Solving the above system gives \({\boldsymbol{\mathbf{\alpha }}}\) and \(b\). Thus, for a new sample \({\mathbf{x}}\), the output of the LS–SVM method is:

$$y = f\left( {\mathbf{x}} \right) = {{{\boldsymbol{\mathbf{\omega }}}}^{T}}\varphi \left( {\mathbf{x}} \right) + b = \mathop \sum \limits_{i = 1}^m {{\alpha }_{i}}k\left( {{\mathbf{x}},{{{\mathbf{x}}}_{i}}} \right) + b.$$
(5)
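The linear system in Eq. (3), the kernel in Eq. (4), and the prediction rule in Eq. (5) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the implementation used in the experiments; the function names, the values of `gamma` (regularization parameter) and `sigma` (kernel width), and the toy target function are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between rows of A and rows of B, Eq. (4)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the bordered linear system of Eq. (3) for b and alpha."""
    m = X.shape[0]
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0                                   # first row: [0, 1^T]
    A[1:, 0] = 1.0                                   # first column: [0, 1]^T
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(m) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                           # b, alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate Eq. (5): f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic stand-in for (RSS feature, distance) pairs; purely illustrative.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1]

b, alpha = lssvm_fit(X, y, gamma=1e3, sigma=1.0)
y_hat = lssvm_predict(X, b, alpha, X, sigma=1.0)
```

Because training reduces to one dense linear solve, the cost is dominated by the size of the kernel matrix, which grows with the square of the number of training samples.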

The estimated distance from the terminal to the AP can be obtained using the LS–SVM regression model. Assuming that the received signal strength from n APs \(\left( {rs{{s}_{1}},rs{{s}_{2}}, \ldots ,rs{{s}_{n}}} \right)\) can be collected at a certain location, then we can use the LS–SVM regression model to obtain the estimated distance from the location to n APs \(\left( {{{d}_{1}},{{d}_{2}}, \ldots ,{{d}_{n}}} \right)\), and then combine the obtained estimated distance with the RSS data as fingerprint data, that is, perform feature extension on the RSS data to obtain fingerprint data \({{f}_{1}} = \left( {rs{{s}_{1}},rs{{s}_{2}}, \ldots ,rs{{s}_{n}},{{d}_{1}},{{d}_{2}}, \ldots ,{{d}_{n}}} \right)\).
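The feature extension described above amounts to concatenating the two vectors. A minimal sketch, with a hypothetical helper name `extend_fingerprint` and made-up sample values:

```python
import numpy as np


def extend_fingerprint(rss, est_dist):
    """Concatenate n RSS values and n estimated distances into one 2n-vector."""
    rss = np.asarray(rss, dtype=float)
    est_dist = np.asarray(est_dist, dtype=float)
    if rss.shape != est_dist.shape:
        raise ValueError("need exactly one estimated distance per AP")
    return np.concatenate([rss, est_dist], axis=-1)


rss = [-52.0, -67.0, -71.0]   # hypothetical RSS readings in dBm from 3 APs
dist = [4.1, 9.8, 12.5]       # hypothetical estimated distances in metres
f1 = extend_fingerprint(rss, dist)
```

The same helper also works on a batch of fingerprints stored row-wise, since the concatenation is along the last axis.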

3.2 SRKDA Processing of Offline Fingerprint Data

Linear Discriminant Analysis (LDA) seeks a projection that maximizes the between-class scatter while minimizing the within-class scatter. The KDA algorithm is a kernel extension of LDA: it applies the same idea after mapping the data nonlinearly into a feature space.

Let the nonlinear transformation \(\Phi \) map the offline fingerprint space \(X\) to the feature fingerprint space \(X'\), that is, \(\Phi :X \to X'\). For a properly chosen \(\Phi \), an inner product \(\left\langle { \cdot , \cdot } \right\rangle \) can be defined on \(X'\) such that \(\left\langle {\Phi \left( {\mathbf{x}} \right),\Phi \left( {\mathbf{y}} \right)} \right\rangle = \mathcal{K}\left( {{\mathbf{x}},{\mathbf{y}}} \right)\), where \(\mathcal{K}\left( { \cdot , \cdot } \right)\) is a positive semi-definite kernel function. Popular kernel functions include the Gaussian kernel, the polynomial kernel, and the sigmoid kernel. Suppose we have a set of \(m\) samples \({{{\mathbf{x}}}_{1}},{{{\mathbf{x}}}_{2}}, \ldots ,{{{\mathbf{x}}}_{m}} \in {{\mathbb{R}}^{n}}\), belonging to \(N\) classes. Let \(S_{b}^{\Phi }\), \(S_{w}^{\Phi }\) and \(S_{t}^{\Phi }\) denote the between-class, within-class and total scatter matrices in the feature space, respectively. We have

$$S_{b}^{\Phi } = \mathop \sum \limits_{k = 1}^N {{p}_{k}}\left( {{\boldsymbol{\mathbf{\mu }}}_{{{\Phi }}}^{{\left( k \right)}} - {{{\boldsymbol{\mathbf{\mu }}}}_{{{\Phi }}}}} \right){{\left( {{\boldsymbol{\mathbf{\mu} }}_{{{\Phi }}}^{{\left( k \right)}} - {{{\boldsymbol{\mathbf{\mu }}}}_{{{\Phi }}}}} \right)}^{T}},$$
(6)
$$S_{w}^{\Phi } = \mathop \sum \limits_{k = 1}^N \mathop \sum \limits_{i = 1}^{{{p}_{k}}} \left( {{{\Phi }}\left( {{\mathbf{x}}_{i}^{{\left( k \right)}}} \right) - {\boldsymbol{\mathbf{\mu }}}_{{{\Phi }}}^{{\left( k \right)}}} \right){{\left( {{{\Phi }}\left( {{\mathbf{x}}_{i}^{{\left( k \right)}}} \right) - {\boldsymbol{\mathbf{\mu }}}_{{{\Phi }}}^{{\left( k \right)}}} \right)}^{T}},$$
(7)
$$S_{t}^{\Phi } = \mathop \sum \limits_{i = 1}^m \left( {{{\Phi }}\left( {{{{\mathbf{x}}}_{i}}} \right) - {{{\boldsymbol{\mathbf{\mu }}}}_{{{\Phi }}}}} \right){{\left( {{{\Phi }}\left( {{{{\mathbf{x}}}_{i}}} \right) - {{{\boldsymbol{\mathbf{\mu} }}}_{{{\Phi }}}}} \right)}^{T}},$$
(8)

where \({{p}_{k}}\) is the number of samples in the \(k\)-th class, \({\boldsymbol{\mathbf{\mu }}}_{\Phi }^{{\left( k \right)}}\) and \({{{\boldsymbol{\mathbf{\mu }}}}_{\Phi }}\) are the centroids of the \(k\)-th class and the global centroid, \({\mathbf{x}}_{i}^{{\left( k \right)}}\) is the \(i\)-th sample in the \(k\)-th class.

Let \({\mathbf{v}}\) denote the projection function in the feature space; the corresponding objective function in the feature space is:

$${{{\mathbf{v}}}_{{{\text{opt}}}}} = {\text{argmax}}\;\frac{{{{{\mathbf{v}}}^{T}}S_{b}^{\Phi }{\mathbf{v}}}}{{{{{\mathbf{v}}}^{T}}S_{t}^{\Phi }{\mathbf{v}}}},$$
(9)

which can be solved through the generalized eigenproblem:

$$S_{b}^{\Phi }{\mathbf{v}} = {{\lambda }}S_{t}^{\Phi }{\mathbf{v}}.$$
(10)

Since the eigenvector \({\mathbf{v}}\) in the above equation is a linear combination of the \(\Phi \left( {{{{\mathbf{x}}}_{i}}} \right)\), we have:

$${\mathbf{v}} = \mathop \sum \limits_{i = 1}^m {{\alpha }_{i}}{{\Phi }}\left( {{{{\mathbf{x}}}_{i}}} \right).$$
(11)

Let \({\boldsymbol{\mathbf{\alpha }}} = {{[{{\alpha }_{1}},{{\alpha }_{2}},{{\alpha }_{3}}, \ldots ,{{\alpha }_{m}}]}^{T}}\); then equation (9) is equivalent to:

$${{{\boldsymbol{\mathbf{\alpha }}}}_{{{\text{opt}}}}} = {\text{argmax}}~\frac{{{{{\boldsymbol{\mathbf{\alpha }}}}^{{\text{T}}}}KWK{\boldsymbol{\mathbf{\alpha }}}}}{{{{{\boldsymbol{\mathbf{\alpha }}}}^{T}}KK{\boldsymbol{\mathbf{\alpha }}}}},$$
(12)

and the corresponding eigenproblem is:

$$KWK{\boldsymbol{\mathbf{\alpha }}} = \lambda KK{\boldsymbol{\mathbf{\alpha} }},$$
(13)

where \(K\) is the kernel matrix and \(W\) is defined as:

$${{W}_{{ij}}} = \left\{ {\begin{array}{*{20}{c}} {\frac{1}{{{{p}_{k}}}},}&{{\text{if}}\;{{{\mathbf{x}}}_{i}}\;{\text{and}}\;{{{\mathbf{x}}}_{j}}\;{\text{both belong to the}}\;k{\text{ - th class}};{{\;}}} \\ {0,}&{{\text{otherwise}}.} \end{array}} \right.$$
(14)
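The weight matrix of Eq. (14) can be built directly from the class labels. The sketch below is illustrative only; the function name `class_weight_matrix` and the toy labels are assumptions.

```python
import numpy as np


def class_weight_matrix(labels):
    """Build W of Eq. (14): W_ij = 1/p_k if samples i and j are both in class k, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # True where i and j share a class
    counts = same.sum(axis=1)                   # p_k for the class of each sample
    return same / counts[:, None]


labels = np.array([0, 0, 1, 1, 1])   # two classes with 2 and 3 samples
W = class_weight_matrix(labels)
```

With this definition each row of `W` sums to 1, and `W` is block-diagonal when the samples are ordered by class.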

Computing the optimal \({\boldsymbol{\mathbf{\alpha }}}\) thus involves an eigendecomposition of the kernel matrix, which is very expensive when the number of training samples is large. Using spectral graph analysis, SRKDA casts this problem into a regression framework, which allows efficient computation and the use of regularization techniques. Following [11], the eigenvector \({\boldsymbol{\mathbf{\alpha }}}\) can be obtained with a two-stage regression method.

Each eigenvector \({\boldsymbol{\mathbf{\alpha }}}\) defines a projection function \({\mathbf{v}}\) in the feature space. For an offline fingerprint \({{{\mathbf{x}}}_{i}}\), its projection \({{{\mathbf{z}}}_{i}}\) in the kernel space is:

$${{{\mathbf{z}}}_{i}} = {{{\mathbf{v}}}^{T}}{{\Phi }}\left( {{{{\mathbf{x}}}_{i}}} \right).$$
(15)
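The two-stage procedure and the projection of Eq. (15) can be sketched as follows. This reflects one reading of the spectral regression formulation in [11]: response vectors are first generated from the class labels, and a regularized kernel regression is then solved for \({\boldsymbol{\mathbf{\alpha }}}\). The function names, the regularization value `delta`, the omission of kernel centering, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def srkda_responses(labels):
    """Stage 1: orthonormal responses that are constant within each class
    and orthogonal to the all-ones vector (N - 1 vectors for N classes)."""
    classes = np.unique(labels)
    indicators = (labels[:, None] == classes[None, :-1]).astype(float)
    M = np.column_stack([np.ones(len(labels)), indicators])
    Q, _ = np.linalg.qr(M)   # orthogonalize, starting from the ones vector
    return Q[:, 1:]          # drop the column proportional to the ones vector


def srkda_fit(X, labels, sigma=1.0, delta=0.01):
    """Stage 2: solve the regularized system (K + delta*I) alpha = responses."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + delta * np.eye(len(X)), srkda_responses(labels))


def srkda_transform(X_train, alpha, X_new, sigma=1.0):
    """Eq. (15) with Eq. (11): z = sum_i alpha_i k(x_i, x), one column per eigenvector."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha


# Synthetic example: 3 classes (e.g. reference points) with 10 samples each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
labels = np.repeat(np.arange(3), 10)
X = centers[labels] + 0.3 * rng.standard_normal((30, 2))

alpha = srkda_fit(X, labels)
Z = srkda_transform(X, alpha, X)   # feature fingerprints, shape (30, 2)
```

The design point is that stage 2 is a linear solve rather than an eigendecomposition of the kernel matrix, which is where the efficiency gain over plain KDA is expected to come from.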

Thus, the SRKDA method maps the original extended fingerprint data into the fingerprint feature space, where the location is then estimated as described below.

3.3 WKNN Position Estimation

In the offline phase, the algorithm uses SRKDA to process the original location fingerprints, generating a feature fingerprint dataset and a model. In the online phase, the model is used to process the data collected online, and the WKNN method then matches the processed online data against the data in the offline feature fingerprint dataset.

First, calculate the Euclidean distance between the processed online fingerprint \(Y'\) and each feature fingerprint \(X_{i}^{'}\) in the offline feature fingerprint dataset:

$${{D}_{i}}\left( {Y{\kern 1pt}^{{'}},~X_{i}^{{{'}}}} \right) = \sqrt {\mathop \sum \limits_{j = 1}^n {{{\left( {Y_{j}^{{{'}}} - rss_{{ij}}^{{{'}}}} \right)}}^{2}}} ,{{\;}}\,\,i \in \left( {1,N} \right),{{\;}}$$
(16)

\({{D}_{i}}\left( {Y',X_{i}^{'}} \right)\) indicates the similarity between \(Y'\) and \(X_{i}^{'}\): the smaller the value, the more similar the two are.

Then sort the \({{D}_{i}}\left( {Y',X_{i}^{'}} \right)\) in ascending order and take the \(t\) smallest \({{D}_{i}}\) together with their corresponding positions \({{l}_{i}}\left( {{{x}_{i}},{{y}_{i}}} \right)\). Following the principle that a smaller distance deserves a greater weight, the reciprocal of the distance is used as the weight, and the predicted position is obtained by the following formula:

$${\text{pos}} = \frac{{\sum\limits_{i = 1}^t {~\frac{1}{{{{D}_{i}}~}}{{l}_{i}}} }}{{\sum\limits_{i = 1}^t {~\frac{1}{{{{D}_{i}}~}}} }}.$$
(17)
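Eqs. (16) and (17) translate into a short routine. The sketch below is illustrative; the function name `wknn_locate`, the small constant `eps` (added to avoid division by zero when a distance is exactly 0, which Eq. (17) does not address), and the toy data are assumptions.

```python
import numpy as np


def wknn_locate(y_feat, rp_feats, rp_coords, t=3, eps=1e-12):
    """Estimate a position as the inverse-distance-weighted mean of the
    coordinates of the t nearest feature fingerprints."""
    rp_feats = np.asarray(rp_feats, dtype=float)
    rp_coords = np.asarray(rp_coords, dtype=float)
    d = np.linalg.norm(rp_feats - np.asarray(y_feat, dtype=float), axis=1)  # Eq. (16)
    nearest = np.argsort(d)[:t]            # indices of the t smallest distances
    w = 1.0 / (d[nearest] + eps)           # eps is an assumption, not part of Eq. (17)
    return (w[:, None] * rp_coords[nearest]).sum(axis=0) / w.sum()          # Eq. (17)


# Toy feature fingerprints and their known positions (hypothetical values).
rp_feats = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
rp_coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

pos = wknn_locate([1.0, 0.0], rp_feats, rp_coords, t=2)
```

In this toy case the query is equidistant from the first two fingerprints, so the estimate falls midway between their positions.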

4 EXPERIMENT AND VERIFICATION

4.1 Experimental Data and Environment

The data used in this experiment was collected in a real environment, on the departure floor of Terminal T2 of Tianjin Binhai International Airport. After weighing several factors, a rest area on the departure floor was selected as the experimental area. The collected data shows that, in this rest area, repeated measurements at the same location fluctuate greatly because of people walking by. Reference points (RPs) were deployed every 1 m, for a total of 10 RPs; their locations are shown by the white dots in Fig. 4. Other locations in the data collection area were then chosen as test points (TPs), shown by the stars in Fig. 4. The real location of each TP is known. RSS data from each AP is collected at each TP as online fingerprint data, which is fed to the algorithm in this paper for location estimation, and the estimated location is compared with the real location to verify the performance of the algorithm.

Fig. 4. Schematic diagram of the four location scenes. White circles indicate reference points and stars indicate test points.

In the experiment, the existing WLAN infrastructure in the terminal building of Tianjin Airport was used for data collection. Across the data collection locations, a total of 157 APs could be detected in the experimental area, and 122 APs remained after personal hotspots were excluded. All APs are NLOS with respect to the data collection area, and the effects of AP position and AP hardware on the algorithm are ignored in the experiment. A Huawei Mate7 mobile phone was used to collect RSS data; its processor is a quad-core 1.8 GHz unit and its operating system is EMUI 3.0. RSS data at each TP and RP was collected 100 times in succession, and each fingerprint record includes the RSS values and the terminal-to-AP distances. Some fingerprint data is shown in Table 1. The data in Table 1 is offline fingerprint data, which in general can be used directly for localization without processing. Because of limited space, only part of the RSS data from the first 6 APs is listed in the table.

Table 1.   Partial fingerprint data

The offline fingerprint data is processed by SRKDA, with the Gaussian kernel width set to 1 based on experience. The data in the offline fingerprint dataset consists of the RSS values obtained with the Huawei Mate7 mobile phone. Because KDA performs a nonlinear transformation, the transformed data serves only as feature fingerprint data and has no practical meaning of its own.

4.2 Influence of Different Feature Combinations on Distance Estimation

Before the terminal-to-AP distance is estimated, feature extraction must be performed. We first test how different feature combinations affect distance estimation in order to judge the merits of the selected features. The specific feature combinations are shown in Table 2.

Table 2.   Feature combination table

After grouping the features, we ran experiments for each group, using the cumulative distribution function (CDF) of the error as the measure; the final result is shown in Fig. 5. We used 1500 records collected offline as the experimental sample and applied 10-fold cross-validation: the sample was divided into ten parts, and in each round 9 parts were used as the training set and 1 as the test set.
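The evaluation protocol described above, a 10-fold split of 1500 records scored by the error CDF, can be sketched as follows. The function names, the random seed, and the four sample error values are illustrative assumptions; the regression model itself is left out.

```python
import numpy as np


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def empirical_cdf(errors):
    """Return the sorted errors and the fraction of samples at or below each one."""
    x = np.sort(np.asarray(errors, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p


splits = list(kfold_indices(1500, k=10))       # 10 rounds of 1350 train / 150 test
x, p = empirical_cdf([0.5, 2.0, 1.0, 1.5])     # hypothetical distance errors in metres
```

In a full experiment, the errors from all ten test folds would be pooled before the CDF is computed, so that each record contributes exactly one error value.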

Fig. 5. Influence of different feature combinations on distance estimation.

Fig. 5 shows that adding features can improve the performance of the LS–SVM regression model, but more features do not necessarily give a better distance estimate. The best result is obtained with feature group F2, which contains the average and the logarithmic mean, indicating that these two features have the greatest impact on the regression model. Adding the standard deviation does not improve the result as expected; instead, the distance estimate becomes worse, which suggests that the noisy character of the standard deviation may degrade the regression model and lead to a larger distance estimation error.

4.3 Comparison of Distance Estimation Effects

In the distance estimation, in order to compare the effects of different algorithms, we conducted experiments on three algorithms: SPM algorithm, BPM algorithm and LS–SVM algorithm, using the error CDF as the measurement standard. The final result is shown in Fig. 6.

Fig. 6. Comparison of the distance estimation effect of different algorithms.

Fig. 6 shows that the LS–SVM regression model used in this paper is more effective for distance estimation, with errors mostly within 2 m. The SPM and BPM algorithms perform much worse: the errors of the SPM algorithm are mostly within 5 m and those of the BPM algorithm mostly within 5.5 m. However, the probability that the BPM error is within 3 m is much higher than for the SPM algorithm, so the average error of the BPM algorithm is smaller. The experiments show that distance estimation with the LS–SVM algorithm is much better than with the SPM and BPM algorithms.

4.4 The Effect of the Number of APs on the Average Error

Because fingerprint localization relies mainly on RSS data, the dimension of the RSS data, that is, the number of APs, is an important factor affecting localization performance. In general, the more APs, the better the localization. In this experiment, we vary the number of APs from 10 to 120 and use the average error to verify the performance of the algorithm. The average error is defined as follows:

$$ME = \frac{1}{N}\mathop \sum \limits_{i = 1}^N \sqrt {{{{\left( {{{x}_{i}} - x_{i}^{'}} \right)}}^{2}} + {{{\left( {{{y}_{i}} - y_{i}^{'}} \right)}}^{2}}} .$$
(18)

In the above formula, N represents the number of position estimations, \({{x}_{i}}\) and \({{y}_{i}}\) represent the real coordinates of the i-th location, and \(x_{i}^{{{'}}}\) and \(y_{i}^{{{'}}}\) represent the predicted coordinates of the i-th location. The specific experimental results are shown in Fig. 7.
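Eq. (18) can be computed directly from paired true and predicted coordinates. A minimal sketch follows; the function name `mean_error` and the sample coordinates are illustrative and not taken from the experiments.

```python
import math


def mean_error(true_xy, pred_xy):
    """Average Euclidean distance between true and predicted positions, Eq. (18)."""
    if not true_xy or len(true_xy) != len(pred_xy):
        raise ValueError("need the same non-zero number of true and predicted points")
    total = sum(math.hypot(x - xp, y - yp)
                for (x, y), (xp, yp) in zip(true_xy, pred_xy))
    return total / len(true_xy)


true_xy = [(0.0, 0.0), (1.0, 1.0)]   # hypothetical true positions in metres
pred_xy = [(3.0, 4.0), (1.0, 2.0)]   # hypothetical estimates: errors of 5 m and 1 m
me = mean_error(true_xy, pred_xy)    # (5 + 1) / 2 = 3.0
```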

Fig. 7. The average error varies with the number of APs.

Fig. 7 shows that as the number of APs increases, the average error of each algorithm trends downward. With 30 APs, the average error of the proposed algorithm is already smaller than that of the other algorithms, and for the same average error it requires fewer APs than the other algorithms. The proposed algorithm therefore performs better than the other algorithms in terms of average error.

4.5 Localization Accuracy in Different Localization Scenarios

To verify the robustness of the algorithm, we conduct experiments in the four localization scenarios shown in Fig. 4 and compare, for each scenario, the localization accuracy within a distance error of 1.5 m. In this experiment, all offline samples and all APs were used for each scenario.
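The accuracy measure used here, the share of estimates whose distance error is within 1.5 m, can be sketched as follows. The function name `accuracy_within`, the choice to count an error of exactly 1.5 m as a hit, and the sample points are assumptions made for illustration.

```python
import math


def accuracy_within(true_xy, pred_xy, threshold_m=1.5):
    """Fraction of estimates whose Euclidean error is at most threshold_m."""
    if not true_xy or len(true_xy) != len(pred_xy):
        raise ValueError("need the same non-zero number of true and predicted points")
    hits = sum(math.hypot(x - xp, y - yp) <= threshold_m   # inclusive bound (assumed)
               for (x, y), (xp, yp) in zip(true_xy, pred_xy))
    return hits / len(true_xy)


true_xy = [(0.0, 0.0)] * 4
pred_xy = [(1.0, 0.0), (0.0, 2.0), (0.5, 0.0), (1.5, 0.0)]   # errors: 1.0, 2.0, 0.5, 1.5 m
acc = accuracy_within(true_xy, pred_xy)                      # 3 of 4 within 1.5 m
```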

The localization accuracy of different localization algorithms under different scenarios is shown in Table 3.

Table 3.   Localization accuracy of different algorithms under different scenarios

Table 3 shows that, although the localization accuracy of each algorithm varies across the four scenarios, the proposed algorithm achieves better localization performance than the other algorithms in all four, indicating that it is more robust.

5 CONCLUSIONS

In view of the complex localization environment of the airport terminal, this paper proposes an indoor localization algorithm based on RSS feature extension and SRKDA. The algorithm uses the LS–SVM regression model to estimate the distance between the terminal and each access point, and then uses the estimated distances together with the RSS data as fingerprint data. In the offline phase, SRKDA is trained on the original fingerprint data to extract the nonlinear characteristics of the original location fingerprints, which form a new location fingerprint dataset. In the online phase, the WKNN method matches the online data against this fingerprint dataset for location estimation. Experimental results show that the proposed algorithm outperforms several other algorithms, reducing the average error and improving localization accuracy. However, the experiments did not consider the impact of AP deployment position on the signal; localization under different AP deployment positions remains to be addressed in future work.