In this section, we present the detailed experiments and the corresponding results obtained from a real testbed.
5.1 Experimental setup and technical details
As a testbed, we utilize the WLAN infrastructure of the ECE Building at Bangladesh University of Engineering and Technology (BUET). We have chosen two general classrooms of this building, rooms 203 and 204, which we refer to as room A and room B, respectively. For both rooms, we create a grid-based radio map inside and outside to obtain reference points. Data collection in these two rooms is performed independently.
Numerous Wi-Fi access points are perceivable in both rooms. However, to increase accuracy, we place an extra Wi-Fi source at the center of a room while conducting data collection in that room. We divide room A into a 4 × 4 grid inside (hence, m = 4 and n = 4 in the algorithm). The middle point of each grid cell is a reference point. In addition, there are 9 reference points outside the room. Similarly, room B has a 4 × 4 grid inside with 16 reference points, plus 10 reference points outside. Thus, for room A we get 25 reference points (16 inside + 9 outside), while for room B we get 26 reference points (16 inside + 10 outside).
To collect data for a room, at each test point we run our custom-built app, providing the reference point position and a timer. The app then automatically collects the list of Wi-Fi APs along with their corresponding signal strengths. We collect signals 20 times at each reference point with a 20-second gap between readings. These readings constitute our reference point data.
5.1.1 Outlier Detection.
For each reference point in a room, from the 20 sets of readings, we first remove outliers by applying the interquartile range (IQR) outlier elimination technique to each access point's data and then average the remaining values.
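A minimal sketch of this per-AP outlier elimination is shown below. The 1.5 × IQR fence is the standard Tukey rule; the text does not state the exact multiplier, so 1.5 is an assumption, and the sample readings are illustrative.

```python
# IQR outlier elimination for one AP's RSS readings at one reference point.
from statistics import quantiles, mean

def iqr_filter_average(rss_values):
    """Drop readings outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], then average the rest."""
    q1, _, q3 = quantiles(rss_values, n=4)   # first and third quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in rss_values if lo <= v <= hi]
    return mean(kept)

# 20 hypothetical readings for one AP (dB); -30 is an implausible outlier.
readings = [-62, -61, -63, -60, -62, -64, -61, -62, -63, -60,
            -61, -62, -63, -61, -62, -30, -63, -62, -61, -62]
filtered_avg = iqr_filter_average(readings)  # average without the outlier
```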
5.1.2 Fingerprint Data Generation for a Room.
For each reference point, we have one set of pairs. Each pair consists of an AP name and its average signal strength.
Then, for each room, we determine the most frequent access points, i.e., those present at most of the reference points inside and outside of that room. Even if some reference points do not capture one of these access points due to weak signal strength there, we do not ignore the access point for that reference point. Instead, we assign a minimum signal strength value of −90 dB to all the missing access points.
So, for a room, we finally determine a fixed set of access points that are present at all the reference points. In this way, both the reference dataset generation and the fingerprinting are complete for a room.
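The fingerprint construction described above can be sketched as follows: fix the set of frequent APs for the room, then build one RSS vector per reference point, substituting the −90 dB floor for any AP missing at that point. The AP names and values here are hypothetical placeholders.

```python
# Build fixed-length fingerprint vectors, filling in missing APs with -90 dB.
MISSING_RSS = -90  # floor value assigned to APs not heard at a point

def build_fingerprints(averaged_rss, room_aps):
    """averaged_rss: {ref_point: {ap_name: avg_rss}} -> {ref_point: vector}."""
    return {
        point: [aps.get(ap, MISSING_RSS) for ap in room_aps]
        for point, aps in averaged_rss.items()
    }

room_aps = ["ap_center", "ap_hall", "ap_lab"]   # hypothetical AP names
averaged = {
    "rp_01": {"ap_center": -48.2, "ap_hall": -71.5, "ap_lab": -80.1},
    "rp_02": {"ap_center": -55.0, "ap_hall": -69.3},  # ap_lab not heard here
}
vectors = build_fingerprints(averaged, room_aps)
```

Every reference point thus yields a vector over the same AP ordering, which is what makes the later distance computations well defined.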
5.2 Experimental Results
Once the reference dataset is ready, we utilize test samples to determine the accuracy of our algorithm. A test sample contains a list of Wi-Fi APs with RSS values. For a particular test point, the signal is collected multiple times. Then, for every access point, we exclude outliers from its RSS values and average the remainder, just as we averaged RSS values for the reference points. Thus, for a test point, we also obtain a list of APs with their averaged RSS values.
Now, we run the algorithm on the test point against the reference points. Here, we take three different values of k, namely 3, 5, and 7, where k is the number of best-matching reference points. We utilize four different distance metrics, namely Euclidean distance, Manhattan distance, cosine similarity, and Hamming distance, to calculate the distance between the test point vector and every reference point vector. The algorithm determines the k nearest reference points with respect to the test point. Next, we perform majority voting among the k selected reference points to determine whether the test point is inside or outside the room.
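The classification step above can be sketched as a standard k-NN with majority voting. Cosine similarity is converted to a distance (1 − similarity) so that "nearest" means most similar; how Hamming distance is applied to real-valued RSS vectors is not specified in the text, so taking it as the count of differing coordinates is an assumption. The reference vectors in the usage example are illustrative.

```python
# k-NN inside/outside classification over RSS fingerprint vectors.
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)  # smaller means more similar

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))  # assumed convention

def classify(test_vec, references, k=3, metric=manhattan):
    """references: list of (vector, label) pairs, label 'inside'/'outside'."""
    nearest = sorted(references, key=lambda r: metric(test_vec, r[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote among k neighbors

refs = [([-50, -70], "inside"), ([-52, -68], "inside"), ([-49, -72], "inside"),
        ([-80, -55], "outside"), ([-82, -58], "outside")]
prediction = classify([-51, -69], refs, k=3, metric=manhattan)
```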
From the results presented in Figure 5 and Figure 6, we can observe how different values of k and various distance metrics influence the accuracy of test points in both Room A and Room B. In Room A, for k = 3, the accuracy achieved with the Euclidean and Manhattan distance metrics is \(87\%\) and \(92\%\), respectively, while cosine similarity also reaches \(87\%\), and Hamming distance trails behind at \(61\%\). With k = 5, Euclidean and Manhattan accuracies remain consistent at \(87\%\) and \(92\%\), though cosine similarity drops to \(77\%\) and Hamming distance stays at \(61\%\). For k = 7, the accuracy declines further for all metrics except Hamming (\(61\%\)), with Euclidean at \(82\%\), Manhattan at \(84\%\), and cosine similarity at \(69\%\).
Similarly, in Room B, the performance for k = 3 shows that Euclidean yields \(94.6\%\), Manhattan reaches \(100\%\), cosine similarity achieves \(97.3\%\), and Hamming distance provides \(81\%\). With k = 5, both the Euclidean and cosine metrics stand at \(97.3\%\), Manhattan retains perfect accuracy (\(100\%\)), and Hamming improves slightly to \(86.5\%\). However, at k = 7, accuracy begins to decline across most metrics, with Euclidean at \(91.9\%\), cosine at \(94.6\%\), and Hamming at \(83.8\%\), though Manhattan remains consistently perfect with \(100\%\) accuracy.
Overall, both k = 3 and k = 5 yield strong results for the Euclidean and Manhattan distance metrics, but Manhattan distance proves superior across the board, particularly for Room B, where it achieves \(100\%\) accuracy at both values of k over around 40 test points. In Room A, Manhattan also performs well with \(92\%\) accuracy, misclassifying 3 out of about 40 test points. The poorer performance observed at k = 7 can likely be attributed to boundary conditions: fewer reference points are available outside the room, limiting the number of suitable neighbors for test points situated outside. This suggests that higher values of k, such as 7 or above, may not be optimal for this attendance system. Additionally, the other distance metrics, cosine similarity and Hamming distance, do not perform well in either room.
5.2.1 Missing Access Point.
On a particular day, one or two access points might malfunction. We simulate this scenario to assess the robustness of our system. For this purpose, we deliberately remove one AP at a time and evaluate the overall accuracy using the remaining access points.
As shown in Figure 7, the performance remains relatively stable across most APs, with only slight fluctuations in accuracy. While the absence of some APs slightly decreases accuracy, the absence of one particular AP surprisingly results in the highest accuracy, even surpassing the baseline. Similarly, Figure 8 shows that removing a single AP has little impact on accuracy. Although accuracy decreases slightly when two APs are removed, for most APs the accuracy remains unaltered. These findings suggest that if an AP fails during the attendance phase, the attendance system would still continue to operate with reasonable accuracy.
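The leave-one-AP-out robustness check described above can be sketched as follows: drop one AP (one vector coordinate) from every fingerprint and every test vector, re-run the classification, and compare the resulting accuracy against the full-AP baseline. The `classify` argument stands in for the k-NN majority vote of the previous subsection and is passed in here as an assumption rather than reproduced.

```python
# Evaluate classification accuracy after removing one AP from all vectors.
def drop_ap(vector, idx):
    """Return a copy of an RSS vector with the AP at position idx removed."""
    return vector[:idx] + vector[idx + 1:]

def accuracy_without_ap(test_set, references, ap_idx, classify):
    """test_set: list of (vector, true_label); references: (vector, label).

    classify(test_vec, references) -> predicted label, e.g. a k-NN vote.
    """
    refs = [(drop_ap(v, ap_idx), lbl) for v, lbl in references]
    hits = sum(classify(drop_ap(v, ap_idx), refs) == lbl
               for v, lbl in test_set)
    return hits / len(test_set)
```

Sweeping `ap_idx` over every AP position reproduces the one-AP-missing experiment; dropping two indices at once gives the two-AP variant.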
5.2.2 Device Heterogeneity.
To observe the impact of device heterogeneity on RSS values, we collect signal strength data using two different Android smartphones: (a) Device Model RMX3363 and (b) Device Model SM-M127G. Signals are captured at various points using both devices, and some of the results are displayed in the following figures.
As seen in Figure 9, while the average signal values from the two devices are not identical, they exhibit similar patterns. The relative signal strength across all access points remains almost consistent between the two devices. For example, if a signal is weak at a specific point, it is weak on both devices, and the same holds true for stronger signals. These findings suggest that our proposed method is robust enough to handle the heterogeneity of user devices effectively.