Distributed localization for IoT with multi-agent reinforcement learning

  • Original Article
  • Neural Computing and Applications

Abstract

Localization has become one of the key enabling techniques for the Internet of Things (IoT). However, most existing localization methods require a central controller and operate in an off-line manner, which cannot satisfy the requirements of real-time IoT applications. To address this issue, a novel distributed localization scheme based on multi-agent reinforcement learning (MARL) is proposed. The localization problem is first reformulated as a stochastic game for maximizing the sum of the negative localization errors. Each non-anchor node is then modeled as an intelligent agent whose action space corresponds to its possible locations. We then invoke a MARL framework, built on the conventional Q-learning framework, to learn the optimal policy and to maximize the long-term expected reward. A novel strategy is also proposed to further reduce the localization error. Extensive simulations demonstrate that the proposed localization method outperforms the game theoretic-based and virtual force-based distributed localization algorithms in terms of both localization accuracy and convergence speed, and is suitable for on-line localization scenarios.
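To make the setting concrete, the sketch below illustrates the kind of per-node learner the abstract describes: each non-anchor node maintains its own Q-table over candidate locations (its action space), treats the negative localization error as its reward, and applies an independent tabular Q-learning update. This is a minimal illustration under assumed names and a grid-based action space, not the authors' implementation; the paper's actual state, reward, and coordination design are given in the main text.

```python
import random
from collections import defaultdict

class NodeAgent:
    """Illustrative per-node learner: actions are candidate grid locations,
    and the reward is assumed to be the negative localization error."""

    def __init__(self, candidate_locations, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = list(candidate_locations)   # possible (x, y) position estimates
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)                # Q[(state, action)], defaults to 0.0

    def choose(self, state):
        # epsilon-greedy selection over the candidate locations
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # independent tabular Q-learning update, run locally at each node
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In use, each node would call choose(state) to pick a position estimate and then update(...) with a reward reflecting how well that estimate agrees with ranging measurements to its anchors and neighbors, since the true position is unavailable on-line.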


References

  1. Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutor 17(4):2347–2376

  2. Shit RC, Sharma S, Puthal D, Zomaya AY (2018) Location of things (LoT): a review and taxonomy of sensors localization in IoT infrastructure. IEEE Commun Surv Tutor 20(3):2028–2061

  3. Huang L, Wu C, Aghajan H (2011) Vision-based user-centric light control for smart environments. Pervasive Mob Comput 7(2):223–240

  4. Rezazadeh J, Moradi M, Ismail AS, Dutkiewicz E (2014) Superior path planning mechanism for mobile beacon-assisted localization in wireless sensor networks. IEEE Sens J 14(9):3052–3064

  5. Wang S, Luo F, Jing X, Zhang L (2017) Low-complexity message-passing cooperative localization in wireless sensor networks. IEEE Commun Lett 21(9):2081–2084

  6. Xu H, Sun H, Cheng Y, Liu H (2016) Wireless sensor networks localization based on graph embedding with polynomial mapping. Comput Netw 106:151–160

  7. Rezazadeh J, Subramanian R, Sandrasegaran K, Kong X, Moradi M, Khodamoradi F (2018) Novel iBeacon placement for indoor positioning in IoT. IEEE Sens J 18(24):10240–10247

  8. Dong S, Zhang XG, Zhou WG (2020) A security localization algorithm based on DV-Hop against Sybil attack in wireless sensor networks. J Electr Eng Technol 15(5):919–926

  9. Messous S, Liouane H, Liouane N (2020) Improvement of DV-Hop localization algorithm for randomly deployed wireless sensor networks. Telecommun Syst 73:75–86

  10. Tomic S, Beko M (2019) Target localization via integrated and segregated ranging based on RSS and TOA measurements. Sensors 19(2):230

  11. Han Z, Leung CS, So HC, Constantinides AG (2018) Augmented Lagrange programming neural network for localization using time-difference-of-arrival measurements. IEEE Trans Neural Netw Learn Syst 29(8):3879–3884

  12. Moschitta A, Macii D, Trenti F, Dalpez S, Bozzoli A (2012) Characterization of a geometrical wireless signal propagation model for indoor ranging techniques. In: IEEE international instrumentation and measurement technology conference proceedings, pp 2598–2603

  13. Wang P, Xue F, Li H, Cui Z, Chen J (2019) A multi-objective DV-hop localization algorithm based on NSGA-II in internet of things. Mathematics 7(2):184

  14. Shi Q, Xu Q, Zhang J (2018) An improved DV-Hop scheme based on path matching and particle swarm optimization algorithm. Wirel Pers Commun 104(4):1301–1320

  15. Khediri SE, Khan RU, Nasri N, Kachouri A (2021) Energy efficient adaptive clustering hierarchy approach for wireless sensor networks. Int J Electron 108(1):67–86

  16. Jia J, Zhang G, Wang X, Chen J (2013) On distributed localization for road sensor networks: a game theoretic approach. Math Probl Eng 2013(pt. 17):1–9

  17. Xiong Z, Jia J, Chen J (2016) Distributed localization scheme based on virtual force in wireless sensor networks. Comput Sci 43(2):109–112

  18. Du B, Liu Y, Abbas IA (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Frankl Inst 353(2):448–461

  19. Abouelmagd EI, Awad ME, Elzayat E, Abbas IA (2014) Reduction the secular solution to the periodic solution in the generalized restricted three-body problem. Astrophys Space Sci 350(2):495–505

  20. Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177(C):409–415

  21. He S, Zhang M, Fang H, Liu F, Luan X, Ding Z (2019) Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information. Neural Comput Appl 32(18):1433–3058

  22. Zhang X, Wang H, Stojanovic V, Cheng P, Liu F (2021) Asynchronous fault detection for interval type-2 fuzzy nonhomogeneous higher-level Markov jump systems with uncertain transition probabilities. IEEE Trans Fuzzy Syst PP(99):1

  23. Cheng P, He S, Luan X, Liu F (2021) Finite-region asynchronous \({\rm H}_\infty\) control for 2d Markov jump systems. Automatica 129(18):109590

  24. Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press

  25. Abed-Alguni BH (2017) Action-selection method for reinforcement learning based on cuckoo search algorithm. Arab J Sci Eng 1:1–15

  26. Abed-Alguni B, Paul D, Chalup S, Henskens F (2016) A comparison study of cooperative q-learning algorithms for independent learners. Int J Artif Intell 14(1):71–93

  27. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602

  28. Abed-Alguni BH, Chalup SK, Henskens FA, Paul DJ (2015) A multi-agent cooperative reinforcement learning model using a hierarchy of consultants, tutors and workers. Vietnam J Comput Sci 2(4):213–226

  29. Abed-Alguni B, Ottom MA (2018) Double delayed q-learning. Int J Artif Intell 16(2):41–59

  30. So-In C, Permpol S, Rujirakul K (2016) Soft computing-based localizations in wireless sensor networks. Pervasive Mob Comput 29:17–37

  31. Zhu F, Wei J (2016) Localization algorithm for large scale wireless sensor networks based on fast-SVM. Wirel Pers Commun 95(3):1859–1875

  32. Peng B, Li L (2015) An improved localization algorithm based on genetic algorithm in wireless sensor networks. Cogn Neurodynam 9(2):249–256

  33. Li H (2012) Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: a two by two case, pp 1893–1898

  34. Galindo-Serrano A, Giupponi L (2010) Distributed q-learning for aggregated interference control in cognitive radio networks. IEEE Trans Veh Technol 59(4):1823–1834

  35. Asheralieva A, Miyanaga Y (2016) An autonomous learning-based algorithm for joint channel and power level selection by d2d pairs in heterogeneous cellular networks. IEEE Trans Commun PP(9):1

  36. Wang Q (2012) Research on an indoor positioning technology based on RSSI ranging. Electro Sci Tech 25:64–66

  37. Shoham Y, Leyton-Brown K (2008) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, Cambridge

  38. Neyman A (2003) From Markov chains to stochastic games. In: Neyman A, Sorin S (eds) Stochastic games and applications. Springer, Dordrecht, pp 9–25

  39. Pénard T (2004) La théorie des jeux et les outils d’analyse des comportements stratégiques [Game theory and the tools for analyzing strategic behavior]. Université de Rennes 1

  40. Neto G (2005) From single-agent to multi-agent reinforcement learning: foundational concepts and methods. Learning theory course, 2

  41. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292

  42. Jaakkola T, Jordan MI, Singh SP (1994) On the convergence of stochastic iterative dynamic programming algorithms. Neural Comput 6(6):1185–1201

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants No. 61772126, 61972079, 62172084, and 62132004, in part by the Major Research Plan of National Natural Science Foundation of China under Grant No. 92167103, in part by the Fundamental Research Funds for the Central Universities under Grants N2016004, N2116004, N2116009, and N2024005-1, in part by the Central Government Guided Local Science and Technology Development Fund Project under Grant 2020ZY0003, in part by the Science and Technology Plan Project of Inner Mongolia Autonomous Region of China under Grant 2020GG0189, in part by the LiaoNing Revitalization Talents Program under Grant No. XLYC2007162, and in part by the Young and Middle-aged Scientific and Technological Innovation Talent Support Program of Shenyang under Grant RC200548.

Author information

Corresponding author

Correspondence to Jie Jia.

Ethics declarations

Conflict of interest

The authors declare that this manuscript is their original work and has not been published, nor is it under simultaneous consideration elsewhere. All authors have reviewed the manuscript and agreed to its submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proof of Theorem 1

We prove Theorem 1 based on the theory in [42]. The Q-learning algorithm is an iterative process of the stochastic game; therefore, we use stochastic approximation theory to prove the convergence of Q-learning. The result of the approximation theory given in [42] is summarized in Lemma 1.

Lemma 1

The random iterative process \(\varDelta ^{t+1}(x)\), which is given by

$$\begin{aligned} {{\varDelta ^{t+1}}({x})} = \left( 1 - {\alpha ^{t}}({x})\right) {{\varDelta ^{t}}({x})} + {\beta ^{t}}({x}){\varPhi ^{t}}({x}), \end{aligned}$$
(A.1)

converges to zero with probability 1 if the following conditions are met.

  1. The state space is finite;

  2. \(\sum _{t=0}^{+\infty }{\alpha ^t} = {\infty }\), \(\sum _{t=0}^{+\infty }{({\alpha ^t})}^2 {<} {\infty }\), \(\sum _{t=0}^{+\infty }{\beta ^t} = {\infty }\), \(\sum _{t=0}^{+\infty }{({\beta ^t})}^2 {<} {\infty }\), and \(E\{{\beta }^{t}({x})|{\varGamma }^{t}\} \le E\{{\alpha }^{t}({x})|{\varGamma }^{t}\}\) with probability 1;

  3. \(\parallel {E\{{\varPhi }^{t}({x})|{\varGamma }^{t}\}}\parallel _\mathrm{w} \le \delta \parallel \varDelta ^{t}{\parallel }_\mathrm{w}\), where \({\delta }\in (0,1)\);

  4. \(\mathrm{Var}\{{\varPhi }^{t}({x})|{\varGamma }^{t}\} \le Z\left( 1+\parallel {\varDelta ^{t}}\parallel _\mathrm{w}\right) ^{2}\), where Z > 0 is a constant.

Here, \(\varGamma ^{t} = \{{\varDelta ^{t}},{\varDelta ^{t-1}},\ldots ,{\varPhi ^{t-1}},\ldots ,{\alpha ^{t-1}},\ldots ,{\beta ^{t-1}}\}\) denotes the history up to time slot t, and the symbol \(\parallel \cdot \parallel _\mathrm{w}\) denotes the weighted maximum norm.
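As a concrete check of condition (2), consider the common harmonic schedule (an illustrative choice rather than one prescribed by the paper) with \(\beta ^{t} = \alpha ^{t}\), for which the last requirement of condition (2) holds with equality:

$$\begin{aligned} \alpha ^{t} = \frac{1}{t+1}, \qquad \sum _{t=0}^{+\infty }\alpha ^{t} = \sum _{t=0}^{+\infty }\frac{1}{t+1} = \infty , \qquad \sum _{t=0}^{+\infty }\left( \alpha ^{t}\right) ^{2} = \sum _{t=0}^{+\infty }\frac{1}{(t+1)^{2}} = \frac{\pi ^{2}}{6} < \infty . \end{aligned}$$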

Now, we prove Theorem 1 as follows.

The updating rule of the Q-value is shown in (27). Subtracting \(Q_{i}^{*}({s _{i}},{a_i})\) from both sides of (27), we can derive

$$\begin{aligned} {{\varDelta _{i}^{t+1}}\left( {s_i},{a_i}\right) } = \left( 1 - {\alpha ^{t}}\right) {{\varDelta _{i}^{t}}\left( {s_i},{a_i}\right) } + {\alpha ^{t}}{\gamma }{\varPhi ^{t}}\left( {s_i},{a_i}\right) , \end{aligned}$$
(A.2)

where

$$\begin{aligned} {{\varDelta _{i}^{t+1}}\left( {s_i},{a_i}\right) }&= Q_{i}^{t}\left( {s_{i}},{a_i}\right) - {Q_{i}^{*}\left( {s_{i}},{a_i}\right) }, \end{aligned}$$
(A.3)
$$\begin{aligned} {{\varPhi _{i}^{t+1}}\left( {s_i},{a_i}\right) }&= {r_i^t} + {\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) - {Q_{i}^{*}\left( {s_{i}},{a_i}\right) }. \end{aligned}$$
(A.4)

From the above analysis, we see that the Q-learning algorithm is a special case of the iterative process in Lemma 1 with \(\beta ^{t}\) = \(\alpha ^{t}\).
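As a toy illustration of the error process \(\varDelta _i^{t}\) (not the paper's localization setting), consider a single state-action pair with a deterministic reward r, so that \(Q^{*} = r/(1-\gamma )\); repeatedly applying an update of the form assumed in (27) drives \(\varDelta ^{t} = Q^{t} - Q^{*}\) toward zero:

```python
# Toy illustration: one state, one action, deterministic reward, so Q* = r / (1 - gamma).
r, gamma, alpha = 1.0, 0.9, 0.1
q_star = r / (1.0 - gamma)   # fixed point of the Bellman equation

q = 0.0
for t in range(1, 501):
    # Q-learning update; with a single action, max_{a'} Q(s', a') is just q itself
    q = (1.0 - alpha) * q + alpha * (r + gamma * q)
    if t % 100 == 0:
        print(f"t = {t:3d}   Delta^t = Q^t - Q* = {q - q_star:+.6f}")
```

A fixed learning rate suffices in this sketch only because the reward is deterministic; the decaying schedule required by condition (2) of Lemma 1 is what guarantees convergence when rewards are stochastic.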

Next, we prove that the Q-learning algorithm satisfies conditions (3) and (4) of Lemma 1. We first introduce the notion of a contraction mapping.

Definition 4

For a set \(\mathcal {Y}\), the mapping \(\mathbf{H}: \mathcal {Y} \rightarrow \mathcal {Y}\) is a contraction mapping if there exists a constant \({\gamma }\in (0,1)\) such that, for any \({y_1}, {y_2} \in \mathcal {Y}\),

$$\begin{aligned} {\parallel \mathbf{H}{y_1}-\mathbf{H}{y_2}\parallel \le {\gamma }\parallel {y_1}-{y_2}\parallel }. \end{aligned}$$
(A.5)

Proposition 3

The mapping \(\mathbf{H}\) acting on Q-functions q, defined in (A.8) with the optimal Q-function of (A.7) as its fixed point, is a contraction mapping; that is,

$$\begin{aligned} {{\parallel \mathbf{H}{q_1}({s_i},{a_i})-\mathbf{H}{q_2}({s_i},{a_i})\parallel _\infty } \le {\gamma }{\parallel {q_1}({s_i},{a_i})-{q_2}({s_i},{a_i})\parallel }_\infty }. \end{aligned}$$
(A.6)

Proof

The optimal Q-function in (26) can be reformulated as

$$\begin{aligned} {Q_{i}^{*}}\left( {s_{i}},{a_{i}}\right) =&{\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }\nonumber \\&\times {\left[ {R_{i}}{\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }+{\gamma }{\max \limits _{a_{i}^{'}}}{Q_{i}^{*}}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }\right] }. \end{aligned}$$
(A.7)

Then, we have

$$\begin{aligned} \mathbf{H}{q}{\left( s_i,a_i\right) } =&{\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }\nonumber \\&\times {\left[ {R_{i}}{\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }+{\gamma }{\max \limits _{a_{i}^{'}}}{q}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }\right] }. \end{aligned}$$
(A.8)

In order to derive the inequality in Eq. (A.6), we make the following calculations.

$$\begin{aligned}&{\parallel \mathbf{H}{q_1\left( {s_{i},{a_{i}}}\right) } - \mathbf{H}{q_2\left( {s_{i},{a_{i}}}\right) }\parallel }_\infty \nonumber \\&\quad = {\max \limits _{s_{i},{a_i}}}{\gamma }{\bigg |}{\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }\left[ {\max \limits _{a_{i}^{'}}}{q_1}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) } - {\max \limits _{a_{i}^{'}}}{q_2}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }\right] {\bigg |}\nonumber \\&\quad \le {\max \limits _{s_{i},{a_i}}}{\gamma }{\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }{\bigg |}{\max \limits _{a_{i}^{'}}}{q_1}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) } - {\max \limits _{a_{i}^{'}}}{q_2}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }{\bigg |}\nonumber \\&\quad \le {\max \limits _{s_{i},{a_i}}}{\gamma }{\sum _{{s_{i}^{'}}}}{{T}({s_{i}},{a},{s_{i}^{'}})}{\max \limits _{a_{i}^{'}}}{\bigg |}{q_1}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) } - {q_2}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }{\bigg |}\nonumber \\&\quad = {\max \limits _{s_{i},{a_i}}}{\gamma }{\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a},{s_{i}^{'}}\right) }{\parallel }{q_1}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) } - {q_2}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }{\parallel }_\infty \nonumber \\&\quad = {\gamma }{\parallel }{q_1}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) } - {q_2}{\left( {s_{i}^{'}},{a_{i}^{'}}\right) }{\parallel }_\infty . \end{aligned}$$
(A.9)

The first equality in (A.9) follows from the definition of \(\mathbf{H}q\) in (A.8): the reward terms cancel and \(\gamma\) factors out. The second and third inequalities follow from the properties of absolute values, namely that the absolute value of a sum is bounded by the sum of absolute values and that the difference of two maxima is bounded by the maximum of the absolute difference. The fourth step bounds the inner maximum by the infinity norm, and the last equality follows because \(\sum _{s_{i}^{'}}{T}({s_{i}},{a_{i}},{s_{i}^{'}}) = 1\), so the remaining expression no longer depends on \(s_i\) or \(a_i\). \(\square\)
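The contraction property in (A.6) can also be checked numerically. The sketch below builds a small random transition kernel and reward table (purely illustrative quantities, unrelated to the localization model) and verifies that the operator of (A.8) shrinks the infinity-norm distance between two arbitrary Q-functions by at least the factor \(\gamma\):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Random transition kernel T(s, a, s') with rows summing to 1, and rewards R(s, a, s')
T = rng.random((n_states, n_actions, n_states))
T /= T.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions, n_states))

def H(q):
    # Bellman optimality operator of (A.8):
    # (Hq)(s, a) = sum_{s'} T(s, a, s') * [R(s, a, s') + gamma * max_{a'} q(s', a')]
    v_next = q.max(axis=1)                         # max_{a'} q(s', a'), shape (n_states,)
    return (T * (R + gamma * v_next[None, None, :])).sum(axis=2)

q1 = rng.random((n_states, n_actions))
q2 = rng.random((n_states, n_actions))

lhs = np.abs(H(q1) - H(q2)).max()                  # ||H q1 - H q2||_inf
rhs = gamma * np.abs(q1 - q2).max()                # gamma * ||q1 - q2||_inf
print(f"{lhs:.4f} <= {rhs:.4f} : {lhs <= rhs}")    # always True, as (A.6) asserts
```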

According to (A.4) and (A.8), we have

$$\begin{aligned} {E}\left\{ {{\varPhi _{i}^{t+1}}\left( {s_i},{a_i}\right) }\right\}&= {\sum _{{s_{i}^{'}}}}{{T}\left( {s_{i}},{a_{i}},{s_{i}^{'}}\right) }\nonumber \\&\quad \times \,\left[ {r_i^t} + {\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) - {Q_{i}^{*}\left( {s_{i}},{a_i}\right) }\right] \nonumber \\&= \mathbf{H}{Q_{i}^{t}}\left( {s_{i},{a_{i}}}\right) - {Q_{i}^{*}}\left( {s_{i},{a_{i}}}\right) \nonumber \\&= \mathbf{H}{Q_{i}^{t}}\left( {s_{i},{a_{i}}}\right) - \mathbf{H}{Q_{i}^{*}}\left( {s_{i},{a_{i}}}\right) . \end{aligned}$$
(A.10)

Note that the optimal Q-value \({Q_{i}^{*}}({s_{i},{a_{i}}})\) is the fixed point of \(\mathbf{H}\) by (A.7) and (A.8), so that \({Q_{i}^{*}}({s_{i},{a_{i}}}) = \mathbf{H}{Q_{i}^{*}}({s_{i},{a_{i}}})\). Hence, based on (A.3), Proposition 3, and (A.10), we have

$$\begin{aligned} \parallel {E}\{{{\varPhi _{i}^{t+1}}\left( {s_i},{a_i}\right) }\}\parallel _\infty&\le {\gamma }{\parallel {{Q_i^t}\left( {s_i},{a_i}\right) } - {Q_{i}^{*}\left( {s_{i}},{a_i}\right) }\parallel }_\infty \nonumber \\&={\gamma }{\parallel }{\varDelta _{i}^{t}\left( {s_{i},{a_{i}}}\right) {\parallel }_\infty }. \end{aligned}$$
(A.11)

According to (A.11), condition (3) of Lemma 1 is satisfied with \(\delta = \gamma \in (0,1)\). Finally, we consider condition (4) of Lemma 1.

$$\begin{aligned}&\mathrm{Var}\left\{ {{\varPhi _{i}^{t+1}}\left( {s_i},{a_i}\right) }\right\} \nonumber \\&\quad = {E}\left\{ {r_i^t} + {\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) - {Q_{i}^{*}\left( {s_{i}},{a_i}\right) }\right. \nonumber \\&\qquad \left. -\, \mathbf{H}{Q_{i}^{t}}\left( {s_{i},{a_{i}}}\right) + {Q_{i}^{*}}\left( {s_{i},{a_{i}}}\right) \right\} \nonumber \\&\quad = {E}\left\{ {r_i^t} + {\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) - \mathbf{H}{Q_{i}^{t}}\left( {s_{i},{a_{i}}}\right) \right\} \nonumber \\&\quad = \mathrm{Var}\left\{ {r_i^t} +{\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) \right\} . \end{aligned}$$
(A.12)

As shown in (11), the strategy region of each agent is finite, and thus the state and action spaces of each agent are finite. Moreover, the reward of each non-anchor node \(v_i\) depends only on its neighbors, and the number of nodes in \(N_i\) is finite. Therefore, the reward \(r_i^t\) of \(v_i\) is bounded at any time slot t. Hence, we have

$$\begin{aligned} \mathrm{Var}\left\{ {r_i^t} + {\gamma }\underset{{{a_i^{'}}}\in {{\mathcal {A}_i}}}{max}{Q_i^t}\left( {s_i^{'}},{a_i^{'}}\right) \right\} \le {Z}{\left( 1+\parallel {\varDelta _{i}^{t}}{\left( {s_{i},a_{i}}\right) }\parallel _\mathrm{w}^{2}\right) }, \end{aligned}$$
(A.13)

where Z is a constant.

Therefore, \({\parallel }{\varDelta _i^{t}}({s_i},{a_i}){\parallel }\) converges to zero with probability 1; that is, \(Q_{i}^{t}({s_{i},a_{i}})\) converges to \(Q_{i}^{*}({s_{i},a_{i}})\) with probability 1, which completes the proof of Theorem 1.

About this article

Cite this article

Jia, J., Yu, R., Du, Z. et al. Distributed localization for IoT with multi-agent reinforcement learning. Neural Comput & Applic 34, 7227–7240 (2022). https://doi.org/10.1007/s00521-021-06855-1

