RLProph: a dynamic programming based reinforcement learning approach for optimal routing in opportunistic IoT networks

Published in Wireless Networks (2020)

Abstract

Routing in Opportunistic Internet of Things networks (OppIoTs) is a challenging task because of intermittent connectivity between devices and the lack of a fixed path between the source and destination of messages. Recently, machine learning (ML) and reinforcement learning (RL) have been used with great success to automate processes in a number of different problem domains. In this paper, we seek to fully automate the OppIoT routing process by using the Policy Iteration algorithm to maximize the probability of message delivery. Moreover, we model the OppIoT environment as a Markov decision process (MDP) complete with states, actions, rewards, and transition probabilities. The proposed routing protocol, RLProph, optimizes the routing process via the optimal policy obtained by solving the MDP using Policy Iteration. Through extensive simulations, we show that RLProph outperforms a number of ML-based and context-aware routing protocols on a range of performance criteria.
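The abstract's core idea — cast routing as an MDP and solve it with Policy Iteration — can be illustrated with a minimal sketch. The states, actions, transition probabilities, rewards, and discount factor below are illustrative placeholders, not the paper's actual OppIoT model; the sketch only shows the generic Policy Iteration loop (exact policy evaluation followed by greedy improvement) that RLProph builds on.

```python
import numpy as np

# Toy MDP, purely illustrative: 4 abstract network states and 2 actions
# (e.g. "forward the message" vs. "hold the message"). The real RLProph
# state/action spaces come from the OppIoT context and are not shown here.
n_states, n_actions = 4, 2
gamma = 0.9  # discount factor (assumed value)

rng = np.random.default_rng(0)
# P[s, a, s'] = transition probability; each (s, a) row sums to 1.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
# R[s, a] = expected immediate reward (e.g. higher when delivery is likely).
R = rng.random((n_states, n_actions))

def policy_evaluation(policy):
    """Solve V = R_pi + gamma * P_pi V exactly as a linear system."""
    P_pi = P[np.arange(n_states), policy]   # (S, S) under the fixed policy
    R_pi = R[np.arange(n_states), policy]   # (S,)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def policy_iteration():
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = policy_evaluation(policy)
        # Greedy improvement: Q[s, a] = R[s, a] + gamma * sum_s' P[s,a,s'] V[s']
        Q = R + gamma * (P @ V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V  # converged: policy is optimal for this MDP
        policy = new_policy

pi_star, V_star = policy_iteration()
print("optimal policy:", pi_star)
print("state values:", V_star)
```

At convergence the value function satisfies the Bellman optimality condition, V(s) = max_a Q(s, a), which is what guarantees the returned policy maximizes the expected discounted reward for the modeled MDP.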

Acknowledgements

This work is partially supported by FCT/MCTES through national funds and when applicable co-funded EU funds under the Project UIDB/EEA/50008/2020; and by Brazilian National Council for Research and Development (CNPq) via Grant No. 309335/2017-5.

Author information

Correspondence to Deepak Kumar Sharma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, D.K., Rodrigues, J.J.P.C., Vashishth, V. et al. RLProph: a dynamic programming based reinforcement learning approach for optimal routing in opportunistic IoT networks. Wireless Netw 26, 4319–4338 (2020). https://doi.org/10.1007/s11276-020-02331-1
