Abstract
The dynamic nature of distributed wireless networks, caused by node mobility, changing network topology, and other factors, has been a major challenge to routing in such networks. In traditional routing schemes, the routing decisions of a wireless node may depend solely on a predefined set of routing policies, which may be suitable only for certain network circumstances. Reinforcement Learning (RL) has been shown to address this routing challenge by enabling wireless nodes to observe and gather information from their dynamic local operating environment, learn, and make efficient routing decisions on the fly. In this article, we focus on the application of traditional and enhanced RL models to routing in wireless networks. The routing challenges associated with different types of distributed wireless networks, and the advantages brought about by the application of RL to routing, are identified. In general, three types of RL models have been applied to routing schemes in order to improve network performance, namely Q-routing, multi-agent reinforcement learning, and partially observable Markov decision process. We provide an extensive review of new features in RL-based routing, and of how various routing challenges and problems have been approached using RL. We also present a real hardware implementation of an RL-based routing scheme. Subsequently, we present the performance enhancements achieved by RL-based routing schemes. Finally, we discuss various open issues related to RL-based routing schemes in distributed wireless networks, which may help to explore new research directions in this area. Discussions in this article are presented in a tutorial manner in order to establish a foundation for further research in this field.
Cite this article
Al-Rawi, H.A.A., Ng, M.A. & Yau, KL.A. Application of reinforcement learning to routing in distributed wireless networks: a review. Artif Intell Rev 43, 381–416 (2015). https://doi.org/10.1007/s10462-012-9383-6
DOI: https://doi.org/10.1007/s10462-012-9383-6