
Internet of Things

Volume 14, June 2021, 100384

Reinforcement learning-based fuzzy geocast routing protocol for opportunistic networks

https://doi.org/10.1016/j.iot.2021.100384

Abstract

In a communication environment such as opportunistic networks (OppNets), where no stable end-to-end path exists, message routing is a challenge. In this context, reinforcement learning approaches that rely on the agents in the environment to carry out the routing task have proven effective. This paper proposes a novel routing scheme for OppNets called the Reinforcement Learning-based Fuzzy Geocast Routing Protocol (RLFGRP), in which a fuzzy controller uses a node's Q-value, reward value, and remaining buffer space as input parameters to determine the likelihood of that node being selected as a suitable forwarder of a message from the source towards its destination. Through simulations using real mobility traces, the proposed RLFGRP protocol is shown to outperform the established Geocast Fuzzy-Based Check-and-Spray routing (FCSG) [4] and Fuzzy logic-based Q-learning routing (FQLRP) [9] protocols in terms of overhead ratio, delivery ratio, and average latency.

Introduction

Due to the frequent intermittent connectivity and dynamic topology of OppNets, achieving efficient message routing in such networks is challenging [1]. This requires designing a utility function that progressively determines a set of relay nodes elected as suitable forwarders of the message from source to destination, based on the considered nodes' local parameters such as speed, velocity, remaining energy, buffer space, and list of neighbours, to name a few. The selected relay nodes follow the so-called store-carry-forward mechanism for message forwarding [1].

In an OppNet, nodes meet opportunistically, and the computational time that a routing protocol takes to decide on the best relay nodes to carry a message toward its destination may be higher than expected, owing to possibly inefficient feedback mechanisms from these nodes. For this reason, it has been argued that reinforcement learning can be adopted [2].

In general, reinforcement learning [2] enables a software agent to learn by trial and error in an environment, using the feedback from its actions and experiences. In this paper, an instance of reinforcement learning referred to as Q-learning [3] is used jointly with a geocast technique [4] and fuzzy logic [5] to select suitable message forwarders from source to destination.

The proposed approach consists of two phases. In the first phase, a reinforcement learning technique (precisely, a Q-learning mechanism) is applied for message forwarding towards the destination cast. This mechanism is a continuous process that assigns a reward (or a punishment, in the form of a reduced reward) and a Q-value to each involved node, depending on the action taken by that node, the goal being to strengthen the forwarding strategy [6].

More precisely, information related to a node, namely its remaining energy, the Euclidean distance from the node to the source of the message, its delivery probability, and its direction with respect to the destination node, is used as parameters in a reward function that calculates the reward of each node. The higher the reward of a node, the more likely that node is to be selected as the best next hop of the message; the rewards and stochastic state transitions are taken into account in this prediction by continuously interacting with the environment, with no prior information assumed. Next, the derived Q-value, reward value, and remaining buffer space of the node are used as input parameters by a fuzzy controller to determine the likelihood of that node being selected as the next hop of the message.
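
To make the reward computation concrete, the sketch below combines the four parameters named above into a single reward score. The weighting scheme, the normalisation, and the function name are illustrative assumptions; the paper's actual reward function is not reproduced in this excerpt.

```python
def node_reward(remaining_energy, dist_to_source, delivery_prob,
                toward_destination, w_energy=0.3, w_dist=0.2,
                w_prob=0.3, w_dir=0.2):
    """Illustrative reward for a candidate relay node.

    Inputs: remaining energy (normalised to [0, 1]), Euclidean distance
    from the node to the message source, delivery probability in [0, 1],
    and whether the node is heading towards the destination. The weights
    are placeholders, not the paper's values.
    """
    # Squash the distance into [0, 1) so that no single term dominates;
    # rewarding distance from the source (i.e. progress away from it)
    # is itself an assumption about the paper's intent.
    dist_term = dist_to_source / (1.0 + dist_to_source)
    dir_term = 1.0 if toward_destination else 0.0
    return (w_energy * remaining_energy + w_dist * dist_term
            + w_prob * delivery_prob + w_dir * dir_term)
```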

In this iterative process, the Q-value of a node is updated on the basis of the reward value, the action of the node, and the Q-learning algorithm's discount-rate and learning-rate parameters. In addition, message replicas are forwarded only to the selected relay nodes, which helps reduce the routing overhead. In the second phase, the message is flooded within the geocast region using the Check-and-Spray technique inherited from [4], which involves two steps: (1) checking the node's location, checking the node's energy level against a prescribed energy threshold, and checking whether the message is already present in the node's cast; then (2) flooding the message only to those nodes for which Step (1) has been successful.
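
As a reference point, a tabular Q-learning update consistent with the description above is sketched below, together with the check step of the Check-and-Spray phase. The data structures, attribute names, and parameter values are assumptions for illustration; only the update rule itself is the standard Q-learning formula.

```python
from dataclasses import dataclass, field

def update_q_value(q_table, state, action, reward, next_state,
                   learning_rate=0.1, discount_rate=0.9):
    """Standard tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    The learning-rate and discount-rate values are placeholders."""
    old_q = q_table.get((state, action), 0.0)
    next_qs = [q for (s, _), q in q_table.items() if s == next_state]
    best_next = max(next_qs) if next_qs else 0.0
    q_table[(state, action)] = old_q + learning_rate * (
        reward + discount_rate * best_next - old_q)
    return q_table[(state, action)]

@dataclass
class Node:
    """Minimal stand-in for a network node; field names are illustrative."""
    location: tuple
    remaining_energy: float
    buffer: set = field(default_factory=set)

def passes_check_step(node, geocast_region, energy_threshold, message_id):
    """Step (1) of the Check-and-Spray phase [4]: a node receives a replica
    only if it lies in the geocast region (here a set of locations), has
    enough energy, and does not already hold the message."""
    return (node.location in geocast_region
            and node.remaining_energy >= energy_threshold
            and message_id not in node.buffer)
```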

The remainder of the paper is organized as follows. In Section 2, related work is discussed. In Section 3, the proposed RLFGRP protocol is presented. In Section 4, its performance evaluation using simulations is described. Section 5 concludes the paper.


Related work

There are a number of works in the literature on routing schemes for delay tolerant networks, which have used the concepts of reinforcement learning, fuzzy logic or geocast technique in their designs. Representative ones are as follows.

Rolla and Curado [2] introduced a routing technique for delay tolerant networks (DTNs), in which multi-agent reinforcement learning methods are used to assist in the learning of the paths in the network, in such a way as to construct the replicas of the messages

Proposed RLFGRP protocol

The proposed Reinforcement Learning-based Fuzzy Geocast Routing Protocol (RLFGRP) is designed around a feedback mechanism whereby the system learns and adjusts itself for efficient message routing using a fuzzy Q-learning-based algorithm. This algorithm operates in such a way that, whenever a source node wishes to send a message to a destination node, the neighbour nodes of the source node are first discovered and Q-values are assigned to these nodes by a fuzzy learning
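
A minimal sketch of such a fuzzy controller is given below, assuming the three inputs named in the abstract (Q-value, reward value, remaining buffer space) are normalised to [0, 1]. The membership functions, rule base, and output levels are illustrative assumptions rather than the paper's actual fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and feet at a and c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    """Map a normalised input in [0, 1] to LOW / MEDIUM / HIGH memberships."""
    return {"low": tri(x, -0.5, 0.0, 0.5),
            "medium": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 1.5)}

def forwarding_likelihood(q_value, reward, free_buffer):
    """Mamdani-style combination of the node's Q-value, reward value, and
    remaining buffer space into a forwarding likelihood in [0, 1].
    The five rules and their crisp output levels are placeholders."""
    q, r, b = fuzzify(q_value), fuzzify(reward), fuzzify(free_buffer)
    # Each rule fires with the minimum of its antecedent memberships and
    # votes for a crisp likelihood level (weighted-average defuzzification).
    rules = [
        (min(q["high"], r["high"], b["high"]), 0.9),
        (min(q["high"], r["medium"], b["medium"]), 0.7),
        (min(q["medium"], r["medium"], b["medium"]), 0.5),
        (min(q["low"], r["medium"], b["low"]), 0.3),
        (min(q["low"], r["low"], b["low"]), 0.1),
    ]
    total = sum(strength for strength, _ in rules)
    return 0.0 if total == 0.0 else sum(s * v for s, v in rules) / total
```

Under this sketch, a neighbour with a high Q-value, high reward, and ample free buffer yields a likelihood close to 0.9 and would be preferred as the next hop.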

Performance evaluation of the proposed RLFGRP scheme

The performance evaluation of the proposed RLFGRP scheme is conducted using the ONE simulator [15] under a real-trace mobility model [16], and is compared against that of the FCSG [4] and FQLRP [9] protocols in terms of delivery ratio, overhead ratio, and average latency, under varying buffer size and Time-to-Live (TTL). In our work, the overhead ratio is defined as the ratio of the number of relayed messages minus the number of successfully delivered messages to the number of successfully delivered
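
For clarity, the overhead ratio defined above can be written as follows; the delivery ratio is added in its commonly used form as an assumption, since the excerpt does not define it.

```latex
% N_relayed: relayed messages, N_delivered: successfully delivered messages,
% N_created: created messages (delivery ratio stated in its usual form).
\text{Overhead ratio} = \frac{N_{\text{relayed}} - N_{\text{delivered}}}{N_{\text{delivered}}},
\qquad
\text{Delivery ratio} = \frac{N_{\text{delivered}}}{N_{\text{created}}}
```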

Conclusion

This paper has proposed the Reinforcement Learning-based Fuzzy Geocast Routing Protocol (RLFGRP) for OppNets, which uses fuzzy logic and Q-learning to calculate the likelihood of a node being selected as the best forwarder of a message from source towards its destination. Simulation results using real mobility traces have shown that the proposed RLFGRP scheme is superior to the FCSG and FQLRP protocols in terms of overhead ratio, delivery ratio, and average latency. As future work, we plan to design a security

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is partially sponsored by a grant held by the second author from the Natural Sciences and Engineering Research Council of Canada (NSERC) [Grant number: RGPIN-2017-04423].

References (26)

  • S.K. Dhurandher et al., Reinforcement learning-based routing protocol for opportunistic networks, in: IEEE International Conference on Communications (ICC), 2020.
  • F. Yuan et al., A double Q-learning routing in delay tolerant networks, in: IEEE International Conference on Communications (ICC), 2019.
  • C. Wu et al., Flexible, portable, and practicable solution for routing in VANETs: a fuzzy constraint Q-learning approach, IEEE Trans. Veh. Technol., 2013.