
Ad Hoc Networks

Volume 110, 1 January 2021, 102278

A deep reinforcement learning-based on-demand charging algorithm for wireless rechargeable sensor networks

https://doi.org/10.1016/j.adhoc.2020.102278

Abstract

Wireless rechargeable sensor networks are widely used in many fields. However, the limited battery capacity of sensor nodes hinders their development. With the help of wireless energy transfer technology, employing a mobile charger to charge sensor nodes wirelessly has become a promising approach to prolonging the lifetime of wireless sensor networks. Considering that the energy consumption rate varies significantly among sensors, a better model of each sensor's charging demand is needed, so that sensors can be charged multiple times in one charging tour. Therefore, a time window is used to represent the charging demand. To allow the mobile charger to respond to these charging demands in time and transfer more energy to the sensors, we introduce a new metric, the charging reward, which measures the quality of sensor charging. We then study the problem of scheduling the mobile charger to replenish the energy supply of sensors so that the sum of charging rewards collected by the mobile charger on its charging tour is maximized, subject to the energy capacity constraint on the mobile charger and the charging time windows of all sensor nodes. We first prove that this problem is NP-hard. Due to its complexity, a deep reinforcement learning technique is then exploited to obtain the moving path of the mobile charger. Finally, experimental simulations are conducted to evaluate the performance of the proposed charging algorithm, and the results show that the proposed scheme is very promising.

Introduction

Wireless Sensor Networks (WSNs) have shown great potential for collecting information in modern society [1]. They have been widely used in many fields, such as forest fire monitoring, building surveillance, and so on [2], [3], [4]. In general, a wireless sensor network consists of a large number of sensors. Since these sensors are powered by energy-constrained batteries, the operational time of a network is usually limited, which hinders the development of sensor networks [5], [6], [7].

Considering that the battery capacity of each sensor is limited, it is crucial to replenish the energy supply of sensors before their battery runs out. Benefiting from the development of wireless charging technology [8], employing a moving vehicle equipped with wireless charging devices, called mobile charger (MC), to charge the sensors emerges as a very promising solution to address the issue [9], [10], [11], [12], [13], [14], [15], [16].

In Wireless Rechargeable Sensor Networks (WRSNs), most existing studies [17], [18], [19], [20] construct a charging path for the MC and then dispatch it to charge the sensor nodes by moving along the obtained path. To reduce the MC's moving distance, a shortest Hamiltonian cycle is usually established as the charging path, and the MC charges each sensor node in turn along this cycle [13], [21], [22], [23], [24], [25]. During a charging cycle, the MC follows the path rigidly. If new charging demands arise, existing schemes [15], [26] usually store them at the base station (BS) and schedule the MC to replenish the energy supply of these to-be-charged sensors in the next charging cycle. However, this has one significant limitation: if a sensor with a relatively high energy consumption rate issues multiple charging demands in one cycle, it cannot be charged multiple times under these existing algorithms.

In a practical environment, the energy consumption rate varies significantly among sensor nodes [27], [28]. For example, sensors responsible for more data transmission tend to have higher energy consumption rates than others. Under the above limitation, sensor nodes in the to-be-charged set can only be charged in the next charging cycle, which may lead to some sensors running out of battery.

In this paper, aiming at addressing this problem, we use a time window to represent the charging demand of a sensor and propose a new on-demand charging scheme. When a sensor's energy level falls below a threshold, its charging time window opens, that is, a charging demand occurs; at this point, the MC may charge the sensor. The reason for not charging the sensor before its time window opens is that the sensor still holds a relatively large amount of residual energy at that time: the MC could only transfer a small amount of energy to it while consuming a large amount of energy on movement, which is an inefficient use of the MC's energy. When the energy of a sensor is exhausted, its charging time window closes. Obviously, to avoid sensor death, the MC should be dispatched to charge the sensor before its time window closes. Assuming that the battery capacity and energy consumption rate of each sensor are known, the MC can compute when a sensor's next time window will open each time the sensor is fully charged. Since the MC's energy is divided into two parts, the energy delivered to the sensors and the energy consumed by the MC's movement, the MC can transfer more energy to the sensors during each charging cycle by minimizing the energy consumed by movement (i.e., reducing its moving distance). Therefore, given a WRSN, the objective is to find a charging scheme that minimizes the number of dead sensors while keeping the MC's moving distance as small as possible.
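The time-window mechanism described above can be sketched in a few lines. This is an illustrative sketch only: the function name and parameters (`capacity`, `threshold`, `rate`) are our own and not the paper's notation, and it assumes a constant, known energy consumption rate as stated in the text.

```python
# Hypothetical sketch of the time-window model: a sensor fully charged
# at time t_full opens its charging window when residual energy drops
# to `threshold`, and closes it when the battery would be exhausted.
def charging_window(capacity, threshold, rate, t_full=0.0):
    """Return (open, close) times of the sensor's next charging window.

    capacity  -- battery capacity (J)
    threshold -- residual energy at which the window opens (J)
    rate      -- constant energy consumption rate (J/s)
    t_full    -- time at which the sensor was last fully charged (s)
    """
    t_open = t_full + (capacity - threshold) / rate
    t_close = t_full + capacity / rate
    return t_open, t_close

# Example: 10 J battery, window opens at 3 J residual, drain 0.5 J/s
print(charging_window(10.0, 3.0, 0.5))  # (14.0, 20.0)
```

Because the window can be recomputed every time the sensor is fully recharged, a high-drain sensor naturally produces several windows per charging cycle, which is exactly why it may need multiple visits.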

It should be pointed out that our objective differs from the traditional traveling salesman problem with time windows (TSPTW) [29]. The traditional TSPTW seeks a minimum-cost path that visits each of a set of points exactly once [29]. In our problem, however, each sensor node may be visited multiple times in one charging cycle, because the time window of a sensor with a high energy consumption rate may open multiple times during the cycle; to avoid its energy depletion, the MC should charge it multiple times. Therefore, in such a WRSN, an effective method is for each sensor to continuously update its time window information within a charging cycle, so that the MC can charge sensors with high energy consumption rates multiple times based on this information. In this work, we study a fundamental on-demand charging problem: how to schedule an energy-constrained MC to charge sensors in real time based on the time window information of each sensor, with the goal of minimizing the number of dead sensor nodes and the moving distance of the MC.

The most intuitive solution is to first construct a TSPTW path along which the MC charges the nodes. When a node is fully charged, its time window will open again after a while, and the node can then be inserted into the corresponding location in the existing path according to its time window, as shown in Fig. 1. Fig. 1 depicts the whole process of inserting a node into an existing path; the time window of each node is given next to it in the figure. Fig. 1(a) shows that the charging sequence is a → b → c → d. x′ is the node that the MC has just visited before visiting node a, and its next time window is 16:00-17:00. According to this method, x′ is inserted into the existing path between nodes c and d. Fig. 1(b) presents the obtained charging path, a → b → c → x′ → d. In fact, the time windows of nodes b and c overlap substantially. If we instead let the MC visit node c first and then node b, constructing the charging sequence a → c → b → x′ → d as shown in Fig. 1(c), this path is shorter than the one in Fig. 1(b).

We can conclude that directly inserting a node into an existing path is not optimal. To obtain a better path, in addition to inserting x′ directly, the charging order before the MC visits x′ also needs to be changed, which implies that the existing path cannot simply be reused. It follows that after a node is charged and its time window information is updated, the charging path with the shortest moving distance cannot be obtained by directly inserting this node into the existing path.
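The suboptimality of direct insertion is easy to demonstrate numerically. In the sketch below, the node coordinates are invented for illustration (they are not the paper's Fig. 1 data), and time-window feasibility is ignored so the comparison focuses purely on path length: keeping the old order and slotting in the reopened node `x` is compared against brute-forcing all orders of the remaining visits.

```python
# Illustrative sketch: direct insertion of a reopened node vs.
# re-ordering the remaining visits. Coordinates are invented;
# time-window constraints are omitted for simplicity.
from itertools import permutations
import math

def path_len(order, pos):
    """Total Euclidean length of visiting `order` in sequence."""
    return sum(math.dist(pos[a], pos[b]) for a, b in zip(order, order[1:]))

pos = {'a': (0, 0), 'b': (1, 2), 'c': (2, 0), 'x': (3, 2), 'd': (4, 0)}

# Direct insertion: keep the old order a -> b -> c and slot x before d.
inserted = ['a', 'b', 'c', 'x', 'd']

# Re-optimization: brute-force every order of the intermediate nodes.
best = min(path_len(['a'] + list(p) + ['d'], pos)
           for p in permutations(['b', 'c', 'x']))

print(path_len(inserted, pos) > best)  # True: insertion is longer
```

Even in this tiny instance, re-ordering beats insertion, which motivates recomputing the visiting order rather than patching the existing path.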

In order to design a charging scheme to minimize the number of dead sensors by meeting the charging demands of each sensor node in time, and at the same time to minimize the moving distance of MC, we are facing the following challenges:

  • To obtain an effective on-demand charging scheme in which sensors can be charged multiple times during one charging cycle, after a sensor is charged by the MC, we cannot simply ignore it in the current cycle; instead, its energy and time window information must be updated. The new charging scheme therefore always selects the next sensor to be charged from the full set of N sensors, where N is the total number of sensors in the network.

  • To obtain the optimal charging path, the charging scheme should take the state of each sensor in the network (residual energy, location) and the state of the MC (residual energy) into consideration when selecting the next sensor to be charged. Since each charging action affects the state of every sensor as well as the MC itself, the MC's current charging behavior will influence which charging behaviors it chooses in the future.

  • To optimize charging performance, the charging scheme needs to consider not only the short-term charging effect, such as which sensor the MC should move to next, but also the long-term charging effect, such as which sensor may need to be charged in the future. As shown in Fig. 1(b), when the MC is at node a, since the distance to node b is less than that to node c, the path a → b → c → x′ → d is obtained by considering only the short-term effect; a scheme that accounts for the long-term effect would select node c instead, yielding the path a → c → b → x′ → d. Choosing node c is not the optimal choice in the short term, but it may be more effective in the long run.

To tackle these challenges, we propose to exploit Reinforcement Learning (RL) techniques [30]. RL is a branch of machine learning; compared with classic supervised learning [31] and unsupervised learning [32], it is characterized by learning from interaction. In interacting with the environment, the agent (e.g., the MC in our network) continuously learns from the rewards it obtains, making it increasingly adapted to the environment. RL has shown great potential in sequential decision making; for example, Google DeepMind has applied it to game playing with excellent results [33], [34]. Inspired by this, we apply RL techniques to obtain a near-optimal charging path. The contributions of this paper are summarized as follows:
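The learning-from-interaction idea can be illustrated with a minimal tabular Q-learning sketch. The toy environment below (three sensors, fixed travel costs and charging rewards) is entirely invented; the paper's actual method is a deep RL model over a much richer state (residual energies, locations, MC energy), but the update rule and the reward structure (charging reward minus movement cost) are the same in spirit.

```python
# Minimal tabular Q-learning sketch: the agent (MC) learns which
# sensor to charge next. Environment values are invented toy data.
import random

random.seed(0)

n = 3                                        # toy network: 3 sensors
travel = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # symmetric travel costs
charge_reward = [1.0, 3.0, 2.0]              # reward for charging each sensor

Q = [[0.0] * n for _ in range(n)]            # Q[current][next]
alpha, gamma, eps = 0.1, 0.9, 0.1            # learning rate, discount, exploration

for episode in range(2000):
    s = 0                                    # MC starts at sensor 0
    for _ in range(10):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(n)
        else:
            a = max(range(n), key=lambda j: Q[s][j])
        # immediate reward: charging reward minus movement cost
        r = charge_reward[a] - 0.5 * travel[s][a]
        # standard Q-learning update
        Q[s][a] += alpha * (r + gamma * max(Q[a]) - Q[s][a])
        s = a

# Greedy policy from sensor 0 should prefer sensor 1
# (highest charging reward at moderate travel cost).
print(max(range(n), key=lambda j: Q[0][j]))
```

The real problem has a state space far too large for a table, which is why the paper turns to a deep network to approximate the value function; this sketch only shows the interaction loop that the deep variant scales up.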

  • (1) To the best of our knowledge, this is the first work to model the real-time charging demands of sensor nodes as time windows. Based on this, the MC is dispatched to charge the sensors with the goal of minimizing the number of dead sensor nodes and the moving distance of the MC (i.e., the charging performance), subject to the energy constraints of both the MC and the sensors.

  • (2) This on-demand charging problem is formulated as a reward maximization problem (RMP). We then prove that RMP is NP-hard, so an exact solution is not feasible in practice.

  • (3) Due to the complexity of RMP, a novel deep reinforcement learning approach is proposed to obtain the MC's charging path, such that the collected charging reward is maximized.

  • (4) Simulation results are presented to show the effectiveness of the proposed charging scheme. They illustrate that the charging performance can be significantly improved while avoiding dead sensor nodes.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the system model and problem definition. Section 4 proves the NP-hardness of the concerned problem. Section 5 presents the framework of the charging scheme and describes the learning algorithm. Section 6 shows the simulation results of the proposed charging scheme, and Section 7 concludes the paper.


Related work

With the development of wireless power transfer technology, many studies have been conducted to prolong the lifetime of WRSNs. Generally speaking, there are two types of approaches for scheduling charging tasks: periodic charging methods and on-demand charging methods.

In periodic charging solutions, the MC travels along a pre-planned path to charge the sensors in the WRSN [18]. Xie et al. [35] conducted a theoretical study on energy usage in WRSNs. They solve the path planning problem by

Preliminaries

In this section, we present the system model and the formulation of our charging problem in WRSNs. We list the major notations used in the rest of the paper in Table 1.

NP-Hardness of RMP

Consider a special case of RMP, named RMP-T, which is obtained by fixing some characteristics of RMP. We show that the decision version of RMP-T is NP-hard, and so is RMP.

Given a WRSN Gc = (Vc, Ec), where Vc contains a BS and a set of nodes to be charged, and Ec contains a set of links between each pair of nodes in Vc. In RMP-T, assume that the BS has a time window [0, ∞), which means the MC can return to the BS at any time, and the charging reward of each node is non-negative since a large negative penalty

Solution for RMP

In this section, we first briefly introduce reinforcement learning [30] techniques. Then, we model the concerned problem and provide the learning algorithm.

Performance evaluation

In this section, we carry out extensive simulations to evaluate the advantages of the RL-based charging algorithm (RMP-RL). Moreover, an analysis of its characteristic properties is provided to give a better understanding of the proposed charging method.

Conclusions and future work

In this paper, we studied the use of a mobile charger to replenish energy supply of sensor nodes in a wireless sensor network so that the sum of reward collected by the mobile charger is maximized, subject to the energy capacity constraint on the mobile charger and charging time windows of all sensor nodes. We showed that this problem is NP-hard, and exploited reinforcement learning techniques to obtain the charging scheme for the mobile charger. Finally, extensive experimental simulations are

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is partially supported by State Key Laboratory of Networking and Switching Technology (SKLNST-2019-2-06), Education Department of Sichuan Province (18ZA0404), Visual Computing and Virtual Reality Key Laboratory of Sichuan Province (SCVCVR2018.02VS), Key Technologies Research and Development Program (2017YFB0202403).

Xianbo Cao received the BSc degree in Automation Engineering from North China Electric Power University, P. R. China, in 2015. He now is a third year master student in computer science at Sichuan University. His research interests include wireless sensor networks and mobile computing.

References (47)

  • B. Liu et al., "Novel methods for energy charging and data collection in wireless rechargeable sensor networks," Int. J. Commun. Syst. (2017)
  • A. Kurs et al., "Wireless power transfer via strongly coupled magnetic resonances," Science (2007)
  • C. Lin et al., "P2S: a primary and passer-by scheduling algorithm for on-demand charging architecture in wireless rechargeable sensor networks," IEEE Trans. Veh. Technol. (2017)
  • X. Ye et al., "Charging utility maximization in wireless rechargeable sensor networks," Wirel. Netw. (2017)
  • C. Yang et al., "A priority-based energy replenishment scheme for wireless rechargeable sensor networks," 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA) (2017)
  • S. Zhang et al., "Collaborative mobile charging," IEEE Trans. Comput. (2014)
  • G. Jiang et al., "Joint charging tour planning and depot positioning for wireless sensor networks using mobile chargers," IEEE/ACM Trans. Netw. (2017)
  • W. Liang et al., "Approximation algorithms for charging reward maximization in rechargeable sensor networks via a mobile charger," IEEE/ACM Trans. Netw. (2017)
  • T. Liu et al., "Learning an effective charging scheme for mobile devices," 34th International Parallel and Distributed Processing Symposium (IPDPS) (2020)
  • C. Lin et al., "TSCA: a temporal-spatial real-time charging scheduling algorithm for on-demand architecture in wireless rechargeable sensor networks," IEEE Trans. Mob. Comput. (2018)
  • F. Sangare et al., "Mobile charging in wireless-powered sensor networks: optimal scheduling and experimental implementation," IEEE Trans. Veh. Technol. (2017)
  • W. Xu et al., "Approximation algorithms for the team orienteering problem," IEEE INFOCOM 2020 - IEEE Conference on Computer Communications (2020)
  • S. Zhang et al., "Optimizing itinerary selection and charging association for mobile chargers," IEEE Trans. Mob. Comput. (2016)


    Wenzheng Xu received the B.Sc., M.E., and Ph.D. degrees from Sun Yat-sen University, Guangzhou, China, in 2008, 2010, and 2015, respectively, all in computer science. He was a Visitor with The Australian National University. He currently is a Special Associate Professor with Sichuan University. His research interests include wireless ad hoc and sensor networks, mobile computing, approximation algorithms, combinatorial optimization, online social networks, and graph theory.

    Xuxun Liu received the Ph.D. degree in communication and information systems from Wuhan University, China. He is currently a professor with the School of Electronic and Information Engineering, South China University of Technology, China. His current research interests include Internet of Things, smart cities, mobile computing, and social networks. He has authored or coauthored over fifty scientific papers in international journals and conference proceedings, including five ESI highly cited papers. He serves as an Associate Editor of Wireless Networks (Springer) and IEEE Access, and as Workshop Chair, Publication Chair or TPC Member of a number of conferences.

    Jian Peng received the BS degree in electrical engineering from the University of Electronic and Science of China, China, in 1992, and the MS degree in computer science from the Chengdu University of Technology, China, in 1999, and the PhD degree in computer science from the University of Electronic and Science of China, China, in 2004. Since 1999, he has been with the College of Computer Science, Sichuan University, where he is currently a professor. His research interests include wireless sensor networks and big data.

    Tang Liu received the BS degree in computer science from University of Electronic and Science of China, China, in 2003, and the MS and PhD degree in computer science from Sichuan University, in 2009 and 2015, respectively. Since 2003, he has been with College of Computer Science, Sichuan Normal University, where he is currently a professor. His research interests include wireless charging and wireless sensor networks.

1. The first two authors contributed equally to this work.
