Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system
Introduction
The distributed routing problem first emerged in computer science in the context of packet routing in computer networks. Over the years, many network routing algorithms, such as Open Shortest Path First (OSPF) [1], have been developed and standardized, and they are now widely used in computer networks.
However, the distributed routing problem also arises in contexts other than computer networks. Examples include baggage handling systems (BHS), primarily used in airports (Fig. 1), and material handling systems used in manufacturing enterprises and logistics centers. In [2] it was shown that baggage handling control can be approached in a distributed way, using a simple distance-vector routing protocol originally developed for packet routing in computer networks [3].
Though it is possible to reuse network routing algorithms in other contexts, this is not always optimal. There may be constraints that differ significantly from those present in the network routing problem. For example, in the case of BHS, we may aim to minimize the energy consumption of the conveyors together with the baggage delivery time. Algorithms designed for network routing are not suited to problems with such constraints.
In this paper, we propose a routing algorithm based on a reinforcement learning approach. The idea of viewing packet routing as a reinforcement learning problem was first implemented in [4] in the Q-routing algorithm. The algorithm we propose in this paper is largely based on Q-routing, but has the following differences:
- The value function is approximated by a neural network (NN) instead of a lookup table.
- Learning agents use additional information, such as information about the current graph topology, to make more precise estimates of action values. This information is passed between agents via complementary protocols, such as a link-state protocol.
- Supervised pretraining on examples of baseline agent behavior is used to avoid divergence.
Using neural networks for value function approximation allows learning agents to take into account arbitrary information that may be related to routing efficiency, which lets the algorithm be applied efficiently for routing in heterogeneous environments (such as baggage handling systems).
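To make the relationship to Q-routing concrete, the classic tabular Q-routing update of [4], which the proposed method replaces with a neural-network approximator, can be sketched as follows. The symbol names (`q`, `eta`, `transit_time`) are ours, not the paper's notation:

```python
# Illustrative sketch of the tabular Q-routing update (Boyan & Littman, 1994).
# q[node][dest][neighbor] holds the estimated remaining delivery time when
# `node` forwards a packet bound for `dest` via `neighbor`.

def q_routing_update(q, x, d, y, transit_time, queue_time, eta=0.5):
    """Update node x's estimate of delivery time to d via neighbor y.

    After forwarding the packet to y, node x learns y's best remaining
    estimate and moves its own estimate toward the observed target.
    """
    best_from_y = min(q[y][d].values())           # y's best estimate to d
    target = transit_time + queue_time + best_from_y
    q[x][d][y] += eta * (target - q[x][d][y])     # standard TD step
    return q[x][d][y]

# Tiny 3-node line network a -- b -- c, destination c (values illustrative).
q = {
    "a": {"c": {"b": 10.0}},
    "b": {"c": {"c": 1.0}},
    "c": {"c": {"c": 0.0}},
}
q_routing_update(q, "a", "c", "b", transit_time=1.0, queue_time=0.5)
```

In DQN-routing the lookup table `q` is replaced by a neural network, so the same temporal-difference target can be conditioned on extra inputs such as topology information.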
The rest of this paper is structured as follows. In Section 2, we specify the packet routing problem in a generalized way, review existing routing algorithms and reinforcement learning techniques, and formulate the routing problem in terms of reinforcement learning. In Section 3, we describe the algorithm, as well as several considered NN architectures. In Section 4, we provide results of an experimental comparison of the proposed algorithm with a link-state based shortest path algorithm (essentially a simplified version of OSPF, one of the most widely used network routing protocols today) and Q-routing (the basis of the proposed algorithm). The comparison is conducted in two environments: a simulation model of a computer network and a simulation model of a BHS. Section 5 is devoted to discussing the results and drawing conclusions.
Section snippets
Formulation of generalized routing problem
In this section, we formally describe the packet routing problem in a generalized way, so that the resulting definition can be used to describe the problem in a variety of settings.
We model the network as a directed graph G = (V, E), where each vertex v ∈ V corresponds to a network node (e.g. a router or a switch), and every edge e ∈ E corresponds to a link between nodes. Packets are sent between the nodes in the network: a packet may be sent from one of the source nodes and be directed towards one of the
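A minimal concrete form of this graph model is a plain adjacency map with a per-edge traversal cost; the node names and costs below are illustrative, not from the paper:

```python
# Directed graph G = (V, E) as an adjacency map: node -> {neighbor: cost}.
# Names and weights are illustrative placeholders.

network = {
    "r1": {"r2": 1.0, "r3": 4.0},   # router r1 has links to r2 and r3
    "r2": {"r3": 1.0},
    "r3": {},                        # sink node (e.g. a destination chute)
}

def neighbors(graph, node):
    """Nodes reachable from `node` over a single directed edge."""
    return list(graph[node])
```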
Basic algorithm
The method we propose, which we call DQN-routing, is largely based on Q-routing, described in Section 2.3. The method is as follows:
- Every router processes one packet at a time. The packet being processed is referred to as the current packet. A packet p = (d, s) consists of its destination d and its state s (s may be empty).
- Every router v has a current state (d, s, N(v)), where d is the destination node of the current packet, s is the state of the current packet, N(v) are the neighbors of v, and
Experiments and results
We performed experiments in two simulation models: a model of a computer network and a model of a baggage handling system. In both environments, we compared DQN-routing with a shortest path (Dijkstra) algorithm using a link-state protocol and with the Q-routing algorithm. The link-state shortest path algorithm was chosen because link-state protocols dominate computer network routing today, and Q-routing was chosen because DQN-routing is based on it.
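The link-state shortest-path baseline amounts to each router running standard Dijkstra over the full topology it learns via the link-state protocol. A minimal sketch (graph and names illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` over non-negative edge costs.

    graph: dict node -> {neighbor: cost}. This mirrors how a link-state
    protocol lets every router compute paths on the complete topology.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

net = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 1.0}, "c": {}}
```

Here the direct a→c link costs 4.0, but Dijkstra finds the cheaper two-hop route a→b→c with total cost 2.0.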
Conclusion
We presented a novel distributed routing approach that is based on machine learning and can be applied both in communication networks and in physical systems, such as baggage handling systems. The unique strength of the proposed method is its ability to simultaneously optimize the travel time of the routed entities and energy consumption. Comparison with contemporary routing algorithms confirms a substantial gain for the proposed method.
However, the proposed method has certain limitations. The
Acknowledgments
The authors would like to thank Arip Asadulaev and Ivan Smetannikov for useful comments. The results were obtained under a research project supported by the Ministry of Education and Science of the Russian Federation, Project No. 2.8866.2017/8.9.
Dmitry Mukhutdinov is a Master’s student in Computer Science at ITMO University, Russia. He obtained a BSc in Computer Science at ITMO University in 2017. His main scientific interests are artificial intelligence, distributed systems and formal logic. Apart from studying and doing research, Dmitry works as a software developer at Serokell, an R&D company that focuses on functional programming and distributed systems.
References (29)
- et al., The ARPA network design decisions, Comput. Netw. (1977)
- OSPF version 2 (1998)
- et al., Distributed software architecture enabling peer to peer communicating controllers, IEEE Trans. Ind. Inf. (2013)
- et al., Packet routing in dynamically changing networks: A reinforcement learning approach, Adv. Neural Inf. Process. Syst. (1994)
- Partially observable Markov decision processes
- M. Tan, Multi-agent reinforcement learning: Independent vs. cooperative agents, in: Proceedings of the Tenth...
- et al., The new routing algorithm for the ARPANET, IEEE Trans. Commun. (1980)
- et al., AntNet: Distributed stigmergetic control for communications networks, J. Artificial Intelligence Res. (1998)
- et al., Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control, Adv. Neural Inf. Process. Syst. (1996)
- et al., Dual reinforcement Q-routing: An on-line adaptive routing algorithm, Artif. Neural Netw. Eng. (1997)
- Application of reinforcement learning to routing in distributed wireless networks: A review, Artif. Intell. Rev.
- Real-time routing algorithm for mobile ad hoc networks using reinforcement learning and heuristic algorithms, Wirel. Netw.
- Prioritized sweeping reinforcement learning based routing for MANETs, Indonesian J. Electr. Eng. Comput. Sci.
- Resource abstraction for reinforcement learning in multiagent congestion problems, in: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems