Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system
Introduction
The distributed routing problem first emerged in computer science in the context of packet routing in computer networks. Over the years, many network routing algorithms, such as Open Shortest Path First (OSPF) [1], have been developed and standardized, and they are now widely used in computer networks.
However, the distributed routing problem also arises in contexts other than computer networks. Examples include baggage handling systems (BHS), primarily used in airports (Fig. 1), and material handling systems used in manufacturing enterprises and logistics centers. In [2] it was shown that baggage handling control can be approached in a distributed way, using a simple distance-vector routing protocol originally developed for packet routing in computer networks [3].
Though it is possible to reuse network routing algorithms in other contexts, this is not always optimal. There may be constraints that differ significantly from those present in the network routing problem. For example, in the case of BHS, we may aim to minimize the energy consumption of the conveyors together with the baggage delivery time. Algorithms designed for network routing are not suited to problems with such constraints.
In this paper, we propose a routing algorithm based on a reinforcement learning approach. The idea of viewing packet routing as a reinforcement learning problem was first implemented in [4] in the Q-routing algorithm. The algorithm we propose in this paper is largely based on Q-routing, but has the following differences:
- The value function is approximated by a neural network (NN) instead of a lookup table.
- Learning agents use additional information, such as information about the current graph topology, to make more precise estimates of action values. This information is passed between agents via complementary protocols, such as a link-state protocol.
- Supervised pretraining on examples of baseline agent behavior is used to avoid divergence.
Using neural networks for value function approximation allows learning agents to take into account arbitrary information that may be related to routing efficiency, which lets the algorithm be applied efficiently for routing in heterogeneous environments (such as baggage handling systems).
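To make the relationship to Q-routing concrete, the classic tabular Q-routing update of [4], which the proposed method replaces with a neural-network approximator, can be sketched as follows. The symbol names (`q`, `eta`, `transit_time`) are ours, not the paper's notation:

```python
# Illustrative sketch of the tabular Q-routing update (Boyan & Littman, 1994).
# q[node][dest][neighbor] holds the estimated remaining delivery time when
# `node` forwards a packet bound for `dest` via `neighbor`.

def q_routing_update(q, x, d, y, transit_time, queue_time, eta=0.5):
    """Update node x's estimate of delivery time to d via neighbor y.

    After forwarding the packet to y, node x learns y's best remaining
    estimate and moves its own estimate toward the observed target.
    """
    best_from_y = min(q[y][d].values())           # y's best estimate to d
    target = transit_time + queue_time + best_from_y
    q[x][d][y] += eta * (target - q[x][d][y])     # standard TD step
    return q[x][d][y]

# Tiny 3-node line network a -- b -- c, destination c (values illustrative).
q = {
    "a": {"c": {"b": 10.0}},
    "b": {"c": {"c": 1.0}},
    "c": {"c": {"c": 0.0}},
}
q_routing_update(q, "a", "c", "b", transit_time=1.0, queue_time=0.5)
```

In DQN-routing the lookup table `q` is replaced by a neural network, so the same temporal-difference target can be conditioned on extra inputs such as topology information.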
The rest of this paper is structured as follows. In Section 2, we specify the packet routing problem in a generalized way, review existing routing algorithms and reinforcement learning techniques, and formulate the routing problem in terms of reinforcement learning. In Section 3, we describe the algorithm, as well as several considered NN architectures. In Section 4, we provide results of an experimental comparison of the proposed algorithm with a link-state based shortest path algorithm (essentially a simplified version of OSPF, one of the most widely used network routing protocols today) and Q-routing (the basis of the proposed algorithm). The comparison is conducted in two environments: a simulation model of a computer network and a simulation model of a BHS. Section 5 is devoted to discussing the results and drawing conclusions.
Section snippets
Formulation of generalized routing problem
In this section, we formally describe the packet routing problem in a generalized way, so that the resulting definition can be used to describe the problem in a variety of settings.
We model the network as a directed graph G = (V, E), where each vertex v ∈ V corresponds to a network node (e.g. a router or a switch), and every edge e ∈ E corresponds to a link between nodes. Packets are sent between the nodes in the network: a packet may be sent from one of the source nodes and be directed towards one of the
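A minimal concrete form of this graph model is a plain adjacency map with a per-edge traversal cost; the node names and costs below are illustrative, not from the paper:

```python
# Directed graph G = (V, E) as an adjacency map: node -> {neighbor: cost}.
# Names and weights are illustrative placeholders.

network = {
    "r1": {"r2": 1.0, "r3": 4.0},   # router r1 has links to r2 and r3
    "r2": {"r3": 1.0},
    "r3": {},                        # sink node (e.g. a destination chute)
}

def neighbors(graph, node):
    """Nodes reachable from `node` over a single directed edge."""
    return list(graph[node])
```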
Basic algorithm
The method we propose, which we call DQN-routing, is largely based on Q-routing, described in Section 2.3. The method is as follows:
- Every router processes one packet at a time. The packet being processed is referred to as the current packet. A packet p = (d, s) consists of its destination d and its state s (s may be empty).
- Every router v has a current state (d, s, N(v)), where d is the destination node of the current packet, s is the state of the current packet, N(v) are the neighbors of v, and
Experiments and results
We performed experiments in two simulation models: a model of a computer network and a model of a baggage handling system. In both environments, we compared DQN-routing with a shortest path (Dijkstra) algorithm using a link-state protocol and with the Q-routing algorithm. The link-state shortest path algorithm was chosen because link-state protocols dominate computer network routing today, and Q-routing was chosen because DQN-routing is based on it.
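The link-state shortest-path baseline amounts to each router running standard Dijkstra over the full topology it learns via the link-state protocol. A minimal sketch (graph and names illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` over non-negative edge costs.

    graph: dict node -> {neighbor: cost}. This mirrors how a link-state
    protocol lets every router compute paths on the complete topology.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

net = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 1.0}, "c": {}}
```

Here the direct a→c link costs 4.0, but Dijkstra finds the cheaper two-hop route a→b→c with total cost 2.0.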
Conclusion
We presented a novel distributed routing approach that is based on machine learning and can be applied both in communication networks and in physical systems, such as baggage handling systems. The unique strength of the proposed method is its ability to simultaneously optimize the travel time of the routed entities and energy consumption. Comparison with contemporary routing algorithms confirms a substantial gain for the proposed method.
However, the proposed method has certain limitations. The
Acknowledgments
The authors would like to thank Arip Asadulaev and Ivan Smetannikov for useful comments. The results were obtained under a research project supported by the Ministry of Education and Science of the Russian Federation, Project No. 2.8866.2017/8.9.
Dmitry Mukhutdinov is a Master’s student in Computer Science at ITMO University, Russia. He obtained a BSc in Computer Science at ITMO University in 2017. His main scientific interests are artificial intelligence, distributed systems and formal logic. Apart from studying and doing research, Dmitry works as a software developer at Serokell, an R&D company that focuses on functional programming and distributed systems.
References (29)
- et al., The ARPA network design decisions, Comput. Netw. (1977)
- OSPF version 2 (1998)
- et al., Distributed software architecture enabling peer to peer communicating controllers, IEEE Trans. Ind. Inf. (2013)
- et al., Packet routing in dynamically changing networks: A reinforcement learning approach, Adv. Neural Inf. Process. Syst. (1994)
- Partially observable Markov decision processes
- M. Tan, Multi-agent reinforcement learning: Independent vs. cooperative agents, in: Proceedings of the Tenth...
- et al., The new routing algorithm for the ARPANET, IEEE Trans. Commun. (1980)
- et al., AntNet: Distributed stigmergetic control for communications networks, J. Artificial Intelligence Res. (1998)
- et al., Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control, Adv. Neural Inf. Process. Syst. (1996)
- et al., Dual reinforcement Q-routing: An on-line adaptive routing algorithm, Artif. Neural Netw. Eng. (1997)
- Application of reinforcement learning to routing in distributed wireless networks: A review, Artif. Intell. Rev.
- Real-time routing algorithm for mobile ad hoc networks using reinforcement learning and heuristic algorithms, Wirel. Netw.
- Prioritized sweeping reinforcement learning based routing for MANETs, Indonesian J. Electr. Eng. Comput. Sci.
- Resource abstraction for reinforcement learning in multiagent congestion problems, in: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems