Multi-agent deep reinforcement learning for end-edge orchestrated resource allocation in industrial wireless networks

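Chinese title (translated): End-edge collaborative resource allocation for industrial wireless networks based on multi-agent deep reinforcement learning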

  • Research Article
Frontiers of Information Technology & Electronic Engineering

Abstract

Edge artificial intelligence will enable industrial wireless networks (IWNs), which have so far been comparatively simple, to support complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end-edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we use a Markov decision process to formulate a minimum system overhead problem that jointly optimizes delay and energy consumption. Next, we employ MADRL to overcome the state-space explosion and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of the training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay that stores and samples experiences by category. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with several benchmark algorithms in extensive experiments, which show that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.

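Abstract (translated from the Chinese)

Edge artificial intelligence empowers industrial wireless networks to support complex and dynamic industrial tasks by collaboratively exploiting the limited network and computing resources on the device side and the edge side. For resource-constrained industrial wireless networks, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm that realizes end-edge collaborative resource allocation and supports computation-intensive and delay-sensitive industrial applications. First, we establish an end-edge collaborative system model of the industrial wireless network, in which industrial devices with sensing capability act as self-learning agents. Then, we use a Markov decision process to formally describe the end-edge resource allocation problem and formulate a minimum system overhead problem that jointly optimizes delay and energy consumption. Next, we use multi-agent deep reinforcement learning to overcome the curse of dimensionality in the state space while learning an effective resource allocation policy over computing decisions, computation capacity allocation, and transmission power. To break the temporal correlation of the training data and accelerate the learning process of MADRL-RA, we design an experience replay method with experience weights that stores and samples experiences by category. On this basis, we propose a step-wise ε-greedy method to balance the agents' exploitation of experience against exploration. Finally, extensive comparative experiments verify the effectiveness of MADRL-RA relative to several baseline algorithms. The results show that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.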
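
The abstract names two training devices, a weighted experience replay that stores and samples experiences by category, and a step-by-step ε-greedy method, without giving their details here. The following Python sketch only illustrates the general ideas. Everything in it is an assumption made for illustration: the two-category split by a reward threshold, the fixed sampling share `high_share`, the linear step schedule, and all names (`CategorizedReplay`, `stepwise_epsilon`, `choose_action`) are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class CategorizedReplay:
    """Illustrative replay buffer: two categories, fixed sampling share per category.

    Assumption: experiences are split by a reward threshold. The paper's actual
    categories and weights are not given in the abstract.
    """

    def __init__(self, capacity=1000, high_share=0.7, reward_threshold=0.0, seed=0):
        self.high = deque(maxlen=capacity)  # experiences with reward >= threshold
        self.low = deque(maxlen=capacity)   # all other experiences
        self.high_share = high_share
        self.reward_threshold = reward_threshold
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        target = self.high if reward >= self.reward_threshold else self.low
        target.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw a weighted share from each category; random draws within a
        # category break the time ordering of consecutive experiences.
        n_high = min(len(self.high), round(batch_size * self.high_share))
        n_low = min(len(self.low), batch_size - n_high)
        batch = self.rng.sample(list(self.high), n_high)
        batch += self.rng.sample(list(self.low), n_low)
        self.rng.shuffle(batch)
        return batch


def stepwise_epsilon(step, start=1.0, end=0.05, drop=0.05, interval=100):
    """Lower epsilon by `drop` every `interval` steps, never going below `end`."""
    return max(end, start - drop * (step // interval))


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In a training loop under these assumptions, each agent would call `choose_action(q_values, stepwise_epsilon(step), rng)` to act, `store(...)` the resulting transition, and periodically call `sample(batch_size)` to build a mini-batch; early steps explore heavily, and exploration shrinks in discrete steps as training proceeds.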

Author information

Contributions

Xiaoyu LIU, Chi XU, and Haibin YU designed the research. Xiaoyu LIU processed the data and drafted the paper. Chi XU, Haibin YU, and Peng ZENG helped organize the paper. Xiaoyu LIU and Chi XU revised and finalized the paper.

Corresponding authors

Correspondence to Chi Xu (许驰) or Haibin Yu (于海斌).

Additional information

Compliance with ethics guidelines

Xiaoyu LIU, Chi XU, Haibin YU, and Peng ZENG declare that they have no conflict of interest.

Project supported by the National Key R&D Program of China (No. 2020YFB1710900), the National Natural Science Foundation of China (Nos. 62173322, 61803368, and U1908212), the China Postdoctoral Science Foundation (No. 2019M661156), and the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2019202).

About this article

Cite this article

Liu, X., Xu, C., Yu, H. et al. Multi-agent deep reinforcement learning for end-edge orchestrated resource allocation in industrial wireless networks. Front Inform Technol Electron Eng 23, 47–60 (2022). https://doi.org/10.1631/FITEE.2100331

  • DOI: https://doi.org/10.1631/FITEE.2100331
