
Ad Hoc Networks

Volume 102, 1 May 2020, 102069

A deep reinforcement learning for user association and power control in heterogeneous networks

https://doi.org/10.1016/j.adhoc.2019.102069

Abstract

Heterogeneous networks (HetNets) are a promising solution to satisfy the unprecedented demand for higher data rates in next-generation mobile networks. Unlike traditional single-tier cellular networks, HetNets face the urgent problem of providing the best possible service to user equipments (UEs) with limited resources. To address this challenge efficiently and to achieve high network energy efficiency, the joint optimization of user association and power control in orthogonal frequency division multiple access (OFDMA) based uplink HetNets is studied. Given the non-convex and non-linear characteristics of the problem, a multi-agent deep Q-network (DQN) method is developed to solve it. Unlike traditional methods such as game theory, fractional programming and convex optimization, which require extensive and accurate network information in practice, the multi-agent DQN method needs less communication information about the environment. Moreover, under communication-environment dynamics, the multi-agent DQN method maximizes the long-term overall network utility with a newly designed reward function while guaranteeing the UEs' quality-of-service (QoS) requirements. The action space, state space and reward function of the multi-agent DQN based framework are then redefined and formulated for the considered scenario. Simulation results demonstrate that the multi-agent DQN method outperforms traditional reinforcement learning (Q-learning) in both convergence and energy efficiency.

Introduction

To meet the explosive growth of mobile data traffic, heterogeneous networks (HetNets) have been proposed as an efficient solution owing to their dense deployment and heterogeneity [1]. Compared with traditional homogeneous networks, HetNets comprise different types of base stations (BSs), such as micro, pico and femto BSs, which are characterized by their transmit power, density and data rate [2], [3], [4]. As the number of mobile devices grows, the interference among user equipments (UEs) becomes more severe under a spectrum-sharing strategy in uplink HetNets. Thus, orthogonal frequency division multiple access (OFDMA) based HetNets have been adopted in major wireless communication standards [5], [6], [7]. Since macro and small BSs differ in coverage, transmit power and processing capability, applying the conventional maximum-received-signal-strength user association scheme to HetNets leads to inefficient small-BS deployment: most UEs associate with the macro BS while very few are attracted to small BSs. In addition, as the number of UEs increases, uplink interference becomes another bottleneck in HetNets [8]. Properly setting transmit power through a power control strategy can reduce the interference among UEs that share the same subchannel in OFDMA based HetNets, which strongly influences the UEs' quality of service (QoS). Thus, to further improve system performance and user experience, the joint optimization of user association and power control is of great importance in HetNets.

Several works have studied user association and power control problems [9], [10], [11], [12], [13], [14]. Considering the interplay between user association and power control, some studies investigate their joint optimization in HetNets, such as [15], [16], [17], [18]. The authors in [15] investigated the uplink energy efficiency of communication between primary and secondary users through user association and power control, and proposed an iterative algorithm based on convex optimization to solve the problem. Using noncooperative game theory, a universal joint BS association and power control algorithm for HetNets was proposed in [16] with the goal of improving system throughput. A joint user association and power control strategy that balances network loads by maximizing the weighted sum of long-term rates was designed in [17]. In [18], a heuristic algorithm was proposed for the delay-aware uplink user association problem in conjunction with power control in HetNets. Because the joint user association and power control problem is non-convex and non-linear, obtaining a globally optimal solution is difficult. Several methods have recently been developed for it, including convex optimization [9], [10], [12], [13], [15], game-theoretic methods [11], [14], [16], fractional programming [17] and heuristic algorithms [18].

However, the above methods require extensive and accurate network information to obtain a solution, which may not be practical when the communication environment changes. For a time-varying dynamic environment, solving this problem effectively and intelligently remains a challenge for HetNets. The emerging field of artificial intelligence thus offers an efficient tool for the problem [19]. By constantly interacting with the environment, reinforcement learning [20], [21] can solve long-term decision and game-theoretic problems through online learning. Among reinforcement learning algorithms, Q-learning is widely used because it does not need to know the state transition probabilities [22]. Using little prior knowledge of the environment, Q-learning can obtain the optimal policy for intelligent decision problems. In [23], the authors used Q-learning to study a joint channel allocation and power control problem for device-to-device (D2D) transmission underlaying a conventional single-cell cellular network. A Q-learning based method for autonomous channel and power-level selection by D2D users in a multi-cell network was then studied in [24]. For load balancing in vehicular networks with heterogeneous BSs, a distributed user association algorithm based on online Q-learning was studied in [25]. In [26], a Q-learning based power control scheme for energy-efficiency optimization in femtocell networks was studied. The problem of joint caching and resource allocation was investigated for a network of cache-enabled unmanned aerial vehicles (UAVs) that serve wireless ground users over the LTE licensed and unlicensed bands [27].
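The model-free property mentioned above, learning from observed samples without transition probabilities, reduces to the standard tabular update Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) - Q(s,a)]. The following is a minimal sketch of one such update; the state and action counts are toy values of our own choosing, not taken from the paper.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: no transition probabilities needed,
    only the observed (state, action, reward, next state) sample."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from the best next action
    Q[s, a] += alpha * (td_target - Q[s, a])    # move toward the TD target
    return Q

# Toy example: 4 states, 2 actions, all Q-values start at zero.
Q = np.zeros((4, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

Repeating this update while acting (for example, epsilon-greedily) converges to the optimal policy under the usual step-size conditions, which is why it fits the online decision problems surveyed above.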

However, the state and action spaces considered in [23], [24], [25], [26], [27] are relatively small. For the joint user association and power control problem in HetNets, the state and action spaces are large, so it is difficult for Q-learning to achieve good performance. To handle large spaces and make up for this deficiency, deep reinforcement learning [28] has been proposed for large-scale problems. By combining Q-learning with a deep neural network (DNN), the deep Q-network (DQN) [29] can effectively improve learning performance. In other words, a DQN agent can learn an optimal strategy from high-dimensional state and action spaces. The DQN method has recently been studied in several works on intelligent resource management and decision making. To minimize interference to vehicle-to-infrastructure (V2I) communications, a DQN based framework was proposed to optimize the joint sub-band and power-level selection problem in [30], [31]. For mobile edge computing (MEC) systems, the authors of [32] formulated the sum cost of delay and energy consumption for all UEs as the optimization objective and jointly optimized the offloading decision and computational resource allocation with a DQN. The authors of [33] tackled the joint caching, computing and radio resource allocation problem in the fog-enabled Internet of Things (IoT) to minimize service latency using a DQN. Considering the long-term system power consumption under dynamic edge cache states, a DQN based joint mode selection and resource management approach was studied in [34]. However, only a few recent works study DQN based methods for joint optimization problems in HetNets, such as [35], [36].
In [35], deep reinforcement learning for user association and channel allocation in HetNets was studied, where the difference between the UE's rate and the BS's transmit power was taken as the reward. The authors in [36] studied the control of user association and power allocation to maximize the UEs' sum rate under per-UE QoS constraints using a DQN scheme with a convolutional neural network (CNN). However, these studies focus on joint user association and channel (or power) allocation in HetNets without analyzing energy efficiency. With the continuous emergence of new services and application scenarios [37], [38], the energy consumption of UEs keeps rising together with the growth of intensive mobile data computing and applications. Since current battery technology cannot keep up with the energy consumption of mobile UEs, optimizing UE energy efficiency becomes even more important in HetNets.
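To make the two DQN ingredients discussed above concrete, a neural network replacing the Q-table and an experience replay buffer, the following NumPy sketch shows their basic shape; the network size, state dimension and action count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyQNet:
    """Minimal one-hidden-layer Q-network: maps a state vector to one
    Q-value per discrete action (a stand-in for the DQN's deep network)."""
    def __init__(self, state_dim, n_actions, hidden=32):
        self.W1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, n_actions))

    def q_values(self, s):
        h = np.maximum(0.0, s @ self.W1)   # ReLU hidden layer
        return h @ self.W2                 # one Q-value per action

def epsilon_greedy(net, s, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(net.W2.shape[1]))
    return int(np.argmax(net.q_values(s)))

# Experience replay: DQN breaks sample correlation by training on
# random minibatches drawn from a buffer of past transitions.
buffer = []
def store(transition, capacity=10_000):
    if len(buffer) >= capacity:
        buffer.pop(0)
    buffer.append(transition)

net = TinyQNet(state_dim=8, n_actions=4)
s = rng.normal(size=8)
a = epsilon_greedy(net, s, epsilon=0.0)   # greedy action for this state
print(0 <= a < 4)  # True
```

A full DQN would additionally train the network toward replayed temporal-difference targets, usually with a periodically synchronized target network; only the inference and storage paths are sketched here.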

Based on the above analysis, and since deep reinforcement learning shows great potential for handling large systems, this paper studies a multi-agent deep reinforcement learning approach for joint user association and power control. The main contributions of this paper are summarized as follows.

1) To maximize the energy efficiency of all UEs, we jointly optimize user association and power control in OFDMA based uplink HetNets using the multi-agent DQN method.

2) Since the problem is a mixed-integer non-linear fractional programming (MINLFP) problem, it is difficult to solve optimally with traditional methods; a multi-agent DQN algorithm that requires less transmission-overhead information is therefore studied. Motivated by the tension between the energy consumption and the battery capacity of a UE, the UE's energy efficiency is redefined as the reward function. In the decentralized reinforcement learning framework, each agent intelligently makes adaptive decisions to maximize its energy efficiency under the constraints of maximum transmit power and the UEs' QoS requirements, without coordinating with other agents.

3) The performance of the multi-agent DQN is analyzed in terms of convergence, optimality and stability. Simulation results show that the multi-agent DQN based framework achieves better convergence and higher energy efficiency for all UEs than four other methods, demonstrating its great potential for handling large systems.

The rest of this paper is organized as follows. Section II describes the system model. Section III presents the problem formulation and the multi-agent DQN based framework. Simulation and performance analysis are included in Section IV. Finally, conclusions are provided in Section V.


System model

In this work, an OFDMA based two-tier HetNet is considered, as shown in Fig. 1. In this scenario, a macro BS is indexed by m=0, and a set of small BSs is deployed within its coverage area. Without loss of generality, the set of all BSs is denoted as M={0,1,2,…,M}. The learning process is performed by a cloud server connected to the macro and small BSs through optical fiber cables. The UEs are randomly distributed in the network, and the set of UEs is U={1,2,…,U},
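A topology of this kind (macro BS indexed 0 at the center, small BSs and UEs dropped uniformly at random) can be sketched as follows; the BS and UE counts and the area side length here are placeholders, not the paper's simulation values.

```python
import numpy as np

rng = np.random.default_rng(1)

def drop_hetnet(n_small_bs, n_ues, area=200.0):
    """Place the macro BS (index 0) at the area center and scatter
    small BSs and UEs uniformly over an area x area square."""
    macro = np.array([[area / 2, area / 2]])
    small = rng.uniform(0, area, size=(n_small_bs, 2))
    bs_pos = np.vstack([macro, small])              # BS set M = {0, 1, ..., M}
    ue_pos = rng.uniform(0, area, size=(n_ues, 2))  # UE set U = {1, ..., U}
    return bs_pos, ue_pos

bs_pos, ue_pos = drop_hetnet(n_small_bs=4, n_ues=25)
# Distance matrix d[u, m] between UE u and BS m, the input to any
# path loss and association computation downstream.
d = np.linalg.norm(ue_pos[:, None, :] - bs_pos[None, :, :], axis=-1)
print(d.shape)  # (25, 5)
```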

Multi-agent DQN for joint user association and power control

From P1, it can be seen that the user association and power control mechanisms are mutually coupled, and that the problem is mixed-integer and non-convex. To solve it efficiently, a multi-agent DQN method based on reinforcement learning is studied. Before the multi-agent DQN approach is presented, the main elements of the reinforcement-learning Markov decision process are defined, together with the newly proposed reward function.
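The exact state, action and reward definitions appear in the full text; a plausible shape for two of them, assuming each agent's action is a (serving BS, power level) pair and the reward is the UE's energy efficiency with a penalty when the QoS rate constraint is violated (consistent with the contributions listed in the Introduction), might look like this:

```python
import itertools

def build_action_space(n_bs, power_levels):
    """Each agent (UE) picks a (serving BS, transmit power level) pair,
    so the discrete action space is the Cartesian product of the two."""
    return list(itertools.product(range(n_bs), power_levels))

def reward(rate_bps, power_w, rate_min, penalty=-1.0):
    """Assumed reward shape: energy efficiency (bits per joule) when the
    QoS rate requirement holds, and a fixed penalty otherwise."""
    if rate_bps < rate_min:
        return penalty
    return rate_bps / power_w

# 5 candidate BSs and 3 discrete power levels (illustrative values).
actions = build_action_space(n_bs=5, power_levels=[0.05, 0.1, 0.2])
print(len(actions))  # 5 * 3 = 15
```

Discretizing power into a small set of levels is what keeps the per-agent action space finite, which is a prerequisite for the DQN's per-action Q-value output layer.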

Simulation results and analysis

In this section, the multi-agent DQN algorithm is simulated in a two-tier HetNet with one macro BS and several micro BSs. There are 25 UEs randomly dispersed over the macro BS's coverage area, modeled as a 200 m  ×  200 m square. The micro BSs are also randomly distributed in this area. The maximum transmit power of the UEs is 23 dBm and the total number of subchannels is 15. The path losses of the macro BS and micro BSs are PL1=34+40log10(d) and PL2=37+30log10(d)
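Using the stated simulation parameters (the 23 dBm maximum transmit power and the two path loss models, with the distance d assumed to be in meters), a single UE's link budget can be sketched as follows.

```python
import math

def path_loss_db(d_m, macro=True):
    """Path loss models from the simulation setup: 34 + 40*log10(d) for
    the macro BS and 37 + 30*log10(d) for a micro BS (d assumed in meters)."""
    if macro:
        return 34.0 + 40.0 * math.log10(d_m)
    return 37.0 + 30.0 * math.log10(d_m)

def received_power_dbm(tx_dbm, d_m, macro=True):
    """Received power = transmit power minus path loss (all in dB units)."""
    return tx_dbm - path_loss_db(d_m, macro)

# A UE transmitting at the 23 dBm maximum, 100 m from the macro BS:
pl = path_loss_db(100.0, macro=True)        # 34 + 40*2 = 114 dB
print(pl, received_power_dbm(23.0, 100.0))  # 114.0 -91.0
```

Note the micro-BS model's smaller distance exponent (30 vs. 40 per decade), which is what lets nearby micro BSs offer a better uplink than the macro BS despite their smaller coverage.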

Conclusion

In this paper, the joint optimization of user association and power control has been studied in OFDMA based HetNets. The resource management problem has been formulated as maximizing the long-term uplink energy efficiency of all UEs under the constraints of maximum transmit power and the UEs' QoS requirements. The multi-agent DQN approach has been utilized to solve this MINLFP problem. Unlike traditional solution methods, little communication information is needed by the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Science Fund of China for Excellent Young Scholars under Grant 61622111, the National Natural Science Foundation of China (No. 61860206005, 61671278, 61871466 and 61801278) and in part by the Guangxi Natural Science Foundation Innovation Research Team Project under Grant 2016GXNSFGA380002.

Hui Ding is currently pursuing the Ph.D. degree with the School of Information Science and Engineering, Shandong University, China. His research interests include heterogeneous networks in 5G, resource allocation, user association and power control, non-convex optimization and reinforcement learning.

References (42)

  • J. Zheng et al., Optimal power control in ultra-dense small cell networks: a game-theoretic approach, IEEE Trans. Wireless Commun. (2017)
  • Y. Wei et al., Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor-critic deep reinforcement learning, IEEE Internet Things J. (2019)
  • S. Cai et al., Green 5G heterogeneous networks through dynamic small-cell operation, IEEE J. Sel. Areas Commun. (2016)
  • W. Guo et al., Automated small-cell deployment for heterogeneous cellular networks, IEEE Commun. Mag. (2013)
  • J. Chen et al., Cross-layer QoE optimization for D2D communication in CR-enabled heterogeneous cellular networks, IEEE Trans. Cognit. Commun. Netw. (2018)
  • C. Liu et al., Interference precancellation for resource management in heterogeneous cellular networks, IEEE Trans. Cognit. Commun. Netw. (2019)
  • H. Yin et al., OFDMA: a broadband wireless access technology, 2006 IEEE Sarnoff Symposium (2006)
  • S. Lohani et al., Joint resource allocation and dynamic activation of energy harvesting small cells in OFDMA HetNets, IEEE Trans. Wireless Commun. (2018)
  • S. Rezvani et al., Fairness and transmission-aware caching and delivery policies in OFDMA-based HetNets, IEEE Trans. Mobile Comput. (2019)
  • A. Damnjanovic et al., A survey on 3GPP heterogeneous networks, IEEE Wireless Commun. (2011)
  • X. Ge et al., Joint user association and user scheduling for load balancing in heterogeneous networks, IEEE Trans. Wireless Commun. (2018)
  • Y.L. Lee et al., User association for backhaul load balancing with quality of service provisioning for heterogeneous networks, IEEE Commun. Lett. (2018)
  • Y. Xu et al., User association in massive MIMO HetNets, IEEE Syst. J. (2017)
  • T.M. Ho et al., Power control for interference management and QoS guarantee in heterogeneous networks, IEEE Commun. Lett. (2015)
  • B. Xu et al., Energy-aware power control in energy-cooperation enabled HetNets with hybrid energy supplies, 2016 IEEE Global Communications Conference (GLOBECOM) (2016)
  • M. Wang et al., Energy-efficient user association and power control in the heterogeneous network, IEEE Access (2017)
  • V.N. Ha et al., Distributed base station association and power control for heterogeneous cellular networks, IEEE Trans. Veh. Technol. (2014)
  • T. Zhou et al., Joint user association and power control for load balancing in downlink heterogeneous cellular networks, IEEE Trans. Veh. Technol. (2018)
  • Z. Chen et al., Delay-aware uplink user association and power control in heterogeneous cellular networks, IEEE Wireless Commun. Lett. (2015)
  • R. Li et al., Intelligent 5G: when cellular networks meet artificial intelligence, IEEE Wireless Commun. (2017)
  • L.P. Kaelbling et al., Reinforcement learning: a survey, J. Artif. Intell. Res. (1996)


    Feng Zhao received his Ph.D. degree in communication and information systems from Shandong University, China, in 2007, and his B.S. degree from Guilin University of Electronic Technology, China, in 1997. From April 2008 to May 2011, he worked as a part-time postdoc at Beijing University of Posts and Telecommunications. He was a visiting scholar at the University of Texas at Arlington from February to August 2013. He is currently a Professor with the Guangxi Colleges and Universities Key Laboratory of Complex System Optimization and Big Data Processing, Yulin Normal University, Yulin, China. His current research interests include cognitive radio networks, MIMO wireless communications, cooperative communications, and smart antenna techniques. His research has been supported by the National Science Foundation of China. Dr. Zhao has published more than 60 papers in journals and international conferences. He was awarded the second prize of the Shandong Province Science and Technology Progress Award in 2007, 2012 and 2017.

    Jie Tian (S’12-M’16) received the B.E. and M.E. degrees from Shandong Normal University, China, in 2008 and 2011, respectively, and the Ph.D. degree in communication and information systems from the School of Information Science and Engineering, Shandong University, China, in 2016. She is currently an Associate Professor with the School of Information Science and Engineering, Shandong Normal University, Jinan, China. She is a member of the IEEE, the IEEE Communications Society, and the ACM. Her research interests include cross-layer design of wireless communication networks, intelligent radio resource management in heterogeneous networks, and signal processing for communications.

    Dongyang Li is currently pursuing the Ph.D. degree with the Department of Information and Communications Technology, School of Information Science and Engineering, Shandong University. His research interests include wireless big data, wireless edge caching and deep learning.

    Haixia Zhang (M’08-SM’11) received the B.E. degree from the Department of Communication and Information Engineering, Guilin University of Electronic Technology, China, in 2001, and the M.Eng. and Ph.D. degrees in communication and information systems from the School of Information Science and Engineering, Shandong University, China, in 2004 and 2008, respectively. From 2006 to 2008, she was with the Institute for Circuit and Signal Processing, Munich University of Technology, as an academic assistant. From 2016 to 2017, she was a visiting professor at the University of Florida, USA. She is currently a full professor at Shandong University. She has actively participated in many academic events, serving as a TPC member, session chair and invited speaker for conferences, and as a reviewer for numerous journals. She is an associate editor of the International Journal of Communication Systems and IEEE Wireless Communications Letters. Her current research interests include cognitive radio systems, cooperative (relay) communications, resource management, space-time processing techniques, mobile edge computing and smart communication technologies.
