Elsevier

Physical Communication

Volume 32, February 2019, Pages 81-87
Full length article
Delay-aware packet scheduling for massive MIMO beamforming transmission using large-scale reinforcement learning

https://doi.org/10.1016/j.phycom.2018.11.009

Abstract

This work addresses the massive multiple-input multiple-output (MIMO) beamforming scheduling problem. We propose a scheduling model for packet transmission with parallel caches using massive MIMO multicast beamforming, in which reinforcement learning (RL) theory is used to represent the mapping from the system state to the beamforming strategy. The delay-aware model-free RL scheduling problem for massive MIMO multicast beamforming is derived and analyzed in the asymptotic regime. We show that the model-free RL method can complement traditional convex-optimization-based methods for optimizing packet delay when channel state information (CSI) is unavailable and those methods are consequently ineffective. To derive a low-complexity algorithm, we build an RL sub-problem for each multicast group; these sub-problems are asymptotically independent. The policy gradient method is used to solve the proposed RL problems. In numerical experiments, we provide simulation results for different numbers of transmitting antennas and compare the proposed method with a randomized method. The results show that, even without using CSI, the RL-based policy can still dynamically optimize the system delay and outperform the randomized policy.

Introduction

Massive multiple-input multiple-output (MIMO) technology has recently been widely acknowledged as the foundation of physical-layer technologies in future wireless communication systems, since it can achieve high spectrum efficiency and energy efficiency [1]. Beamforming in the massive MIMO scenario is of particular research interest, since it is one of the most important ways of exploiting the large number of antennas to achieve high throughput [2], [3]. In traditional massive MIMO beamforming research, the common approach is to formulate an optimization problem whose solution gives the beamformer and the scheduling policy. A substantial body of work has addressed these problems using convex optimization theory and model-based Markov decision process (MDP) theory [4], [5], [6], [7], [8], [9]. However, these methods suffer from the well-known pilot contamination in massive MIMO multi-cell systems, which degrades channel estimation and leads to imperfect channel state information (CSI). Since CSI is an input parameter of convex-optimization-based algorithms, the inaccurate CSI caused by pilot contamination renders these methods ineffective.

Motivated by these challenges and by recent advances in machine learning [10], we adopt large-scale Markov decision process (MDP) theory [11], [12] and reinforcement learning (RL) theory [13] to model the beamforming scheduling problem. The motivations are as follows. First, model-free RL makes it possible to learn the optimal beamformer without knowing the exact transition dynamics of the environment and without fully observing the system state (i.e., the CSI and the other system states concerned). Instead, it observes the practical performance of the current beamforming strategy and improves it gradually over time, so it is not directly affected by pilot contamination in the massive MIMO system. Second, since channel estimation is itself performed gradually over discrete time slots, the gradually optimizing RL paradigm is naturally in line with the channel estimation process and does not require significant changes to the system architecture. Third, most convex-optimization-based methods are transient, i.e., they consider only the current time slot and ignore the dynamics of the wireless system, whereas RL methods can exploit the dynamics of the environment to obtain the optimal policy.

To the best of our knowledge, many existing MDP and RL works that address the MIMO transmission problem have the following drawbacks. One is the naive assumption about the system dynamics made by modeling the environment's behavior, known as the model-based method from the RL perspective. Another is the use of dynamic programming (DP) based methods, which are computationally expensive and impractical given the dimensionality issue and the continuous-action case. Furthermore, in the massive MIMO scenario it is possible to decompose the large-scale RL problem into parallel sub-problems to overcome the dimensionality issue, since the channels are asymptotically orthogonal and become independent.

Our main contributions are as follows. First, we propose a general RL architecture for packet scheduling in massive MIMO beamforming systems, with different types of formulation. Based on that, we analyze the asymptotic behavior in the massive MIMO scenario and decompose the high-dimensional scheduling problem into sub-problems. A model-free packet scheduling algorithm with low computational complexity is then proposed. By utilizing large-scale RL algorithms, the proposed method can dynamically optimize the system delay without using CSI; hence it can complement traditional convex-optimization-based methods when CSI is unavailable. The rest of the paper is organized as follows. Section 2 presents the massive MIMO beamforming signal model. Section 3 presents the RL-based scheduling architecture for the massive MIMO beamforming system. Section 4 presents the detailed analysis, with asymptotic results and a low-complexity algorithm for solving the proposed RL problem. Section 5 presents the numerical results, and Section 6 concludes the paper.

Notation: Boldface uppercase and lowercase letters denote matrices and vectors, respectively. $\mathbb{N}^{+}$ and $\mathbb{C}$ denote the sets of positive integers and complex numbers, respectively. $\mathbb{E}$ denotes the stochastic expectation, and $(\cdot)^{H}$ denotes the Hermitian transpose. $\mathbf{I}_N$ and $\mathbf{0}_N$ denote the $N \times N$ identity matrix and zero matrix, respectively. $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ denotes that $\mathbf{x}$ is a circularly symmetric complex Gaussian random vector with mean vector $\mathbf{0}$ and covariance matrix $\mathbf{I}$.

Massive MIMO multicast beamforming transmission model

Consider a single base station equipped with a massive MIMO antenna array that uses wireless multicast beamforming to transmit information simultaneously to the users within its coverage area. The multicast information belongs to $M$ groups and is stored and scheduled in $M$ parallel caches. The arrival rate of multicast packets in each cache is denoted by $\lambda_m$, as shown in Fig. 1. Each group uses a specific beamformer $\omega_m$ to perform multicast beamforming. The beamformed vector information of each

Large-scale RL based scheduling structure

To address the proposed problem and derive the optimal multicast beamforming strategy, we apply RL theory, modeling the beamformer $\omega_m$ and the power ratio $\rho_m$ as RL policies. For convenience of analysis, we consider only the average-cost model for finite-state MDPs, where the optimal policy is stationary for all initial states [12]. To build the problem, we formulate the following RL components.

Low-complexity method for model-free RL based scheduling problem

We simplify the problem above in two steps. First, we use the asymptotic characteristics of the massive MIMO channel to derive the decomposed RL problems; then we solve them using the theory of large-scale RL.
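As a rough numerical illustration of the asymptotic property behind the first step, the following Python sketch assumes i.i.d. $\mathcal{CN}(\mathbf{0}, \mathbf{I})$ channel vectors and estimates the normalized inner product between two independent channels as the number of antennas grows. The function name `normalized_cross_correlation` and its parameters are hypothetical and are not taken from the paper; the sketch only illustrates why channels of different groups become nearly orthogonal, not the decomposition itself.

```python
import numpy as np


def normalized_cross_correlation(num_antennas, num_trials=200, seed=0):
    """Monte Carlo estimate of E|h1^H h2| / N for independent CN(0, I) channels.

    Assumption (not from the paper): both channel vectors have i.i.d.
    unit-variance circularly symmetric complex Gaussian entries.
    """
    rng = np.random.default_rng(seed)
    shape = (num_trials, num_antennas)
    # Real and imaginary parts each have variance 1/2, so each entry has unit variance.
    h1 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    h2 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    # Row-wise Hermitian inner product h1^H h2, one value per trial.
    inner = np.einsum("tn,tn->t", h1.conj(), h2)
    return float(np.mean(np.abs(inner)) / num_antennas)


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, normalized_cross_correlation(n))
```

Under these assumptions the printed value shrinks roughly in proportion to $1/\sqrt{N}$, which is the sense in which the sub-problems of different multicast groups can be treated as approximately independent for large antenna arrays.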

Numerical experiments

We use the Asynchronous Advantage Actor-Critic (A3C) algorithm [14] with TensorFlow [15] and OpenAI Gym [16] to provide numerical results for the proposed model. For the observation environment, we set 4 users in 2 multicast groups, with distances $d_{m,k}$ from the base station of {0.02, 0.05, 0.02, 0.05} km. The total transmission power of the base station is set to 10 dBW. The wireless channel fast-fading vector $h_{m,k}$ is assumed to be Rayleigh distributed with unit variance. The

Conclusion

In this paper, we propose a delay-aware packet scheduling structure for massive MIMO systems using large-scale RL. We present the modeling process for a series of RL problems that solve the multicast beamforming scheduling problem, and provide theoretical analysis for these models. We show that, by using model-free large-scale RL algorithms, the proposed model can dynamically solve the beamforming problem without using CSI, which can overcome the imperfect CSI issue in traditional signal process

Acknowledgments

This work is supported by the National Science Foundation of China Project (Grant No. 61471066) and the Open Project Fund (Grant No. 201600017) of the National Key Laboratory of Electromagnetic Environment, China.

Xinran Zhang received B.S. and M.S. degrees from Beijing University of Posts and Telecommunications (BUPT) in 2010 and 2013, respectively. He has been an assistant professor at the Beijing Electronic Science and Technology Institute, Beijing, China, since 2013. He has been working toward his Ph.D. degree at BUPT since 2015. His research interests include large-scale reinforcement learning theory and wireless signal processing theory.

References (16)

  • Zheng K. et al., Massive MIMO channel models: A survey, Int. J. Antennas Propag. (2014)
  • Xiang Z. et al., Massive MIMO multicasting in noncooperative cellular networks, IEEE J. Sel. Areas Commun. (2014)
  • Zhou H. et al., Joint multicast beamforming and user grouping in massive MIMO systems
  • Chen E. et al., A fast algorithm for multi-group multicast beamforming in large-scale wireless systems
  • Li P. et al., A CMDP-based approach for energy efficient power allocation in massive MIMO systems
  • Djonin D.V. et al., Q-learning algorithms for constrained Markov decision processes with randomized monotone policies: Application to MIMO transmission control, IEEE Trans. Signal Process. (2007)
  • Sun S. et al., On stochastic feedback control for multi-antenna beamforming: Formulation and low-complexity algorithms, IEEE Trans. Wireless Commun. (2014)
  • Li J. et al., Delay-aware cooperative multipoint transmission with backhaul limitation in cloud-RAN

There are more references available in the full text version of this article.

Songlin Sun received B.S. and M.S. degrees from Shandong University of Technology in 1997 and 2000, respectively, and his Ph.D. degree from Beijing University of Posts and Telecommunications in 2003. He is currently a professor at Beijing University of Posts and Telecommunications. He has long been engaged in the research and development and teaching work of wireless communications, multimedia communication, the Internet of Things, and embedded systems.
