Abstract
In this paper, we address the cross-layer problem of long-term average utility maximization in energy-efficient cognitive radio networks carrying packetized data traffic, subject to a constraint on the collision rate with licensed users. Utility is determined by the number of packets transmitted successfully per unit of consumed power and by the buffer occupancy. We formulate the problem as a constrained Markov decision process (CMDP), a dynamic programming framework, and employ a reinforcement learning (RL) approach to find a near-optimal policy when the environment is unknown. The policy learned by RL guides the transmitter, at the beginning of each frame, in accessing available channels and selecting a proper transmission rate so as to achieve its long-term goals. Several implementation issues of the RL approach are discussed. First, state-space compaction is used to cope with the so-called curse of dimensionality arising from the large state space of the formulated CMDP. Second, action-set reduction is presented to shrink the set of actions available in certain system states. Finally, the CMDP is converted into a corresponding unconstrained Markov decision process (UMDP) by the Lagrangian multiplier approach, and a golden-section search method is proposed to find the proper multiplier. To evaluate the performance of the policy learned by RL, we present two naive policies and compare all three by simulation.
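The last step of the abstract, tuning the Lagrange multiplier by golden-section search, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `collision_rate` function below is a hypothetical stand-in for the collision rate achieved by the RL policy learned under a given multiplier (in the paper this would require running the learning procedure), modeled here as a simple monotone function so the search behavior is visible.

```python
import math

def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return (a + b) / 2

def collision_rate(lam):
    # Toy stand-in: a larger multiplier (collision penalty) yields a more
    # conservative learned policy, hence a lower collision rate.
    return 1.0 / (1.0 + lam)

TARGET = 0.1  # illustrative collision-rate constraint

# Search for the multiplier whose resulting collision rate meets the target,
# by minimizing the squared deviation from the target (unimodal in lam here).
lam_star = golden_section_search(
    lambda lam: (collision_rate(lam) - TARGET) ** 2, 0.0, 100.0)
```

With the toy model above, the search converges to the multiplier at which the constraint is met with equality (here `lam_star` is near 9, since 1/(1+9) = 0.1), which is the role the golden-section search plays in converting the CMDP into a UMDP.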
Additional information
This work has been supported by the Hi-Tech Research and Development Program of China (863 Program) under grant numbers 2005AA123910 and 2007AA01Z209, and sponsored by Huawei Technologies Co., Ltd.
Cite this article
Zhu, J., Wang, J., Luo, T. et al. Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning. Telecommun Syst 42, 123–138 (2009). https://doi.org/10.1007/s11235-009-9174-9