Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning

Published in: Telecommunication Systems

Abstract

In this paper, we address the cross-layer problem of long-term average utility maximization in energy-efficient cognitive radio networks supporting packetized data traffic, subject to a constraint on the collision rate with licensed users. Utility is determined by the number of packets transmitted successfully per unit of consumed power and by the buffer occupancy. We formulate the problem as a constrained Markov decision process (CMDP), a dynamic programming framework, and employ reinforcement learning (RL) to find a near-optimal policy when the environment is unknown. The learned policy guides the transmitter, at the beginning of each frame, to access available channels and select a proper transmission rate in pursuit of its long-term goals. Several implementation issues of the RL approach are discussed. First, state-space compaction is used to cope with the so-called curse of dimensionality caused by the large state space of the formulated CMDP. Second, action-set reduction decreases the number of candidate actions in certain system states. Finally, the CMDP is converted into a corresponding unconstrained Markov decision process (UMDP) via a Lagrangian multiplier approach, and a golden section search method is proposed to find the proper multiplier. To evaluate the performance of the policy learned by RL, we present two naive policies and compare them by simulation.
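The Lagrangian step in the abstract can be illustrated with a minimal sketch: the constrained reward is replaced by a shaped reward r − λ·c (utility minus λ-weighted collision cost), and a golden section search tunes λ until the collision-rate constraint is met. The RL inner loop is abstracted away here; `collision_rate` below is a hypothetical monotone stand-in for "long-run collision rate of the policy learned under λ", and `P_MAX` is an assumed constraint value, not figures from the paper.

```python
def golden_section_search(f, a, b, tol=1e-6):
    """Minimise a unimodal scalar function f on the interval [a, b]."""
    inv_phi = (5 ** 0.5 - 1) / 2  # 1 / golden ratio, ~0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c          # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def shaped_reward(utility, collision, lam):
    """UMDP per-step reward: utility minus the lam-weighted collision cost."""
    return utility - lam * collision

# Hypothetical stand-in: a larger multiplier penalises collisions more,
# so the learned policy becomes more conservative and collides less often.
def collision_rate(lam):
    return 0.3 / (1.0 + lam)

P_MAX = 0.1  # assumed collision-rate constraint with licensed users

# Pick lam so the learned policy's collision rate hits the constraint.
lam_star = golden_section_search(
    lambda lam: (collision_rate(lam) - P_MAX) ** 2, 0.0, 10.0)
print(round(lam_star, 3), round(collision_rate(lam_star), 3))  # → 2.0 0.1
```

In the paper's setting the inner evaluation of each candidate λ is itself an average-reward RL run over the shaped reward, which is what makes the scalar search over λ attractive compared with solving the CMDP directly.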

References

  1. Mitola, J., III, & Maguire, G. Q., Jr. (1999). Cognitive radio: Making software radios more personal. IEEE Personal Communications, 6(4), 13–18.

  2. Haykin, S. (2005). Cognitive radio: Brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23(2), 201–220.

  3. Akyildiz, I. F., et al. (2006). NeXt generation/dynamic spectrum access/cognitive radio wireless networks: A survey. Computer Networks, 50(13), 2127–2159.

  4. Mahmoud, Q. H. (2007). Cognitive networks: Towards self-aware networks. New York: Wiley.

  5. Chen, Y., et al. (2006). Distributed cognitive MAC for energy-constrained dynamic spectrum access. In IEEE military communication conference (pp. 1–7).

  6. Kim, H., et al. (Accepted). Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks. IEEE Transactions on Mobile Computing.

  7. Chen, Y., Zhao, Q., & Swami, A. (2007). Bursty traffic in energy-constrained opportunistic spectrum access. In IEEE global communications conference (Globecom).

  8. Yu, F., Wong, V. W. S., & Leung, V. C. M. (2006). Efficient QoS provisioning for adaptive multimedia in mobile communication networks by reinforcement learning. Mobile Networks and Applications Journal, 11(1), 101–110.

  9. Tong, H., & Brown, T. X. (2000). Adaptive call admission control under quality of service constraints: A reinforcement learning solution. IEEE Journal on Selected Areas in Communications, 18(2), 209–221.

  10. Nie, J., & Haykin, S. (1999). A dynamic channel assignment policy through Q-learning. IEEE Transactions on Neural Networks, 10(6), 1443–1455.

  11. Pandana, C., & Liu, K. J. R. (2005). Near-optimal reinforcement learning framework for energy-aware sensor communications. IEEE Journal on Selected Areas in Communications, 23(4), 788–797.

  12. Liao, C.-Y., Yu, F., Leung, V. C. M., & Chang, C.-J. (2006). A novel dynamic cell configuration scheme in next-generation situation-aware CDMA networks. IEEE Journal on Selected Areas in Communications, 24(1), 16–25.

  13. Chang, C. J., Chen, B. W., Liu, T. Y., & Ren, F. C. (2000). Fuzzy/neural congestion control for integrated voice and data DS-CDMA/FRMA cellular networks. IEEE Journal on Selected Areas in Communications, 18(2), 283–293.

  14. Chung, S. T., & Goldsmith, A. (2001). Degrees of freedom in adaptive modulation: A unified view. IEEE Transactions on Communications, 49(9), 1561–1571.

  15. Karmokar, A. K., et al. (2006). POMDP-based coding rate adaptation for type-I hybrid ARQ systems over fading channels with memory. IEEE Transactions on Wireless Communications, 5(12), 3512–3523.

  16. Barlow, R. E., & Hunter, L. C. (1961). Reliability analysis of a one-unit system. Operations Research, 9(2), 200–208.

  17. Baxter, L. A. (1981). Availability measures for a two-state system. Journal of Applied Probability, 18, 227–235.

  18. Wang, H. S., & Moayeri, N. (1995). Finite-state Markov channel: A useful model for radio communication channels. IEEE Transactions on Vehicular Technology, 44(1), 163–171.

  19. Hossain, Md. J., Djonin, D. V., & Bhargava, V. K. (2005). Delay limited optimal and suboptimal power and bit loading algorithms for OFDM systems over correlated fading. In IEEE global communications conference.

  20. Karmokar, A. K., et al. (2006). Optimal and suboptimal packet scheduling over time-varying flat fading channels. IEEE Transactions on Wireless Communications, 5(2), 446–457.

  21. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  22. Wong, C. Y., Cheng, R. S., et al. (1999). Multiuser OFDM with adaptive subcarrier, bit, and power allocation. IEEE Journal on Selected Areas in Communications, 17(10), 1747–1758.

  23. Shakkottai, S., & Stolyar, A. L. (2002). Scheduling for multiple flows sharing a time-varying channel: The exponential rule. American Mathematical Society Translations, Series 2, 207, 185–202.

  24. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22, 159–195.

  25. Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In The tenth international conference on machine learning (pp. 298–305).

  26. Bertsekas, D. P., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Belmont: Athena Scientific.

  27. Altman, E. (1999). Constrained Markov decision processes. London: Chapman and Hall/CRC.

  28. Beutler, F. J., & Ross, K. W. (1985). Optimal policies for controlled Markov chains with a constraint. Journal of Mathematical Analysis and Applications, 112, 236–252.

  29. Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In 15th international conference on machine learning.

  30. Bolch, G., Greiner, S., & de Meer, H. (2006). Queueing networks and Markov chains: Modeling and performance evaluation with computer science applications. New York: Wiley.

  31. Winston, W. L. (1994). Operations research: Mathematical programming. New York: Wadsworth.

Author information

Corresponding author

Correspondence to Jiang Zhu.

Additional information

This work was supported by the Hi-Tech Research and Development Program of China (863 Program) under grants 2005AA123910 and 2007AA01Z209, and sponsored by Huawei Technologies Co., Ltd.

About this article

Cite this article

Zhu, J., Wang, J., Luo, T. et al. Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning. Telecommun Syst 42, 123–138 (2009). https://doi.org/10.1007/s11235-009-9174-9
