Continual Model-Based Reinforcement Learning for Data Efficient Wireless Network Optimisation

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (ECML PKDD 2023)

Abstract

We present a method that addresses the long lead time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces, each represented by an overlapping subset of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as continual reinforcement learning of control policies. Simulation results suggest that the proposed system shortens the end-to-end deployment lead time two-fold relative to a reinitialise-and-retrain baseline, with no drop in optimisation gain.

C. Hasan, A. Agapitos, and D. Lynch contributed equally to this work.
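
To make the problem setup concrete, the following is a minimal, hypothetical Python sketch. It is not the authors' model-based method: the parameter names are placeholders (the real names cannot be disclosed, per the note below), the objective is a toy function, and the learner is a simple hill-climber. It only illustrates how a sequence of overlapping action spaces is traversed, and how a continually adapted policy differs from the reinitialise-and-retrain baseline the abstract compares against.

    # Hypothetical sketch only: names, objective, and training loop are
    # illustrative stand-ins, not the paper's method.
    import numpy as np

    rng = np.random.default_rng(0)

    # Cell-level configuration parameters (placeholder names).
    ALL_PARAMS = ["cp_1", "cp_2", "cp_3", "cp_4", "cp_5"]

    # A sequence of action spaces: overlapping subsets of the parameters,
    # as would be provided by domain experts.
    ACTION_SPACES = [
        ["cp_1", "cp_2"],
        ["cp_2", "cp_3", "cp_4"],  # overlaps the previous space on cp_2
        ["cp_3", "cp_4", "cp_5"],  # overlaps on cp_3 and cp_4
    ]

    def throughput(config):
        """Toy stand-in for simulated cell throughput (higher is better)."""
        return -sum((config[p] - 0.5) ** 2 for p in ALL_PARAMS)

    def train(policy, action_space, steps=500, step_size=0.05):
        """Toy hill-climbing 'learner': perturbs only the current action space."""
        for _ in range(steps):
            candidate = dict(policy)
            for p in action_space:
                candidate[p] += step_size * rng.standard_normal()
            if throughput(candidate) > throughput(policy):
                policy = candidate
        return policy

    # Continual learner: each task warm-starts from the previous policy, so
    # settings learned for overlapping parameters carry over between tasks.
    policy = {p: 0.0 for p in ALL_PARAMS}
    for space in ACTION_SPACES:
        policy = train(policy, space)
    print("continual:", throughput(policy))

    # Reinitialise-and-retrain baseline: a fresh policy for every task,
    # discarding everything learned on earlier action spaces.
    for space in ACTION_SPACES:
        fresh = train({p: 0.0 for p in ALL_PARAMS}, space)
        print("retrain:", throughput(fresh))

In the paper itself, the objective is cell throughput in simulation and the learner is model-based reinforcement learning; the sketch mirrors only the task sequencing and the warm-starting across overlapping action spaces.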

Notes

  1. The names of CPs, PCs, and EPs cannot be disclosed.

Author information

Corresponding author

Correspondence to Cengis Hasan.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hasan, C., et al. (2023). Continual Model-Based Reinforcement Learning for Data Efficient Wireless Network Optimisation. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_18

  • DOI: https://doi.org/10.1007/978-3-031-43427-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43426-6

  • Online ISBN: 978-3-031-43427-3

  • eBook Packages: Computer Science, Computer Science (R0)
