Abstract
We present a method that addresses the long lead time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces, each an overlapping subset of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulation results suggest that the proposed system shortens the end-to-end deployment lead time two-fold compared to a reinitialise-and-retrain baseline, with no drop in optimisation gain.
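The core idea in the abstract can be sketched as follows: when the agent moves to the next action space in the sequence, it carries over what it learned for the shared parameters rather than reinitialising, which is where the lead-time saving comes from. This is a minimal illustrative sketch; the parameter names, the `warm_start` helper, and the stand-in "learning update" are hypothetical and not from the paper.

```python
# Sketch of the continual-transfer idea: a policy trained on one action space
# (a subset of cell-level configuration parameters) warm-starts the policy for
# the next, overlapping action space, instead of reinitialising from scratch.

def warm_start(prev_params, new_space, init=0.0):
    """Carry over learned settings for parameters shared with the new space;
    initialise parameters that appear for the first time."""
    return {p: prev_params.get(p, init) for p in new_space}

# A sequence of overlapping action spaces, as provided by domain experts
# (names are illustrative).
spaces = [
    {"tilt", "tx_power"},
    {"tilt", "tx_power", "handover_margin"},
    {"tx_power", "handover_margin", "scheduler_weight"},
]

policy = {}
for space in spaces:
    policy = warm_start(policy, space)
    # ... continual RL training on this action space would go here ...
    for p in space:
        policy[p] += 1.0  # stand-in for a learning update

print(sorted(policy))
```

In a reinitialise-and-retrain baseline, `policy` would be reset to `{}` at the start of every task; here, shared parameters such as `tx_power` retain their accumulated updates across all three tasks.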
C. Hasan, A. Agapitos, and D. Lynch contributed equally.
Notes
1. The names of CPs, PCs, and EPs cannot be disclosed.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hasan, C. et al. (2023). Continual Model-Based Reinforcement Learning for Data Efficient Wireless Network Optimisation. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_18
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science (R0)