DOI: 10.1145/3492866.3549726
Research article (Public Access)

Index-aware reinforcement learning for adaptive video streaming at the wireless edge

Published: 3 October 2022

ABSTRACT

We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize video bitrate adaptation and resource allocation so that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy, the Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. Most existing index policies require both of these conditions, which are difficult to establish in general. Because the wireless access edge network environment is highly dynamic, with system parameters that are unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves sub-linear regret with low complexity, since it fully exploits the structure of the Quality Index Policy when making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing index policies and index-aware RL algorithms is of independent interest and could be useful for other large-scale multi-user problems.
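The abstract's central structural idea is that an index policy reduces a joint multi-user decision to ranking users by a per-user index under a resource budget, and that an index-aware learner such as QA-UCB plugs optimistic estimates of unknown parameters into that ranking. The sketch below illustrates only this general pattern. It is not the paper's Quality Index Policy or QA-UCB, whose definitions are not given here. It assumes a hypothetical, simplified, stateless setting: each user n has a known quality gain `quality[n]` (> 0) and an unreliable channel that delivers a segment with unknown probability `p_true[n]`; at most `budget` users can be served per slot; and the index is `quality[n]` times a UCB1-style optimistic estimate of the success probability (Auer, Cesa-Bianchi, and Fischer, 2002). Unlike the restless, multi-action problem in the paper, users here carry no state between slots. All names are illustrative.

```python
import math
import random


def optimistic_success_prob(successes: int, pulls: int, t: int) -> float:
    """UCB1-style optimistic estimate of a user's channel success probability.

    A user that has never been served gets +inf, so every user is tried once.
    """
    if pulls == 0:
        return math.inf
    mean = successes / pulls
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def select_users(indices: list[float], budget: int) -> list[int]:
    """Serve the `budget` users with the largest indices (ties go to the lower user id)."""
    order = sorted(range(len(indices)), key=lambda n: (-indices[n], n))
    return order[:budget]


def run(p_true: list[float], quality: list[float], budget: int,
        horizon: int, seed: int = 0) -> tuple[float, list[int]]:
    """Simulate the hypothetical index-plus-optimism learner for `horizon` slots.

    Assumes quality[n] > 0 for every user (otherwise 0 * inf would give NaN).
    Returns the total delivered quality and how often each user was served.
    """
    rng = random.Random(seed)
    n_users = len(p_true)
    successes = [0] * n_users
    pulls = [0] * n_users
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Per-user index: known quality gain times optimistic channel estimate.
        indices = [quality[n] * optimistic_success_prob(successes[n], pulls[n], t)
                   for n in range(n_users)]
        # Budget constraint: rank users by index and serve the top `budget`.
        for n in select_users(indices, budget):
            delivered = rng.random() < p_true[n]  # unreliable channel outcome
            pulls[n] += 1
            successes[n] += int(delivered)
            total_reward += quality[n] * delivered
    return total_reward, pulls


if __name__ == "__main__":
    reward, pulls = run([0.9, 0.5, 0.2, 0.1], [1.0, 1.0, 1.0, 1.0],
                        budget=2, horizon=5000)
    print(f"total delivered quality: {reward:.0f}, serves per user: {pulls}")
```

Each slot costs one index evaluation per user plus a sort, O(N log N) for N users, instead of a search over all joint allocations; in a much simpler setting, this illustrates why exploiting index structure keeps a learner cheap. Replacing the toy index with the paper's actual index and its restless, multi-action dynamics would require details not given in this abstract.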


Published in: MobiHoc '22: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, October 2022, 442 pages
ISBN: 9781450391658
DOI: 10.1145/3492866

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States



Overall Acceptance Rate: 296 of 1,843 submissions (16%)
