Index-aware reinforcement learning for adaptive video streaming at the wireless edge

Authors:
Guojun Xiong, SUNY-Binghamton University
Xudong Qin, Pennsylvania State University
Bin Li, Pennsylvania State University
Rahul Singh, Indian Institute of Science
Jian Li, SUNY-Binghamton University

MobiHoc '22: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, October 2022, Pages 81–90
https://doi.org/10.1145/3492866.3549726

Published: 03 October 2022

ABSTRACT

We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize video bitrate adaptation and resource allocation so that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed, multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy, the Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. Most existing index policies require these two conditions, which are difficult to establish in general. Since the wireless access edge network environment is highly dynamic, with system parameters that are unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves sub-linear regret with low complexity, since it fully exploits the structure of the Quality Index Policy when making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing index policies and index-aware RL algorithms is of independent interest and could be useful for other large-scale multi-user problems.


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
        Sequential decision making
2. Networks
  1. Network performance evaluation
    1. Network performance analysis

Published in
MobiHoc '22: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
October 2022, 442 pages
ISBN: 9781450391658
DOI: 10.1145/3492866
General Chairs: Song Chong (KAIST), Changhee Joo (Korea University), Kyunghan Lee (Seoul National University)
Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
index policy
reinforcement learning
restless bandits
video streaming
wireless edge networks
Acceptance Rates
Overall Acceptance Rate: 296 of 1,843 submissions, 16%