Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework
Introduction
Given the great advances in multimedia and communication technologies, online video sharing has attracted increasing attention from both the research and industrial communities [34], [5]. To provide low-latency, high-quality online video services, several adaptive bitrate streaming (ABS) techniques have been proposed, such as Adobe HTTP Dynamic Streaming (HDS) [18], Apple HTTP Live Streaming (HLS) [4], Microsoft Smooth Streaming (MSS) [31] and MPEG Dynamic Adaptive Streaming over HTTP (DASH) [36]. Among these techniques, DASH has become the international ABS standard over HTTP [19] because it can deliver high-quality video services that maintain the users' quality of experience (QoE) under time-varying network conditions. Fig. 1 illustrates a standard DASH system. As shown in Fig. 1(a), the DASH system first encodes the video content in different representations (e.g., with varying bitrates, resolutions or qualities). Each representation is then divided into several segments (or chunks) with a fixed playback duration. The representations are described in an XML file called the media presentation description (MPD) [37]. The encoded videos and the MPD file are stored on a standard hypertext transfer protocol (HTTP) web server. As shown in Fig. 1(b), online users request DASH video content via HTTP. The DASH client first receives and parses the MPD file and then, using its adaptive bitrate controller, requests each segment in the representation best suited to the current network conditions and playback state.
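To make the MPD description above concrete, the sketch below uses Python's standard `xml.etree.ElementTree` to read the representation list from a small MPD and to expand its segment template into per-segment URLs. The element and attribute names (`MPD`, `Period`, `AdaptationSet`, `Representation`, `SegmentTemplate`, `bandwidth`, `$RepresentationID$`, `$Number$`) follow the DASH MPD schema, but the MPD content, its URLs and the helper functions are hypothetical, and the duration parser handles only simple `PT#H#M#S` values.

```python
import math
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed MPD: one video AdaptationSet holding three
# representations of the same content. Real MPDs carry many more attributes
# (codecs, frame rate, profiles, ...); only what this sketch needs is kept.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT60S" minBufferTime="PT2S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <SegmentTemplate media="$RepresentationID$/seg-$Number$.m4s"
                       initialization="$RepresentationID$/init.mp4"
                       duration="4" timescale="1" startNumber="1"/>
      <Representation id="360p" bandwidth="800000" width="640" height="360"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="1080p" bandwidth="5000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def parse_pt_seconds(value):
    """Parse a simple ISO 8601 duration such as 'PT1M30S' (no days or months)."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", value)
    if m is None:
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def list_representations(mpd_text):
    """Return (id, bandwidth_bps, width, height) tuples, lowest bandwidth first."""
    root = ET.fromstring(mpd_text)
    reps = [
        (r.get("id"), int(r.get("bandwidth")), int(r.get("width")), int(r.get("height")))
        for r in root.iterfind(".//mpd:Representation", NS)
    ]
    return sorted(reps, key=lambda rep: rep[1])


def segment_urls(mpd_text, rep_id):
    """Expand the SegmentTemplate into the media segment URLs of one representation."""
    root = ET.fromstring(mpd_text)
    total_s = parse_pt_seconds(root.get("mediaPresentationDuration"))
    tmpl = root.find(".//mpd:SegmentTemplate", NS)
    seg_s = int(tmpl.get("duration")) / int(tmpl.get("timescale", "1"))
    first = int(tmpl.get("startNumber", "1"))
    count = math.ceil(total_s / seg_s)  # fixed-duration segments cover the whole video
    media = tmpl.get("media").replace("$RepresentationID$", rep_id)
    return [media.replace("$Number$", str(n)) for n in range(first, first + count)]


if __name__ == "__main__":
    for rep in list_representations(MPD_XML):
        print(rep)
    print(segment_urls(MPD_XML, "720p")[:2])
```

Run as a script, it prints the three representations from lowest to highest bandwidth and the first two segment URLs of the 720p representation; an ABR controller would pick one of these representations for every segment index.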
To maintain user QoE under fluctuating network conditions between the DASH client and the DASH server, the DASH system requires an adaptive bitrate (ABR) selection algorithm. With an ABR algorithm, the DASH client requests successive video segments at an appropriate bitrate based on the network conditions (bandwidth) and the playback state (buffer length, playback freezing and video quality) to avoid QoE losses [11]. Designing efficient ABR algorithms is therefore a critical challenge for DASH systems. State-of-the-art DASH ABR algorithms can generally be grouped into two classes: model-based [33], [28], [30], [17], [45], [42], [38] and learning-based [10], [14], [39]. Model-based algorithms first build a QoE model and then make ABR decisions based on that model, whereas learning-based methods learn decisions from experience using techniques such as reinforcement learning (RL). Although existing methods have achieved some QoE gains, several shortcomings remain in state-of-the-art ABR algorithm design. First, the variable-bitrate nature of video encoding is overlooked in the design of ABR algorithms [44]. Second, user QoE is directly affected by video quality and its fluctuations, yet in DASH QoE modelling the video bitrate does not directly reflect the video quality [10]. Moreover, users perceive streaming impairments differently: most users find video quality fluctuations the most irritating, whereas others find low video quality the most irritating [43]. Playback freezing must also be considered. Freezing occurs when the client buffer runs out of data before the next segment has been fully received, so an efficient ABR algorithm should account for both the frequency and the duration of freezing events.
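The buffer and freezing dynamics described above can be made concrete with a small trace-driven simulation. The sketch assumes constant throughput within each segment download and constant-bitrate segments (segment size equals bitrate times duration), which is exactly the simplification that the variable-bitrate criticism above targets. The bitrate ladder, buffer cap and harmonic-mean throughput rule are hypothetical choices: a generic rate-based heuristic for illustration, not the paper's algorithm.

```python
from statistics import harmonic_mean

# Hypothetical parameters, not taken from the paper.
BITRATES_BPS = [800_000, 2_500_000, 5_000_000]  # bitrate ladder, lowest first
SEGMENT_S = 4.0       # fixed segment playback duration (s)
MAX_BUFFER_S = 20.0   # client buffer cap (s)


def choose_bitrate(past_tput_bps, safety=0.9, window=5):
    """Rate-based rule: highest bitrate not above a discounted throughput estimate."""
    if not past_tput_bps:
        return BITRATES_BPS[0]  # no measurements yet: start conservatively
    estimate = safety * harmonic_mean(past_tput_bps[-window:])
    feasible = [b for b in BITRATES_BPS if b <= estimate]
    return feasible[-1] if feasible else BITRATES_BPS[0]


def simulate(trace_bps):
    """Replay a throughput trace; trace_bps[k] is the throughput while fetching segment k.

    Returns (chosen bitrates, freeze count, total freeze time, startup delay).
    """
    buffer_s, freezes, freeze_s, startup_s = 0.0, 0, 0.0, 0.0
    past, chosen = [], []
    for k, tput in enumerate(trace_bps):
        rate = choose_bitrate(past)
        # CBR simplification: segment size = bitrate * duration.
        download_s = rate * SEGMENT_S / tput
        if k == 0:
            startup_s = download_s  # initial loading, not counted as a freeze
        else:
            stall_s = max(download_s - buffer_s, 0.0)  # buffer ran dry mid-download
            if stall_s > 0:
                freezes += 1
                freeze_s += stall_s
        # Playback drains the buffer during the download, then the new segment is added.
        buffer_s = max(buffer_s - download_s, 0.0) + SEGMENT_S
        # Above the cap the client idles until the buffer drains back to the cap.
        buffer_s = min(buffer_s, MAX_BUFFER_S)
        past.append(tput)
        chosen.append(rate)
    return chosen, freezes, freeze_s, startup_s


if __name__ == "__main__":
    trace = [3e6, 3e6, 3e6, 0.5e6, 0.5e6, 3e6]  # made-up trace with a bandwidth drop
    print(simulate(trace))
```

On the made-up trace, the throughput estimate still reflects the earlier 3 Mbps when the drop arrives, so the 2.5 Mbps segment stalls playback; this lag between measurement and decision is one reason the paper argues for learning the ABR policy instead of fixing such a heuristic.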
To address these deficiencies, in this paper we introduce an RL-based DASH framework to improve user QoE. With RL, the ABR policy is learned directly from experience, without pre-set fixed heuristics or inaccurate system models [18]. The proposed framework includes a DASH video encoding method that yields more stable video quality and an RL-based ABR algorithm that addresses the deficiencies above. The main contributions of this paper are as follows:
- The DASH ABR selection problem is formulated as a Markov decision process (MDP), which is solved with an RL-based ABR method. In the proposed algorithm, the DASH client acts as the RL agent, and the network variation acts as the environment. We compare the proposed method with other successful RL algorithms to show that it is the most suitable for this problem.
- In this algorithm, the video quality and bitrate, the client buffer status and playback freezing are used as the RL input, while the proposed user QoE, which jointly considers the video quality and buffer status, serves as the reward. The proposed RL-based ABR algorithm is then embedded in the proposed DASH framework.
The experimental results show that the proposed RL-based ABR algorithm outperforms other state-of-the-art schemes by a noticeable margin in terms of both temporal and visual QoE metrics, while also ensuring application-level fairness when multiple clients share a bottleneck link.
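The contributions above cast ABR selection as an MDP: the DASH client is the agent, network variation is the environment, the state includes video quality and bitrate, buffer status and freezing, and a QoE that jointly considers quality and buffer status is the reward. As an illustration only, the sketch below wires these pieces together with tabular Q-learning, swapped in as the simplest RL algorithm. This excerpt does not specify the paper's RL algorithm, state features or reward weights, so `QUALITY` (assumed SSIM values), the penalty weights, the buffer discretisation and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical constants; the paper's actual features, weights and RL algorithm
# are not given in this excerpt.
BITRATES_BPS = [800_000, 2_500_000, 5_000_000]
QUALITY = [0.90, 0.95, 0.98]   # assumed per-representation SSIM values
SEGMENT_S, MAX_BUFFER_S = 4.0, 20.0
W_SWITCH, W_FREEZE, W_LOWBUF = 1.0, 0.5, 0.05
LOW_BUFFER_S = 4.0


def qoe_reward(action, prev_action, freeze_s, buffer_s):
    """Linear QoE: quality minus switching, freezing and low-buffer penalties."""
    q, q_prev = QUALITY[action], QUALITY[prev_action]
    return (q - W_SWITCH * abs(q - q_prev) - W_FREEZE * freeze_s
            - W_LOWBUF * max(LOW_BUFFER_S - buffer_s, 0.0))


def make_state(buffer_s, prev_action, froze, bin_s=2.0, max_bin=10):
    """Discretised MDP state: (buffer bin, last representation, froze last step)."""
    return (min(int(buffer_s // bin_s), max_bin), prev_action, froze)


def env_step(buffer_s, action, tput_bps):
    """Network 'environment': download one segment, return (new buffer, freeze time)."""
    download_s = BITRATES_BPS[action] * SEGMENT_S / tput_bps
    freeze_s = max(download_s - buffer_s, 0.0)
    new_buffer = min(max(buffer_s - download_s, 0.0) + SEGMENT_S, MAX_BUFFER_S)
    return new_buffer, freeze_s


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (the DASH client as agent)."""

    def __init__(self, n_actions, lr=0.1, discount=0.9, epsilon=0.1, seed=0):
        self.n, self.lr, self.discount, self.epsilon = n_actions, lr, discount, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)  # explore
        values = self.q[state]
        return max(range(self.n), key=values.__getitem__)  # exploit

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update (terminal states are not treated specially)."""
        target = reward + self.discount * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


def train(traces, episodes=200, seed=0):
    agent = QLearningAgent(len(BITRATES_BPS), seed=seed)
    for _ in range(episodes):
        trace = agent.rng.choice(traces)
        buffer_s, prev, froze = SEGMENT_S, 0, False  # start with one segment buffered
        state = make_state(buffer_s, prev, froze)
        for tput in trace:
            action = agent.act(state)
            buffer_s, freeze_s = env_step(buffer_s, action, tput)
            reward = qoe_reward(action, prev, freeze_s, buffer_s)
            prev, froze = action, freeze_s > 0
            next_state = make_state(buffer_s, prev, froze)
            agent.update(state, action, reward, next_state)
            state = next_state
    return agent


if __name__ == "__main__":
    agent = train([[3e6] * 30, [1e6] * 30, [6e6] * 30])
    agent.epsilon = 0.0  # act greedily after training
    print(agent.act(make_state(12.0, 1, False)))
```

A lookup table keeps the sketch short; with richer continuous inputs such as throughput history or per-segment sizes, a function approximator (for example a neural network) would normally replace the Q table.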
The remainder of this paper is organised as follows. Section 2 reviews state-of-the-art ABS technologies and DASH ABR algorithms. Section 3 details the proposed RL-based DASH framework. Experimental results and conclusions are presented in Sections 4 and 5, respectively.
Adaptive bitrate streaming technology
Because of the rapidly increasing demand for online video services, ABS systems have become a major technical trend [5]. State-of-the-art ABS systems deliver higher-quality video to consumers with less manpower and fewer resources, and they have become the predominant video delivery approach. When an ABS system works properly, end users enjoy high-quality video playback without notable interruption.
Among various ABS strategies, DASH has become the most widely used [31] because it has
QoE-oriented DASH framework design
The proposed QoE-oriented DASH framework is shown in Fig. 3. The DASH video dataset is generated with the SSIM-based rate control (RC) scheme. The RL-based ABR controller makes ABR decisions that maintain high user QoE according to the observed “segment download state” and “playback state”. In this section, we first describe the observations that motivate building the video dataset with the RC scheme and optimising video quality rather than bitrate. Then, we formulate the DASH ABR
Performance evaluation
We verified the effectiveness of the proposed algorithm on an experimental platform built according to the guidelines of the DASH Industry Forum (DASH-IF) [6], as shown in Fig. 7. The client and server are connected through a network emulator that reproduces network conditions from network traces. Apache 2.4.1 serves as the DASH server and stores the original and proposed video datasets. On the client side, we adopt and modify dash.js [7] to support each of the aforementioned
Conclusion
In this paper, we designed a QoE-oriented RL-based DASH streaming framework to further improve the adaptability and user QoE of the DASH system. When the RL algorithm is carefully designed and well trained, the resulting ABR algorithm can outperform other methods. Owing to the RL-based ABR algorithm, which jointly considers video quality and playback buffer status, the proposed QoE-oriented DASH framework outperforms other state-of-the-art approaches by a noticeable margin in terms of both
CRediT authorship contribution statement
Xuekai Wei: Conceptualization, Investigation, Methodology, Formal analysis, Project administration, Software, Writing - original draft. Mingliang Zhou: Formal analysis, Software, Validation, Visualization, Writing - review & editing. Sam Kwong: Funding acquisition, Supervision, Writing - review & editing. Hui Yuan: Funding acquisition, Supervision, Writing - review & editing. Shiqi Wang: Resources, Visualization, Writing - review & editing. Guopu Zhu: Data curation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the Natural Science Foundation of China under Grants 61801303, 61672443 and 61871342, in part by Hong Kong RGC General Research Fund 9042489 under Grant CityU 11206317, under Grant 9042958 (CityU 11203820), under Grant 9042816 (CityU 11209819), in part by the General Program of National Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0790, the Fundamental Research Funds for the Central Universities under Grant 2020CDJ-LHZZ-052, the
References (48)
- et al., LPG-model: A novel model for throughput prediction in stream processing, using a light gradient boosting machine, incremental principal component analysis, and deep gated recurrent unit network, Information Sciences (2020).
- et al., Counting the frequency of time-constrained serial episodes in a streaming sequence, Information Sciences (2019).
- et al., Gradient descent evolved imbalanced data gravitation classification with an application on internet video traffic identification, Information Sciences (2020).
- et al., Cooperative bargaining game-based multiuser bandwidth allocation for dynamic adaptive streaming over HTTP, IEEE Transactions on Multimedia (2018).
- et al., Towards influence of chunk size variation on video streaming in wireless networks.
- et al., SSIM-based global optimization for CTU-level rate control in HEVC.
- F.C.C. 2016. Raw data – Measuring Broadband America....
- et al., TensorFlow: A system for large-scale machine learning.
- et al., DASH adaptation algorithm based on adaptive forgetting factor estimation, IEEE Transactions on Multimedia (May 2018).
- et al., In-network quality optimization for adaptive video streaming services, IEEE Transactions on Multimedia (2014).
- Steady-state behavior of Kalman filter with discrete- and continuous-time observations, IEEE Transactions on Automatic Control.
- D-DASH: A deep Q-learning framework for DASH video streaming, IEEE Transactions on Cognitive Communications and Networking.
- Real-time QoE estimation of DASH-based mobile video applications through edge computing.
- Multimedia open source project.
- A quantitative measure of fairness and discrimination for resource allocation in shared computer systems, CoRR, cs.NI/9809099.
- Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE, IEEE/ACM Transactions on Networking.
- Helping hand or hidden hurdle: Proxy-assisted HTTP-based adaptive streaming performance.
1 Xuekai Wei and Mingliang Zhou contributed equally to this work.