Elsevier

Information Sciences

Volume 569, August 2021, Pages 786-803
Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework

https://doi.org/10.1016/j.ins.2021.05.012

Abstract

The dynamic adaptive streaming over HTTP (DASH) standard has been widely adopted by content providers for online video transmission and has greatly improved streaming performance. Designing an efficient DASH system is challenging because of the inherent large fluctuations characterizing both encoded video sequences and network traces. In this paper, a reinforcement learning (RL)-based DASH technique that addresses user quality of experience (QoE) is constructed. The DASH adaptive bitrate (ABR) selection problem is formulated as a Markov decision process (MDP) problem. Accordingly, an RL-based solution is proposed to solve the MDP problem, in which the DASH client acts as the RL agent and the network variation constitutes the environment. The proposed user QoE, which jointly considers the video quality and buffer status, is used as the reward. The goal of the RL algorithm is to select a suitable video quality level for each video segment to maximize the total reward. The proposed RL-based ABR algorithm is then embedded in the QoE-oriented DASH framework. Experimental results show that the proposed RL-based ABR algorithm outperforms state-of-the-art schemes in terms of both temporal and visual QoE factors by a noticeable margin while guaranteeing application-level fairness when multiple clients share a bottlenecked network.

Introduction

Given the great advances in multimedia and communication technologies, online video sharing has become increasingly attractive to both research and industrial communities [34], [5]. To provide low-latency and high-quality online video services, several adaptive bitrate streaming (ABS) techniques have been proposed, such as Adobe HTTP Dynamic Streaming (HDS) [18], Apple HTTP Live Streaming (HLS) [4], Microsoft Smooth Streaming (MSS) [31] and MPEG Dynamic Adaptive Streaming over HTTP (DASH) [36]. Among these techniques, DASH has become an international ABS standard over HTTP [19] because it can guarantee high-quality video services that ensure the quality of experience (QoE) for users under time-varying network conditions. Fig. 1 illustrates a standard DASH system. As shown in Fig. 1(a), the DASH system first encodes the video content in different representations (e.g., with varying bitrates, resolutions or qualities). Then, each representation is divided into several segments (or chunks) with a fixed playback duration. The corresponding representation description is recorded in an XML-like file called a media presentation description (MPD) [37]. The encoded videos and MPD file are stored on a standard hypertext transfer protocol (HTTP) web server. Online users can request DASH video contents via HTTP, as shown in Fig. 1(b). The DASH client first receives and parses the MPD file and subsequently requests the desired segments in an appropriate representation version for playback with the adaptive bitrate controller according to the network conditions and playback state.
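
The client-side request step described above can be illustrated with a minimal Python sketch. It is not the method proposed in this paper: it uses a plain throughput rule, and the names Representation, pick_representation and segment_url, the 0.9 safety factor, and the URL layout are all hypothetical choices made for illustration. A real client would read the representation list and the segment addressing from the MPD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One encoded version of the video, as it might be listed in an MPD."""
    rep_id: str
    bitrate_kbps: int


def pick_representation(reps, throughput_kbps, safety=0.9):
    """Return the highest-bitrate representation within safety * throughput.

    Falls back to the lowest-bitrate representation when none fits.
    The safety factor is an assumed margin, not a value from the paper.
    """
    budget = safety * throughput_kbps
    ordered = sorted(reps, key=lambda r: r.bitrate_kbps)
    chosen = ordered[0]
    for rep in ordered:
        if rep.bitrate_kbps <= budget:
            chosen = rep
    return chosen


def segment_url(base_url, rep, index):
    """Build a segment URL using a hypothetical naming scheme."""
    return f"{base_url}/{rep.rep_id}/segment_{index}.m4s"


ladder = [
    Representation("low", 500),
    Representation("mid", 1500),
    Representation("high", 4000),
]
choice = pick_representation(ladder, throughput_kbps=2000)
url = segment_url("http://example.com/video", choice, 3)
```

With an estimated throughput of 2000 kbps the budget is 1800 kbps, so the sketch selects the 1500 kbps representation. A rule of this kind reacts only to bandwidth and ignores the playback state.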

To guarantee user QoE under fluctuating network conditions between the DASH client and the DASH server, an adaptive bitrate (ABR) selection algorithm should be designed for the DASH system. With the ABR algorithm, the DASH client can request successive video segments at an appropriate bitrate based on the network conditions (bandwidth) and playback state (buffer length, playback freezing, and video quality) to avoid QoE losses [11]. Therefore, designing efficient ABR algorithms is a critical challenge for the DASH system. Generally, state-of-the-art DASH ABR algorithms can be grouped into two classes: model-based [33], [28], [30], [17], [45], [42], [38] and learning-based [10], [14], [39] algorithms. Model-based algorithms pre-build a QoE model and make ABR decisions based on that model, whereas learning-based methods draw on past experience to reach a decision using learning methods such as reinforcement learning (RL). The existing methods have achieved some QoE gains; however, challenges and shortcomings remain in state-of-the-art ABR algorithm design. First, the nature of variable bitrate video encoding is overlooked in the design of ABR algorithms [44]. Second, user QoE is directly affected by the video quality and its fluctuations, whereas the video bitrate does not directly reflect the video quality in DASH QoE modelling [10]. Additionally, streaming issues are perceived differently by different users: most users identify video quality fluctuations as the most irritating factor when streaming, whereas others identify the video quality itself as the prime irritating factor [43]. Playback freezing is another factor that should be considered. Playback freezing occurs when the client buffer runs out of data or the next segment has not been fully received. An efficient ABR algorithm should consider the freezing frequency and duration to avoid such issues.
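
The relationship between bandwidth, buffer length, and playback freezing can be made concrete with a small Python sketch. It is a simplified model written under stated assumptions, not the simulator used in this paper: segments are downloaded one at a time, playback drains the buffer in real time during each download, and the function names and units are hypothetical.

```python
def download_segment(buffer_s, segment_bits, bandwidth_bps, segment_duration_s):
    """Simulate one segment download; return (new_buffer_s, freeze_s).

    The buffer drains while the segment downloads. If the download takes
    longer than the buffered playback time, the difference is a freeze.
    """
    download_time = segment_bits / bandwidth_bps
    freeze_s = max(0.0, download_time - buffer_s)
    new_buffer_s = max(0.0, buffer_s - download_time) + segment_duration_s
    return new_buffer_s, freeze_s


def playback_stats(segment_bits, bandwidth_trace_bps, segment_duration_s,
                   initial_buffer_s=0.0):
    """Return (freeze_count, total_freeze_s) over a sequence of segments."""
    buffer_s = initial_buffer_s
    freeze_count = 0
    total_freeze_s = 0.0
    for bits, bandwidth in zip(segment_bits, bandwidth_trace_bps):
        buffer_s, freeze_s = download_segment(
            buffer_s, bits, bandwidth, segment_duration_s
        )
        if freeze_s > 0:
            freeze_count += 1
            total_freeze_s += freeze_s
    return freeze_count, total_freeze_s


# Three 8 Mbit segments of 2 s each over a bandwidth trace that dips to 2 Mbit/s.
stats = playback_stats([8e6, 8e6, 8e6], [4e6, 2e6, 8e6], 2.0, initial_buffer_s=2.0)
```

In this trace the second segment needs 4 s to download while only 2 s of video is buffered, so the sketch reports one freeze lasting 2 s. Requesting a smaller segment during the bandwidth dip would avoid it, which is the decision an ABR algorithm has to make.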

To address these deficiencies, in this paper, we introduce an RL-based DASH framework to improve the user QoE. With the RL algorithm, the optimised ABR algorithm is directly learned from experience without the need to pre-set fixed heuristics or inaccurate system models [18]. The proposed framework includes a more stable DASH video encoding method and an RL-based ABR algorithm that considers the aforementioned deficiencies. The main innovations and contributions of this paper are as follows:

  • The DASH ABR selection problem is formulated as a Markov decision process (MDP) problem, and an RL-based ABR method is used to solve the MDP problem. In the proposed algorithm, the DASH client acts as the RL agent, and the network variation acts as the environment. We compare our proposed method with other successful RL algorithms to demonstrate that our method is the most appropriate for solving the problem.

  • In this algorithm, the video quality and bitrate, client buffer status, and playback freezing issues are adopted as the RL input, while the proposed user QoE, which jointly considers the video quality and buffer status, is used as the reward. Then, the proposed RL-based ABR algorithm is embedded in the proposed DASH framework.
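
The agent, environment, and reward roles listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration. The reward form (quality minus a switching penalty minus a freezing penalty) and its weights are hypothetical and are not the QoE model proposed in this paper. The update rule is plain tabular Q-learning, used as a simple stand-in for whichever RL algorithm the paper adopts.

```python
from collections import defaultdict


def qoe_reward(quality, prev_quality, freeze_s, w_switch=1.0, w_freeze=1.0):
    """Hypothetical per-segment reward.

    Rewards high quality and penalises quality fluctuation and freezing.
    The weights w_switch and w_freeze are assumed values.
    """
    return quality - w_switch * abs(quality - prev_quality) - w_freeze * freeze_s


def q_update(q_table, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    States must be hashable (for example, a tuple of discretised buffer
    level and bandwidth); actions are indices of quality levels.
    """
    best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
reward = qoe_reward(quality=1.0, prev_quality=0.5, freeze_s=0.25, w_freeze=2.0)
first = q_update(q_table, "s0", 1, 1.0, "s1", n_actions=3, alpha=0.5)
second = q_update(q_table, "s0", 1, 1.0, "s1", n_actions=3, alpha=0.5)
```

Repeating the same transition moves the Q-value from 0.5 to 0.75, that is, toward the target of 1.0. A tabular method is practical only for small discretised state spaces.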

The experimental results show that the proposed RL-based ABR algorithm outperforms other state-of-the-art schemes by a noticeable margin in terms of both temporal and visual QoE metrics while also guaranteeing the application-level fairness when the clients share a bottlenecked network.
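
One common way to quantify fairness among clients sharing a bottleneck is Jain's fairness index, which appears in the reference list (R. Jain et al.). Whether the paper's evaluation uses exactly this index is an assumption here. The Python sketch below computes it for a list of per-client allocations such as average bitrates; the function name and the all-zero convention are hypothetical choices.

```python
def jain_fairness_index(allocations):
    """Return Jain's fairness index, (sum x)^2 / (n * sum x^2).

    The index is 1.0 when all clients receive equal shares and 1/n when
    a single client receives everything.
    """
    values = list(allocations)
    if not values:
        raise ValueError("allocations must not be empty")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        # Assumed convention: all-zero allocations count as perfectly fair.
        return 1.0
    return total * total / (len(values) * sum_sq)


equal_share = jain_fairness_index([2.0, 2.0, 2.0])
single_winner = jain_fairness_index([4.0, 0.0, 0.0, 0.0])
```

Three clients with equal shares give an index of 1.0, while four clients with one taking all the bandwidth give 0.25.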

The remainder of this paper is structured as follows. Section 2 provides an overview of the state-of-the-art ABS technologies and DASH ABR algorithms. Section 3 details the proposed RL-based DASH framework model. Experimental results and conclusions are provided in Sections 4 and 5, respectively.

Section snippets

Adaptive bitrate streaming technology

Because of the rapidly increasing demand for online video services, ABS systems have become a new technical trend [5]. State-of-the-art ABS systems provide consumers with higher quality videos using less manpower and fewer resources, and they have become the predominant video delivery system. When the ABS system works properly, end users should enjoy high-quality video playback without notable interruption.

Among various ABS strategies, DASH has become the most widely used [31] because it has

QoE-oriented DASH framework design

The proposed QoE-oriented DASH framework is shown in Fig. 3. The DASH video dataset is generated with the SSIM-based rate control (RC) scheme. The RL-based ABR controller makes the ABR decision to guarantee a high user QoE according to the observed “segment download state” and “playback state”. In this section, we first describe our observations and our motivation for establishing the video dataset with the RC scheme and for optimising video quality instead of the bitrate. Then, we formulate the DASH ABR

Performance evaluation

We verified the effectiveness of the proposed algorithm through experiments on an experimental platform based on the guidelines provided by the DASH industry forum (DASH-IF) [6], as shown in Fig. 7. The client and server are connected through a network emulator that can simulate network conditions with network traces. Apache 2.4.1 acts as the DASH server and stores the original and proposed video datasets. On the client side, we adopt and modify dash.js [7] to support each of the aforementioned

Conclusion

In this paper, we designed a QoE-oriented RL-based DASH streaming framework to further improve the adaptability and user QoE of the DASH system. When the RL algorithm is carefully designed and well trained, the optimised ABR algorithm can outperform other methods. Due to the RL-based ABR algorithm, which jointly considers the video quality and playback buffer status, the proposed QoE-oriented DASH framework outperforms other state-of-the-art approaches by a noticeable margin in terms of both

CRediT authorship contribution statement

Xuekai Wei: Conceptualization, Investigation, Methodology, Formal analysis, Project administration, Software, Writing - original draft. Mingliang Zhou: Formal analysis, Software, Validation, Visualization, Writing - review & editing. Sam Kwong: Funding acquisition, Supervision, Writing - review & editing. Hui Yuan: Funding acquisition, Supervision, Writing - review & editing. Shiqi Wang: Resources, Visualization, Writing - review & editing. Guopu Zhu: Data curation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grant 61801303, 61672443, and 61871342, in part by Hong Kong RGC General Research Fund 9042489 under Grant CityU 11206317, under Grant 9042958 (CityU 11203820), under Grant 9042816 (CityU 11209819), in part by the General Program of National Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0790, the Fundamental Research Funds for the Central Universities under Grant 2020CDJ-LHZZ-052, the

References (48)

  • DASH-IF. Dash industry forum: Catalyzing the adoption of mpeg-dash,...
  • DASH-IF:dash.js. A reference client implementation for the playback of mpeg dash via javascript and compliant browsers,...
  • J. De Vriendt, D. De Vleeschauwer, and D. Robinson. Model for estimating qoe of video delivered using http adaptive...
  • B. Friedland. Steady-state behavior of kalman filter with discrete- and continuous-time observations. IEEE Transactions on Automatic Control (October 1980)
  • M. Gadaleta et al. D-dash: A deep q-learning framework for dash video streaming. IEEE Transactions on Cognitive Communications and Networking (2017)
  • C. Ge et al. Real-time qoe estimation of dash-based mobile video applications through edge computing
  • G.P.A.C. Gpac. Multimedia open source project (2019)
  • R. Haakon, V. Paul, G. Carsten, and H. Pål. Commute path bandwidth traces from 3g networks: Analysis and applications....
  • M. Hongzi, N. Ravi, and A. Mohammad. Neural adaptive video streaming with pensieve, in: Proceedings of the Conference...
  • T. Huang, R.-X. Zhang, C. Zhou, and L. Sun. Qarc: Video quality aware rate control for real-time video streaming based...
  • R. Jain et al. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. CoRR, cs.NI/9809099 (1998)
  • J. Jiang et al. Improving fairness, efficiency, and stability in http-based adaptive video streaming with festive. IEEE/ACM Transactions on Networking (2014)
  • V. Krishnamoorthi et al. Helping hand or hidden hurdle: Proxy-assisted http-based adaptive streaming performance
  • T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with...

1 Xuekai Wei and Mingliang Zhou contributed equally to this work.
