Abstract
With the rapid advancement of streaming media technology, the quality of experience (QoE) of adaptive video streaming has become a key factor in optimizing adaptive bitrate and compression algorithms. However, distortions such as compression artifacts and rebuffering frequently occur during the generation and transmission of adaptive video streams, which makes accurate QoE assessment challenging. To address this challenge, we propose a novel multi-layer feature perception method (MLFP). The MLFP framework consists of three layers: a frame-level perception layer, a segment-level perception layer, and a global perception layer. In the frame-level perception layer, a mask attention mechanism and ResNet50 capture users’ nuanced sensations of compression and transmission distortions. To effectively characterize the impact of rebuffering on the visual experience, an adaptive streaming rebuffering perception discriminator is designed. The segment-level perception layer identifies four key quality of service (QoS) features and employs one-dimensional group convolution and a GRU network to capture the intrinsic connections between these features. The global perception layer uses a streamlined neural network to handle global statistical QoS features, providing comprehensive insights into overall QoE. The perceptual scores from the three layers are fused to generate a final QoE score. Experiments on four public adaptive video streaming datasets, namely WaterlooSQoE-I, LIVE-NFLX-II, WaterlooSQoE-III, and WaterlooSQoE-IV, demonstrate that the proposed method achieves effective QoE assessment under realistic distortions and outperforms several recent QoE methods.






Data Availability
No datasets were generated or analysed during the current study.
References
Mordor Intelligence: Analysis of media streaming market size and share - growth trends and forecast (2024-2029). https://www.mordorintelligence.com/zh-CN/industry-reports/media-streaming-market. Accessed 15 May 2024
Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A survey on bitrate adaptation schemes for streaming media over http. IEEE Commun. Surv. & Tutor. 21(1), 562–585 (2018)
Du, L., Li, J., Zhuo, L., Yang, S.: Vcfnet: video clarity-fluency network for quality of experience evaluation model of http adaptive video streaming services. Multimed. Tools Appl. 81(29), 42907–42923 (2022)
ITU: ITU-T. Vocabulary for performance, quality of service and quality of experience. https://www.itu.int/rec/T-REC-P.10-201711-I/en (2017). Accessed 15 Jun 2024
Series, P.: Terminals and subjective and objective assessment methods. Geneva, Switzerland, ITU (2016)
Mok, R.K.P., Chan, E.W.W., Chang, R.K.C.: Measuring the quality of experience of http video streaming. In 12th IFIP/IEEE international symposium on integrated network management (IM 2011) and workshops, pages 485–492. IEEE, (2011)
Hoßfeld, T., Schatz, R., Biersack, E., Plissonneau, L.: Internet video delivery in youtube: From traffic measurements to quality of experience. Data Traffic Monitoring and Analysis: From Measurement, Classification, and Anomaly Detection to Quality of Experience, pages 264–301, (2013)
Eswara, N., Ashique, S., Panchbhai, A., Chakraborty, S., Sethuram, H.P., Kuchi, K., Kumar, A., Channappayya, S.S.: Streaming video qoe modeling and prediction: a long short-term memory approach. IEEE Trans. Circuits Syst. Video Technol. 30(3), 661–673 (2019)
Ghadiyaram, D., Pan, J., Bovik, A.C.: Learning a continuous-time streaming video qoe model. IEEE Trans. Image Process. 27(5), 2257–2271 (2018)
Barman, N., Martini, M.G.: Qoe modeling for http adaptive video streaming-a survey and open challenges. IEEE Access 7, 30831–30859 (2019)
Li, C., Lim, M., Bentaleb, A., Zimmermann, R.: A real-time blind quality-of-experience assessment metric for http adaptive streaming. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 1661–1666. IEEE, (2023)
Hoßfeld, T., Seufert, M., Hirth, M., Zinner, T., Tran-Gia, P., Schatz, R.: Quantification of youtube qoe via crowdsourcing. In 2011 IEEE International Symposium on Multimedia, pages 494–499. IEEE, (2011)
Rodriguez, D.Z., Abrahao, J., Begazo, D.C., Rosa, R.L., Bressan, G.: Quality metric to assess video streaming service over tcp considering temporal location of pauses. IEEE Trans. Consum. Electron. 58(3), 985–992 (2012)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. image process. 13(4), 600–612 (2004)
Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. image process. 21(12), 4695–4708 (2012)
Sun, W., Min, X., Lu, W., Zhai, G.: A deep learning based no-reference quality assessment model for ugc videos. In Proceedings of the 30th ACM International Conference on Multimedia, pages 856–865, (2022)
Duanmu, Z., Zeng, K., Ma, K., Rehman, A., Wang, Z.: A quality-of-experience index for streaming video. IEEE J. Sel. Top. Sign. Process. 11(1), 154–166 (2016)
Bampis, C.G., Bovik, A.C.: Learning to predict streaming video qoe: Distortions, rebuffering and memory. arXiv preprint arXiv:1703.00633, (2017)
Robitza, W., Garcia, M.N., Raake, A.: A modular http adaptive streaming qoe model-candidate for itu-t p. 1203 (“p. nats”). In 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, (2017)
Duanmu, Z., Liu, W., Chen, D., Li, Z., Wang, Z., Wang, Y., Gao, W.: A bayesian quality-of-experience model for adaptive streaming videos. ACM Trans. Multimed. Comput. Commun. Appl. 18(3s), 1–24 (2023)
Chen, P., Li, L., Wu, J., Zhang, Y., Lin, W.: Temporal reasoning guided qoe evaluation for mobile live video broadcasting. IEEE Trans. Image Process. 30, 3279–3292 (2021)
Jia, Z., Min, X., Sun, W., Zhai, G.: Continuous and overall quality of experience evaluation for streaming video based on rich features exploration and dual-stage attention. IEEE Trans. Circuits Syst. Video Technol. 34, 11709–11723 (2024)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, (2016)
Li, D., Jiang, T., Jiang, M.: Unified quality assessment of in-the-wild videos with mixed datasets training. Int. J. Comput. Vis. 129(4), 1238–1257 (2021)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. (2017)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, (2014)
Tran, H.T.T., Vu, T., Ngoc, N.P., Thang, T.C.: A novel quality model for http adaptive streaming. In 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), pages 423–428. IEEE, (2016)
Asan, A., Robitza, W., Mkwawa, I., Sun, L., Ifeachor, E., Raake, A.: Impact of video resolution changes on qoe for adaptive video streaming. In 2017 IEEE international conference on multimedia and expo (ICME), pages 499–504. IEEE, (2017)
Bampis, C.G., Li, Z., Katsavounidis, I., Huang, T.Y., Ekanadham, C., Bovik, A.C.: Towards perceptually optimized end-to-end adaptive video streaming. arXiv preprint arXiv:1808.03898, (2018)
Duanmu, Z., Rehman, A., Wang, Z.: A quality-of-experience database for adaptive video streaming. IEEE Trans. Broadcast. 64(2), 474–487 (2018)
Duanmu, Z., Liu, W., Li, Z., Chen, D., Wang, Z., Wang, Y., Gao, W.: Assessing the quality-of-experience of adaptive bitrate video streaming. arXiv preprint arXiv:2008.08804, (2020)
Sheikh, H.R., Sabir, M.F., Bovik, A.C.: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image process. 15(11), 3440–3451 (2006)
Liu, X., Dobrian, F., Milner, H., Jiang, J., Sekar, V., Stoica, I., Zhang, H.: A case for a coordinated internet video control plane. In Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, pages 359–370, (2012)
Xue, J., Zhang, D.Q., Yu, H., Chen, C.W.: Assessing quality of experience for adaptive http video streaming. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 1–6. IEEE, (2014)
Yin, X., Jindal, A., Sekar, V., Sinopoli, B.: A control-theoretic approach for dynamic adaptive video streaming over http. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 325–338, (2015)
Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 20(3), 209–212 (2012)
Mittal, A., Saad, M.A., Bovik, A.C.: A completely blind video integrity oracle. IEEE Trans. Image Process. 25(1), 289–300 (2015)
Zhang, Z., Wu, W., Sun, W., Tu, D., Lu, W., Min, X., Chen, Y., Zhai, G.: Md-vqa: Multi-dimensional quality assessment for ugc live videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1746–1755, (2023)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, (2009)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In International conference on learning representations (ICLR), San Diego, California, (2015)
Tu, Z., Yu, X., Wang, Y., Birkbeck, N., Adsumilli, B., Bovik, A.C.: Rapique: rapid and accurate video quality prediction of user generated content. IEEE Open J. Signal Process. 2, 425–440 (2021)
Funding
This work was supported in part by the Science and Technology Foundation of Guizhou Province (Grant No. QKHJCZK[2024]063), and in part by the National Natural Science Foundation of China (Grant No. 62266011).
Author information
Contributions
J.H., G.K., X.D., and H.L. contributed to the conceptualization and design of the study. J.H. conducted the experiments and collected the data. G.K., J.H., X.D. and H.L. analyzed the data and performed statistical analysis. J.H. and G.K. wrote the main manuscript text. J.H. and X.D. prepared all figures and tables in the manuscript. All authors reviewed and edited the manuscript. This author contributions statement has been approved by all authors.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Parameter ablation
We ran a small set of ablation experiments to select the parameters of MLFP. First, we compared AMPM with LSTM and GRU, two recurrent architectures commonly used to learn long-term dependencies. In each variant, AMPM was replaced by the LSTM or GRU structure, while the rest of the method was left unchanged. For a fair comparison, all models were configured with two layers and produced a score through average pooling. As Table 7 shows, AMPM outperforms LSTM and GRU in predicting the overall QoE. Unlike LSTM and GRU, which treat all time steps equally, the attention mechanism selectively focuses on important segments, which helps improve QoE assessment performance.
Second, we compared three pooling methods: "last_one", subjective inspiration pooling (SIP) [24], which is based on human memory effects, and simple averaging ("mean"). Table 7 shows that mean pooling performs best. Although mean pooling does not explicitly model human memory effects, the attention mechanism implicitly captures the relevant features, which improves the performance of mean pooling.
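To make the difference between the three pooling strategies concrete, the following is a minimal sketch in plain Python, not the implementation used in the experiments. It assumes that "last_one" returns the score of the final time step and that SIP follows the general form described in [24]: a memory term (the worst score in a short preceding window) blended with a softmin-weighted average over a short following window. The function names, the window length `tau`, and the blending weight `gamma` are illustrative assumptions.

```python
import math


def last_one_pool(scores):
    """Assumed meaning of "last_one": keep only the final time-step score."""
    return scores[-1]


def mean_pool(scores):
    """Simple averaging over all time steps."""
    return sum(scores) / len(scores)


def sip_like_pool(scores, tau=2, gamma=0.5):
    """Memory-inspired pooling sketch (tau and gamma are illustrative)."""
    n = len(scores)
    pooled = []
    for t in range(n):
        # Memory term: worst score among the previous tau steps
        # (falls back to the current score at the first step).
        prev = scores[max(0, t - tau):t]
        memory = min(prev) if prev else scores[t]
        # Current term: softmin-weighted average over the current and the
        # next tau steps, so low scores receive larger weights.
        nxt = scores[t:min(n, t + tau + 1)]
        weights = [math.exp(-s) for s in nxt]
        current = sum(w * s for w, s in zip(weights, nxt)) / sum(weights)
        pooled.append(gamma * memory + (1 - gamma) * current)
    return sum(pooled) / n


if __name__ == "__main__":
    per_step = [5.0, 5.0, 1.0, 5.0, 5.0]  # one quality drop in the middle
    print(last_one_pool(per_step), mean_pool(per_step), sip_like_pool(per_step))
```

On a sequence with a single quality drop, the memory-inspired variant penalizes the drop more heavily than plain averaging, while "last_one" ignores it entirely whenever the final step is clean.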
Third, we investigated the impact of varying the number of attention heads (1, 2, 4, 8, and 16) on method performance. Figure 7 shows that the method achieves optimal performance with 2 attention heads.
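As a rough illustration of what changing the number of attention heads means, the sketch below splits each feature vector into equal slices, one per head, and applies scaled dot-product self-attention to each slice independently. It is a simplified, hypothetical example: it omits the learned query, key, and value projections of a real attention layer, it is not the AMPM module, and the feature dimension of 16 is an assumption chosen only so that all tested head counts divide it evenly.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(seq, num_heads):
    """Projection-free multi-head self-attention over a list of vectors."""
    d_model = len(seq[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads
    out = [[] for _ in seq]
    for h in range(num_heads):
        start, stop = h * head_dim, (h + 1) * head_dim
        # Each head sees only its own slice of every time step.
        slices = [x[start:stop] for x in seq]
        for t, query in enumerate(slices):
            scores = [
                sum(a * b for a, b in zip(query, key)) / math.sqrt(head_dim)
                for key in slices
            ]
            weights = softmax(scores)
            context = [
                sum(w * v[i] for w, v in zip(weights, slices))
                for i in range(head_dim)
            ]
            # Concatenate the head outputs back to length d_model.
            out[t].extend(context)
    return out


if __name__ == "__main__":
    # Three time steps with an assumed feature dimension of 16.
    sequence = [[0.1 * (t + j) for j in range(16)] for t in range(3)]
    for heads in (1, 2, 4, 8, 16):
        result = multi_head_self_attention(sequence, heads)
        print(heads, len(result), len(result[0]))
```

The output shape is the same for every head count; what changes is how finely the feature dimension is partitioned, so more heads means each head attends over a narrower slice of the features.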
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, J., Kong, G., Duan, X. et al. Adaptive video streaming quality of experience assessment based on multi-layer feature perception. SIViP 19, 335 (2025). https://doi.org/10.1007/s11760-025-03906-1
DOI: https://doi.org/10.1007/s11760-025-03906-1