Adaptive video streaming quality of experience assessment based on multi-layer feature perception

  • Original Paper
  • Signal, Image and Video Processing

Abstract

With the rapid advancement of streaming media technology, the quality of experience (QoE) of adaptive video streaming has become a key factor in optimizing adaptive bitrate and compression algorithms. However, distortions such as compression artifacts and rebuffering frequently occur during the generation and transmission of adaptive video streams, which makes accurate QoE assessment challenging. To address this challenge, we propose a novel multi-layer feature perception (MLFP) method. The MLFP framework consists of three layers: a frame-level perception layer, a segment-level perception layer, and a global perception layer. In the frame-level perception layer, a mask attention mechanism and ResNet50 capture users’ nuanced perception of compression and transmission distortions. To effectively characterize the impact of rebuffering on the visual experience, an adaptive streaming rebuffering perception discriminator is designed. The segment-level perception layer identifies four key quality of service (QoS) features and employs one-dimensional group convolution and a GRU network to capture the intrinsic connections between these features. The global perception layer uses a streamlined neural network to process global statistical QoS features, providing a comprehensive view of overall QoE. The perceptual scores from the three layers are fused to generate the final QoE score. Experiments on four public adaptive video streaming datasets, namely WaterlooSQoE-I, LIVE-NFLX-II, WaterlooSQoE-III, and WaterlooSQoE-IV, demonstrate that the proposed method assesses QoE effectively under real distortions and outperforms several recent QoE methods.

Data Availability

No datasets were generated or analysed during the current study.

References

  1. Mordor Intelligence.: Analysis of media streaming market size and share - growth trends and forecast (2024-2029). IOP Publishing PhysicsWeb. https://www.mordorintelligence.com/zh-CN/industry-reports/media-streaming-market. Accessed (15 May 2024)

  2. Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A survey on bitrate adaptation schemes for streaming media over http. IEEE Commun. Surv. & Tutor. 21(1), 562–585 (2018)

  3. Lina, D., Li, J., Zhuo, L., Yang, S.: Vcfnet: video clarity-fluency network for quality of experience evaluation model of http adaptive video streaming services. Multimed. Tools Appl. 81(29), 42907–42923 (2022)

  4. ITU.: ITU-T. Vocabulary for performance, quality of service and quality of experience, IOP Publishing PhysicsWeb. https://www.itu.int/rec/T-REC-P.10-201711-I/en (2017). Accessed (15 Jun 2024)

  5. Series, P.: Terminals and subjective and objective assessment methods. Geneva, Switzerland, ITU (2016)

  6. Mok, R.K.P., Chan, E.W.W., Chang, R.K.C.: Measuring the quality of experience of http video streaming. In 12th IFIP/IEEE international symposium on integrated network management (IM 2011) and workshops, pages 485–492. IEEE, (2011)

  7. Hoßfeld, T., Schatz, R., Biersack, E., Plissonneau, L.: Internet video delivery in youtube: From traffic measurements to quality of experience. Data Traffic Monitoring and Analysis: From Measurement, Classification, and Anomaly Detection to Quality of Experience, pages 264–301, (2013)

  8. Eswara, N., Ashique, S., Panchbhai, A., Chakraborty, S., Sethuram, H.P., Kuchi, K., Kumar, A., Channappayya, S.S.: Streaming video qoe modeling and prediction: a long short-term memory approach. IEEE Trans. Circuits Syst. Video Technol. 30(3), 661–673 (2019)

  9. Ghadiyaram, D., Pan, J., Bovik, A.C.: Learning a continuous-time streaming video qoe model. IEEE Trans. Image Process. 27(5), 2257–2271 (2018)

  10. Barman, N., Martini, M.G.: Qoe modeling for http adaptive video streaming-a survey and open challenges. Ieee Access 7, 30831–30859 (2019)

  11. Li, C., Lim, M., Bentaleb, A., Zimmermann, R.: A real-time blind quality-of-experience assessment metric for http adaptive streaming. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 1661–1666. IEEE, (2023)

  12. Hoßfeld, T., Seufert, M., Hirth, M., Zinner, T., Tran-Gia, P., Schatz, R.: Quantification of youtube qoe via crowdsourcing. In 2011 IEEE International Symposium on Multimedia, pages 494–499. IEEE, (2011)

  13. Rodriguez, D.Z., Abrahao, J., Begazo, D.C., Rosa, R.L., Bressan, G.: Quality metric to assess video streaming service over tcp considering temporal location of pauses. IEEE Trans. Consum. Electron. 58(3), 985–992 (2012)

  14. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. image process. 13(4), 600–612 (2004)

  15. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. image process. 21(12), 4695–4708 (2012)

  16. Sun, W., Min, X., Lu, W., Zhai, G.: A deep learning based no-reference quality assessment model for ugc videos. In Proceedings of the 30th ACM International Conference on Multimedia, pages 856–865, (2022)

  17. Duanmu, Z., Zeng, K., Ma, K., Rehman, A., Wang, Z.: A quality-of-experience index for streaming video. IEEE J. Sel. Top. Sign. Process. 11(1), 154–166 (2016)

  18. Bampis, C.G., Bovik, A.C.: Learning to predict streaming video qoe: Distortions, rebuffering and memory. arXiv preprint arXiv:1703.00633, (2017)

  19. Robitza, W., Garcia, M.N., Raake, A.: A modular http adaptive streaming qoe model-candidate for itu-t p. 1203 (“p. nats”). In 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, (2017)

  20. Duanmu, Z., Liu, W., Chen, D., Li, Z., Wang, Z., Wang, Y., Gao, W.: A bayesian quality-of-experience model for adaptive streaming videos. ACM Trans. Multimed. Comput. Commun. Appl. 18(3s), 1–24 (2023)

  21. Chen, P., Li, L., Jinjian, W., Zhang, Y., Lin, W.: Temporal reasoning guided qoe evaluation for mobile live video broadcasting. IEEE Trans. Image Process. 30, 3279–3292 (2021)

  22. Jia, Z., Min, X., Sun, W., Zhai, G.: Continuous and overall quality of experience evaluation for streaming video based on rich features exploration and dual-stage attention. IEEE Trans. Circuits Syst. Video Technol. 34, 11709–11723 (2024)

  23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, (2016)

  24. Li, D., Jiang, T., Jiang, M.: Unified quality assessment of in-the-wild videos with mixed datasets training. Int. J. Comput. Vis. 129(4), 1238–1257 (2021)

  25. Vaswani, A.: Attention is all you need. Adv. Neural Inf. Process. Syst. (2017)

  26. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, (2014)

  27. Tran, H.T.T., Vu, T., Ngoc, N.P., Thang, T.C.: A novel quality model for http adaptive streaming. In 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), pages 423–428. IEEE, (2016)

  28. Asan, A., Robitza, W., Mkwawa, I., Sun, L., Ifeachor, E., Raake, A.: Impact of video resolution changes on qoe for adaptive video streaming. In 2017 IEEE international conference on multimedia and expo (ICME), pages 499–504. IEEE, (2017)

  29. Bampis, C.G., Li, Z., Katsavounidis, I., Huang, T.Y., Ekanadham, C., Bovik, A.C.: Towards perceptually optimized end-to-end adaptive video streaming. arXiv preprint arXiv:1808.03898, (2018)

  30. Duanmu, Z., Rehman, A., Wang, Z.: A quality-of-experience database for adaptive video streaming. IEEE Trans. Broadcast. 64(2), 474–487 (2018)

  31. Duanmu, Z., Liu, W., Li, Z., Chen, D., Wang, Z., Wang, Y., Gao, W.: Assessing the quality-of-experience of adaptive bitrate video streaming. arXiv preprint arXiv:2008.08804, (2020)

  32. Sheikh, H.R., Sabir, M.F., Bovik, A.C.: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image process. 15(11), 3440–3451 (2006)

  33. Liu, X., Dobrian, F., Milner, H., Jiang, J., Sekar, V., Stoica, I., Zhang, H.: A case for a coordinated internet video control plane. In Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, pages 359–370, (2012)

  34. Xue, J., Zhang, D.Q., Yu, H., Chen, C.W.: Assessing quality of experience for adaptive http video streaming. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 1–6. IEEE, (2014)

  35. Yin, X., Jindal, A., Sekar, V., Sinopoli, B.: A control-theoretic approach for dynamic adaptive video streaming over http. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 325–338, (2015)

  36. Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Signal process. lett. 20(3), 209–212 (2012)

  37. Mittal, A., Saad, M.A., Bovik, A.C.: A completely blind video integrity oracle. IEEE Trans. Image Process. 25(1), 289–300 (2015)

  38. Zhang, Z., Wu, W., Sun, W., Tu, D., Lu, W., Min, X., Chen, Y., Zhai, G.: Md-vqa: Multi-dimensional quality assessment for ugc live videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1746–1755, (2023)

  39. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, (2009)

  40. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In International conference on learning representations (ICLR), volume 5, page 6. San Diego, California, (2015)

  41. Tu, Z., Yu, X., Wang, Y., Birkbeck, N., Adsumilli, B., Bovik, A.C.: Rapique: rapid and accurate video quality prediction of user generated content. IEEE Open J. Signal Process. 2, 425–440 (2021)


Funding

This work was supported in part by the Science and Technology Foundation of Guizhou Province (Grant No. QKHJCZK[2024]063), and in part by National Natural Science Foundation of China (Grant No. 62266011).

Author information

Contributions

J.H., G.K., X.D., and H.L. contributed to the conceptualization and design of the study. J.H. conducted the experiments and collected the data. G.K., J.H., X.D. and H.L. analyzed the data and performed statistical analysis. J.H. and G.K. wrote the main manuscript text. J.H. and X.D. prepared all figures and tables in the manuscript. All authors reviewed and edited the manuscript. This author contributions statement has been approved by all authors.

Corresponding author

Correspondence to Guangqian Kong.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A parameter ablation

We conducted a set of ablation experiments to select the parameters of MLFP. First, we compared AMPM with LSTM and GRU, two recurrent architectures commonly used to learn long-term dependencies. AMPM was replaced by an LSTM or a GRU structure, while all other parts of the method were kept unchanged. To ensure a fair comparison, all models were configured with two layers and produced a score through average pooling. As Table 7 shows, AMPM outperforms LSTM and GRU in predicting the overall QoE. Unlike LSTM and GRU, which treat all time steps equally, the attention mechanism selectively focuses on important segments, which helps improve QoE assessment performance.
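The contrast between equal weighting and selective weighting can be illustrated with a minimal sketch. It is not the AMPM implementation: the per-segment scores, the relevance logits, and the function names are hypothetical values chosen only to show how softmax attention weights shift a pooled score toward an important (here, heavily degraded) segment.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_pool(scores):
    """Every segment contributes equally to the pooled score."""
    return sum(scores) / len(scores)

def attention_pool(scores, relevance_logits):
    """Segments with larger relevance logits receive larger weights."""
    weights = softmax(relevance_logits)
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical per-segment quality scores; the third segment is degraded.
segment_scores = [4.0, 4.0, 1.0, 4.0]
# Hypothetical relevance logits that emphasize the degraded segment.
relevance = [0.0, 0.0, 2.0, 0.0]

print(uniform_pool(segment_scores))
print(attention_pool(segment_scores, relevance))
```

With equal logits the attention-weighted result reduces to the uniform average, so the uniform case is a special case of the weighted one.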

Table 7 Memory method and pooling ablation

Second, we compared three pooling methods: "last_one", subjective inspiration pooling (SIP) [24] based on human memory effects, and simple averaging ("mean"). Table 7 shows that mean pooling performs best. Although mean pooling does not explicitly model human memory effects, the attention mechanism implicitly captures the relevant features, which improves the performance of mean pooling.
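The two simpler pooling rules can be sketched as follows. This is an illustrative reading of the names "last_one" and "mean", assuming the model emits one score per time step. The step scores are made-up values, and SIP [24] is deliberately not reproduced here.

```python
def last_one_pool(step_scores):
    """'last_one': keep only the score of the final time step."""
    return step_scores[-1]

def mean_pool(step_scores):
    """'mean': average the scores of all time steps."""
    return sum(step_scores) / len(step_scores)

# Hypothetical registry so an ablation loop can switch pooling by name.
POOLERS = {"last_one": last_one_pool, "mean": mean_pool}

# Hypothetical per-step scores produced by the temporal module.
step_scores = [3.0, 3.5, 2.0, 4.5]

pooled = {name: pool(step_scores) for name, pool in POOLERS.items()}
print(pooled)
```

In this toy sequence "last_one" ignores the dip at the third step, whereas "mean" reflects it.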

Third, we investigated how the number of attention heads (1, 2, 4, 8, and 16) affects performance. Figure 7 shows that the method performs best with 2 attention heads.

Fig. 7 Comparison of SRCC/PLCC values for different attention heads on WaterlooSQoE-III
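For readers unfamiliar with the two criteria, the sketch below computes them with the standard library only. PLCC is the Pearson correlation between predicted and subjective scores, and SRCC is the Pearson correlation of their ranks, with tied values given average ranks. The score lists are hypothetical, and the nonlinear mapping that QoE studies often apply before PLCC is omitted, so this is not necessarily the exact evaluation protocol of the paper.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical predicted scores and subjective opinion scores.
predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]

print(spearman(predicted, subjective))
print(pearson(predicted, subjective))
```

Because the toy relation is monotonic but not linear, SRCC reaches its maximum while PLCC stays slightly below it, which is why the two criteria are usually reported together.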

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, J., Kong, G., Duan, X. et al. Adaptive video streaming quality of experience assessment based on multi-layer feature perception. SIViP 19, 335 (2025). https://doi.org/10.1007/s11760-025-03906-1

  • DOI: https://doi.org/10.1007/s11760-025-03906-1
