Adaptive video streaming quality of experience assessment based on multi-layer feature perception

Huang, Jing; Kong, Guangqian; Duan, Xun; Long, Huiyun

doi:10.1007/s11760-025-03906-1

Original Paper
Published: 24 February 2025

Volume 19, article number 335, (2025)

Signal, Image and Video Processing


Abstract

With the rapid advancement of streaming media technology, adaptive video streaming quality of experience (QoE) has become a key factor in optimizing adaptive bitrate and compression algorithms. However, distortions such as video compression artifacts and rebuffering frequently occur during the generation and transmission of adaptive video streams, which makes accurate QoE assessment a significant challenge. To address this challenge, we propose a novel multi-layer feature perception method (MLFP). The MLFP framework consists of three layers: a frame-level perception layer, a segment-level perception layer, and a global perception layer. In the frame-level perception layer, a mask attention mechanism and ResNet50 capture users’ nuanced sensations of compression and transmission distortions. To effectively characterize the impact of rebuffering on the visual experience, an adaptive streaming rebuffering perception discriminator is designed. The segment-level perception layer identifies four key quality of service (QoS) features and employs one-dimensional group convolution and a GRU network to capture the intrinsic connections between these features. The global perception layer uses a streamlined neural network to handle global statistical QoS features, providing comprehensive insights into overall QoE. The perceptual scores from the three layers are fused to generate a final QoE score. Experiments on four public adaptive video streaming datasets, namely WaterlooSQoE-I, LIVE-NFLX-II, WaterlooSQoE-III, and WaterlooSQoE-IV, demonstrate that the proposed method achieves effective QoE assessment under real distortions and outperforms several recent QoE methods.
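To make the one-dimensional group convolution mentioned for the segment-level layer more concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation: the kernel size, the choice of one group per QoS feature, the absence of padding, and the sample values are all assumptions made for illustration, and the GRU stage is omitted.

```python
from typing import List

Matrix = List[List[float]]  # layout: [channels][time steps]


def grouped_conv1d(x: Matrix, weight: List[List[List[float]]], groups: int) -> Matrix:
    """Grouped 1-D cross-correlation with stride 1 and no padding.

    x      : input of shape [c_in][T]
    weight : kernels of shape [c_out][c_in // groups][K]
    Each output channel only sees the input channels of its own group.
    """
    c_in, c_out = len(x), len(weight)
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    if len(weight[0]) != in_per_group:
        raise ValueError("weight does not match c_in // groups")
    k = len(weight[0][0])
    t_out = len(x[0]) - k + 1

    out: Matrix = []
    for o in range(c_out):
        g = o // out_per_group  # group this output channel belongs to
        row = []
        for t in range(t_out):
            acc = 0.0
            for i in range(in_per_group):
                channel = x[g * in_per_group + i]
                for j in range(k):
                    acc += weight[o][i][j] * channel[t + j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical input: four QoS feature tracks over five segments.
qos = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
# One two-tap averaging kernel per feature (groups == channels).
kernels = [[[0.5, 0.5]] for _ in range(4)]
smoothed = grouped_conv1d(qos, kernels, groups=4)
```

With `groups` equal to the number of channels, each feature track is filtered independently; a smaller `groups` value would let kernels mix the features inside each group.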


Data Availability

No datasets were generated or analysed during the current study.

References

Mordor Intelligence.: Analysis of media streaming market size and share - growth trends and forecast (2024-2029). IOP Publishing PhysicsWeb. https://www.mordorintelligence.com/zh-CN/industry-reports/media-streaming-market. Accessed (15 May 2024)
Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A survey on bitrate adaptation schemes for streaming media over http. IEEE Commun. Surv. & Tutor. 21(1), 562–585 (2018)
Lina, D., Li, J., Zhuo, L., Yang, S.: Vcfnet: video clarity-fluency network for quality of experience evaluation model of http adaptive video streaming services. Multimed. Tools Appl. 81(29), 42907–42923 (2022)
ITU.: ITU-T. Vocabulary for performance, quality of service and quality of experience, IOP Publishing PhysicsWeb. https://www.itu.int/rec/T-REC-P.10-201711-I/en (2017). Accessed (15 Jun 2024)
Series, P.: Terminals and subjective and objective assessment methods. Geneva, Switzerland, ITU (2016)
Mok, Ricky K.P., Chan, Edmond W.W., Chang, Rocky K.C.: Measuring the quality of experience of http video streaming. In 12th IFIP/IEEE international symposium on integrated network management (IM 2011) and workshops, pages 485–492. IEEE, (2011)
Hoßfeld, T., Schatz, R., Biersack, E., Plissonneau, L.: Internet video delivery in youtube: From traffic measurements to quality of experience. Data Traffic Monitoring and Analysis: From Measurement, Classification, and Anomaly Detection to Quality of Experience, pages 264–301, (2013)
Eswara, N., Ashique, S., Panchbhai, A., Chakraborty, S., Sethuram, H.P., Kuchi, K., Kumar, A., Channappayya, S.S.: Streaming video qoe modeling and prediction: a long short-term memory approach. IEEE Trans. Circuits Syst. Video Technol. 30(3), 661–673 (2019)
Ghadiyaram, D., Pan, J., Bovik, A.C.: Learning a continuous-time streaming video qoe model. IEEE Trans. Image Process. 27(5), 2257–2271 (2018)
Barman, N., Martini, M.G.: Qoe modeling for http adaptive video streaming-a survey and open challenges. Ieee Access 7, 30831–30859 (2019)
Li, C., Lim, M., Bentaleb, A., Zimmermann, R.: A real-time blind quality-of-experience assessment metric for http adaptive streaming. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 1661–1666. IEEE, (2023)
Hoßfeld, T., Seufert, M., Hirth, M., Zinner, T., Tran-Gia, P., Schatz, R.: Quantification of youtube qoe via crowdsourcing. In 2011 IEEE International Symposium on Multimedia, pages 494–499. IEEE, (2011)
Rodriguez, D.Z., Abrahao, J., Begazo, D.C., Rosa, R.L., Bressan, G.: Quality metric to assess video streaming service over tcp considering temporal location of pauses. IEEE Trans. Consum. Electron. 58(3), 985–992 (2012)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. image process. 13(4), 600–612 (2004)
Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. image process. 21(12), 4695–4708 (2012)
Sun, W., Min, X., Lu, W., Zhai, G.: A deep learning based no-reference quality assessment model for ugc videos. In Proceedings of the 30th ACM International Conference on Multimedia, pages 856–865, (2022)
Duanmu, Z., Zeng, K., Ma, K., Rehman, A., Wang, Z.: A quality-of-experience index for streaming video. IEEE J. Sel. Top. Sign. Process. 11(1), 154–166 (2016)
Bampis, C.G., Bovik, A.C.: Learning to predict streaming video qoe: Distortions, rebuffering and memory. arXiv preprint arXiv:1703.00633, (2017)
Robitza, W., Garcia, M.N., Raake, A.: A modular http adaptive streaming qoe model-candidate for itu-t p. 1203 (“p. nats”). In 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, (2017)
Duanmu, Z., Liu, W., Chen, D., Li, Z., Wang, Z., Wang, Y., Gao, W.: A bayesian quality-of-experience model for adaptive streaming videos. ACM Trans. Multimed. Comput. Commun. Appl. 18(3s), 1–24 (2023)
Chen, P., Li, L., Jinjian, W., Zhang, Y., Lin, W.: Temporal reasoning guided qoe evaluation for mobile live video broadcasting. IEEE Trans. Image Process. 30, 3279–3292 (2021)
Jia, Z., Min, X., Sun, W., Zhai, G.: Continuous and overall quality of experience evaluation for streaming video based on rich features exploration and dual-stage attention. IEEE Trans. Circuits Syst. Video Technol. 34, 11709–11723 (2024)
He, K, Zhang, X, Ren, S, Sun, J: Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, (2016)
Li, D., Jiang, T., Jiang, M.: Unified quality assessment of in-the-wild videos with mixed datasets training. Int. J. Comput. Vis. 129(4), 1238–1257 (2021)
Vaswani, A.: Attention is all you need. Adv. Neural Inf. Process. Syst. (2017)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, (2014)
Tran, H.T.T., Vu, T., Ngoc, N.P., Thang, T.C.: A novel quality model for http adaptive streaming. In 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), pages 423–428. IEEE, (2016)
Asan, A., Robitza, W., Mkwawa, I., Sun, L., Ifeachor, E., Raake, A.: Impact of video resolution changes on qoe for adaptive video streaming. In 2017 IEEE international conference on multimedia and expo (ICME), pages 499–504. IEEE, (2017)
Bampis, C.G, Li, Z., Katsavounidis, I., Huang, T.Y., Ekanadham, C., Bovik, A.C.: Towards perceptually optimized end-to-end adaptive video streaming. arXiv preprint arXiv:1808.03898, (2018)
Duanmu, Z., Rehman, A., Wang, Z.: A quality-of-experience database for adaptive video streaming. IEEE Trans. Broadcast. 64(2), 474–487 (2018)
Duanmu, Z., Liu, W., Li, Z., Chen, D., Wang, Z., Wang, Y., Gao, W.: Assessing the quality-of-experience of adaptive bitrate video streaming. arXiv preprint arXiv:2008.08804, (2020)
Sheikh, H.R., Sabir, M.F., Bovik, A.C.: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image process. 15(11), 3440–3451 (2006)
Liu, X., Dobrian, F., Milner, H., Jiang, J., Sekar, V., Stoica, I., Zhang, H.: A case for a coordinated internet video control plane. In Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, pages 359–370, (2012)
Xue, J., Zhang, D.Q., Yu, H., Chen, C.W.: Assessing quality of experience for adaptive http video streaming. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 1–6. IEEE, (2014)
Yin, X., Jindal, A., Sekar, V., Sinopoli, B.: A control-theoretic approach for dynamic adaptive video streaming over http. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 325–338, (2015)
Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Signal process. lett. 20(3), 209–212 (2012)
Mittal, A., Saad, M.A., Bovik, A.C.: A completely blind video integrity oracle. IEEE Trans. Image Process. 25(1), 289–300 (2015)
Zhang, Z., Wu, W., Sun, W., Tu, D., Lu, W., Min, X., Chen, Y., Zhai, G.: Md-vqa: Multi-dimensional quality assessment for ugc live videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1746–1755, (2023)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, (2009)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), San Diego, California, (2015)
Zhengzhong, Tu., Xiangxu, Yu., Wang, Yilin, Birkbeck, Neil, Adsumilli, Balu, Bovik, Alan C.: Rapique: rapid and accurate video quality prediction of user generated content. IEEE Open J. Signal Process. 2, 425–440 (2021)

Funding

This work was supported in part by the Science and Technology Foundation of Guizhou Province (Grant No. QKHJCZK[2024]063), and in part by the National Natural Science Foundation of China (Grant No. 62266011).

Author information

Authors and Affiliations

State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, Guizhou, China
Jing Huang, Guangqian Kong, Xun Duan & Huiyun Long


Contributions

J.H., G.K., X.D., and H.L. contributed to the conceptualization and design of the study. J.H. conducted the experiments and collected the data. G.K., J.H., X.D. and H.L. analyzed the data and performed statistical analysis. J.H. and G.K. wrote the main manuscript text. J.H. and X.D. prepared all figures and tables in the manuscript. All authors reviewed and edited the manuscript. This author contributions statement has been approved by all authors.

Corresponding author

Correspondence to Guangqian Kong.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A parameter ablation

We ran several simple experiments to determine the best-performing parameters of MLFP. First, we compared AMPM with traditional methods such as LSTM and GRU, which are commonly used to learn long-term dependencies. AMPM was replaced by the LSTM and GRU structures, while all other aspects of the method were kept unchanged. To ensure a fair comparison, all models were configured with two layers, and their outputs were average-pooled to produce a score. As Table 7 shows, AMPM outperforms LSTM and GRU in predicting the overall QoE. Unlike LSTM and GRU, which treat all time steps equally, attention selectively focuses on important segments, which helps improve QoE assessment performance.

Table 7 Memory method and pooling ablation

Second, we compared three pooling methods: "last_one", subjective inspiration pooling (SIP) [24] based on human memory effects, and simple averaging ("mean"). Table 7 shows that mean pooling performs best. Although mean pooling does not explicitly model human memory effects, the attention mechanism implicitly captures the relevant features, which improves the performance of mean pooling.
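The three pooling options can be sketched as follows. This is an illustrative approximation, not the code behind Table 7: the SIP variant follows the general memory-based idea of [24] (a minimum over recent past scores blended with a softmin-weighted average over upcoming scores), and the window size `tau`, the blend weight `gamma`, and the sample scores are assumptions.

```python
import math
from typing import Sequence


def last_one(scores: Sequence[float]) -> float:
    """Use only the final per-step score."""
    return scores[-1]


def mean_pool(scores: Sequence[float]) -> float:
    """Simple average over all time steps."""
    return sum(scores) / len(scores)


def sip_pool(scores: Sequence[float], tau: int = 2, gamma: float = 0.5) -> float:
    """Simplified memory-inspired pooling (tau and gamma are assumed values).

    For each step t:
      memory  = worst score among the previous `tau` steps (or the score itself at t = 0)
      current = softmin-weighted average of the scores at t .. t + tau
    The per-step blend gamma * memory + (1 - gamma) * current is then averaged.
    """
    n = len(scores)
    blended = []
    for t in range(n):
        past = scores[max(0, t - tau):t]
        memory = min(past) if past else scores[t]
        ahead = scores[t:min(n, t + tau + 1)]
        weights = [math.exp(-q) for q in ahead]  # low scores get larger weights
        current = sum(q * w for q, w in zip(ahead, weights)) / sum(weights)
        blended.append(gamma * memory + (1.0 - gamma) * current)
    return sum(blended) / n


# Hypothetical per-step quality scores with one sharp drop.
scores = [4.0, 4.0, 1.0, 4.0]
pooled = {
    "last_one": last_one(scores),
    "mean": mean_pool(scores),
    "sip": sip_pool(scores),
}
```

On a sequence with a quality drop, the memory-based variant is pulled toward the low score more strongly than the plain mean, while "last_one" ignores the drop entirely unless it occurs at the end.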

Third, we investigated the impact of varying the number of attention heads (1, 2, 4, 8, and 16) on method performance. Figure 7 shows that the method achieves optimal performance with 2 attention heads.
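As a rough illustration of what the number of attention heads controls, the sketch below splits each feature vector into equal slices, runs scaled dot-product self-attention on each slice, and concatenates the results before average pooling over time. It is a simplification under stated assumptions: the query, key, and value projections are identity maps rather than learned weights, no masking is applied, and the input values are made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(values: Vector) -> Vector:
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    scale = math.sqrt(len(seq[0]))
    out = []
    for query in seq:
        logits = [sum(q * k for q, k in zip(query, key)) / scale for key in seq]
        weights = softmax(logits)
        out.append([
            sum(w * value[j] for w, value in zip(weights, seq))
            for j in range(len(query))
        ])
    return out


def multi_head_attention(seq: List[Vector], num_heads: int) -> List[Vector]:
    """Split features into `num_heads` slices, attend per slice, concatenate."""
    d_model = len(seq[0])
    if d_model % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = [
        attend([step[h * d_head:(h + 1) * d_head] for step in seq])
        for h in range(num_heads)
    ]
    return [[v for head in heads for v in head[t]] for t in range(len(seq))]


def mean_over_time(seq: List[Vector]) -> Vector:
    """Average pooling across time steps, one value per feature."""
    return [sum(step[j] for step in seq) / len(seq) for j in range(len(seq[0]))]


# Hypothetical input: three time steps with four features each.
features = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
attended = multi_head_attention(features, num_heads=2)
summary = mean_over_time(attended)
```

More heads mean narrower slices per head, so each head computes its attention weights from fewer features; the feature size must be divisible by the head count.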

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, J., Kong, G., Duan, X. et al. Adaptive video streaming quality of experience assessment based on multi-layer feature perception. SIViP 19, 335 (2025). https://doi.org/10.1007/s11760-025-03906-1

Received: 09 October 2024
Revised: 03 November 2024
Accepted: 30 January 2025
Published: 24 February 2025
DOI: https://doi.org/10.1007/s11760-025-03906-1
