Study on no-reference video quality assessment method incorporating dual deep learning networks

  • Track 1: General Multimedia Topics
  • Published in: Multimedia Tools and Applications

Abstract

The quality assessment of user-generated content (UGC) videos is challenging. Unlike synthetically distorted videos, UGC videos are susceptible to various distortions introduced by the external environment during capture. This paper proposes a video quality assessment (VQA) method incorporating a dual deep-network architecture. First, the diversity of the extracted video information is ensured by applying global average pooling and global standard deviation pooling to the features of the InceptionV3 and ResNet50 networks, and per-frame quality scores are obtained with bidirectional GRU networks. Second, in the spatiotemporal domain, a temporal memory block is constructed that exploits human temporal memory and content-dependent effects to obtain the temporal component of video quality. Meanwhile, a Gaussian distribution is added in the spatial domain to reduce the effect of content variation. Finally, extensive experiments are conducted on the KoNViD-1k and LIVE-VQC databases. The experimental results show that the overall Spearman rank-order correlation coefficient (SROCC) and Pearson linear correlation coefficient (PLCC) reach 0.7786 and 0.7759, which are 2.87% and 0.52% higher, respectively, than those of the method of Tang et al. This verifies the validity of the model. In addition, cross-validation experiments show that the model also has strong generalization ability.
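The per-frame scoring stage described in the abstract (backbone feature maps reduced by global average and global standard deviation pooling, then regressed by a bidirectional GRU) can be sketched as follows. This is a minimal illustration, not the authors' exact configuration: the fused channel count, GRU hidden width, and the assumption that InceptionV3/ResNet50 feature maps have already been extracted are all placeholders.

```python
import torch
import torch.nn as nn

class PooledBiGRUHead(nn.Module):
    """Pools CNN feature maps with global average and global standard
    deviation pooling, then regresses per-frame quality scores with a
    bidirectional GRU. All sizes here are illustrative."""

    def __init__(self, feat_channels=4096, hidden=64):
        super().__init__()
        # avg + std pooling doubles the channel dimension
        self.gru = nn.GRU(input_size=2 * feat_channels, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)  # one quality score per frame

    @staticmethod
    def pool(feat):
        # feat: (batch, frames, channels, H, W)
        avg = feat.mean(dim=(-2, -1))                 # global average pooling
        std = feat.std(dim=(-2, -1), unbiased=False)  # global std-dev pooling
        return torch.cat([avg, std], dim=-1)          # (batch, frames, 2C)

    def forward(self, feat):
        x = self.gru(self.pool(feat))[0]   # (batch, frames, 2 * hidden)
        return self.fc(x).squeeze(-1)      # (batch, frames) frame scores

# Toy run: 1 video, 8 frames, 4096 fused backbone channels, 7x7 feature maps.
frame_scores = PooledBiGRUHead()(torch.randn(1, 8, 4096, 7, 7))
print(frame_scores.shape)  # torch.Size([1, 8])
```

The bidirectional GRU lets each frame's score depend on both past and future frames, which is how the method models temporal context before the temporal memory block aggregates the frame scores into a video-level quality estimate.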


Data availability

All data used in the experiments are from the public database. The datasets generated during the current study are available from the corresponding author on reasonable request.

Code availability

The code generated during the current study is available from the corresponding author on reasonable request.

References

  1. Ahn S, Lee S (2018) Deep blind video quality assessment based on temporal human perception. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 619–623

  2. Bampis CG, Li Z, Bovik AC (2017) Continuous prediction of streaming video QoE using dynamic networks. IEEE Signal Process Lett 24(7):1083–1087

  3. Bampis CG, Li Z, Moorthy AK, Katsavounidis I, Aaron A, Bovik AC (2017) Study of temporal effects on subjective video quality of experience. IEEE Trans Image Process 26(11):5217–5231

  4. Bampis CG, Gupta P, Soundararajan R, Bovik AC (2017) SpEED-QA: spatial efficient entropic differencing for image and video quality. IEEE Signal Process Lett 24(9):1333–1337

  5. Bampis CG, Li Z, Katsavounidis I, Bovik AC (2018) Recurrent and dynamic models for predicting streaming video quality of experience. IEEE Trans Image Process 27(7):3316–3331

  6. Bampis CG, Li Z, Bovik AC (2018) Spatiotemporal feature integration and model fusion for full reference video quality assessment. IEEE Trans Circ Syst Video Technol 29(8):2256–2270

  7. Chen B, Zhu L, Li G, Lu F, Fan H, Wang S (2021) Learning generalized spatial-temporal deep feature representation for no-reference video quality assessment. IEEE Trans Circ Syst Video Technol 32:1903–1916

  8. Chikkerur S, Sundaram V, Reisslein M, Karam LJ (2011) Objective video quality assessment methods: a classification, review, and performance comparison. IEEE Trans Broadcast 57(2):165–182

  9. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555

  10. Dendi SVR, Channappayya SS (2020) No-reference video quality assessment using natural spatiotemporal scene statistics. IEEE Trans Image Process 29:5612–5624

  11. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255

  12. Ebenezer JP, Shang Z, Wu Y, Wei H, Sethuraman S, Bovik AC (2021) ChipQA: no-reference video quality prediction via space-time chips. IEEE Trans Image Process 30:8059–8074

  13. Fan Q, Luo W, Xia Y, Li G, He D (2019) Metrics and methods of video quality assessment: a brief review. Multimed Tools Appl 78(22):31019–31033

  14. Fu H, Pan D, Shi P (2021) Full-reference Video quality assessment based on spatiotemporal visual sensitivity. In: 2021 international conference on Culture-oriented Science & Technology (ICCST). IEEE, pp 305–309

  15. Ghadiyaram D, Bovik AC (2017) Perceptual quality prediction on authentically distorted images using a bag of features approach. J Vis 17(1):32–32

  16. Götz-Hahn F, Hosu V, Lin H, Saupe D (2021) KonVid-150k: a dataset for no-reference video quality assessment of videos in-the-wild. IEEE Access 9:72139–72160

  17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  18. Hosu V, Hahn F, Jenadeleh M, Lin H, Men H, Szirányi T, … Saupe D (2017) The Konstanz natural video database (KoNViD-1k). In: 2017 ninth international conference on quality of multimedia experience (QoMEX). IEEE, pp 1–6

  19. Kim W, Kim J, Ahn S, Kim J, Lee S (2018) Deep video quality assessor: from spatio-temporal visual sensitivity to a convolutional neural aggregation network. In: Proceedings of the European conference on computer vision (ECCV), pp 219–234

  20. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  21. Korhonen J (2019) Two-level approach for no-reference consumer video quality assessment. IEEE Trans Image Process 28(12):5923–5938

  22. Kundu D, Ghadiyaram D, Bovik AC, Evans BL (2017) No-reference quality assessment of tone-mapped HDR pictures. IEEE Trans Image Process 26(6):2957–2971

  23. Li Z, Aaron A, Katsavounidis I, Moorthy A, Manohara M (2016) Toward a practical perceptual video quality metric. The Netflix tech blog, 6(2). http://techblog.netflix.com/2016/06/toward-practical-perceptualvideo.html

  24. Li X, Guo Q, Lu X (2016) Spatiotemporal statistics for video quality assessment. IEEE Trans Image Process 25(7):3329–3342

  25. Li D, Jiang T, Jiang M (2019) Quality assessment of in-the-wild videos. In: Proceedings of the 27th ACM international conference on multimedia, pp 2351–2359

  26. Li D, Jiang T, Jiang M (2021) Unified quality assessment of in-the-wild videos with mixed datasets training. Int J Comput Vis 129(4):1238–1257

  27. Li MW, Xu DY, Geng J, Hong WC (2022) A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm. Nonlinear Dyn:1–21

  28. Li B, Zhang W, Tian M, Zhai G, Wang X (2022) Blindly assess quality of in-the-wild videos via quality-aware pre-training and motion perception. IEEE Trans Circ Syst Video Technol:1

  29. Liu Y, Wu J, Li A, Li L, Dong W, Shi G, Lin W (2021) Video quality assessment with serial dependence modeling. IEEE Trans Multimedia:1

  30. Manasa K, Channappayya SS (2016) An optical flow-based full reference video quality assessment algorithm. IEEE Trans Image Process 25(6):2480–2492

  31. Min X, Zhai G, Zhou J, Farias MC, Bovik AC (2020) Study of subjective and objective quality assessment of audio-visual signals. IEEE Trans Image Process 29:6054–6068

  32. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708

  33. Mittal A, Saad MA, Bovik AC (2015) A completely blind video integrity oracle. IEEE Trans Image Process 25(1):289–300

  34. Pandremmenou K, Shahid M, Kondi LP, Lövström B (2015) A no-reference bitstream-based perceptual model for video quality estimation of videos affected by coding artifacts and packet losses. In: Human vision and electronic imaging XX, vol 9394. SPIE, pp 486–497

  35. Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lerer A (2017) Automatic differentiation in PyTorch

  36. Saad M, Bovik AC, Charrier C (2013) Blind prediction of natural video quality and H.264 applications. In: Seventh international workshop on video processing and quality metrics for consumer electronics (VPQM), pp 47–51

  37. Saad MA, Bovik AC, Charrier C (2014) Blind prediction of natural video quality. IEEE Trans Image Process 23(3):1352–1365

  38. Seshadrinathan K, Soundararajan R, Bovik AC, Cormack LK (2010) Study of subjective and objective quality assessment of video. IEEE Trans Image Process 19(6):1427–1441

  39. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  40. Sinno Z, Bovik AC (2018) Large-scale study of perceptual video quality. IEEE Trans Image Process 28(2):612–627

  41. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  42. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence

  43. Tang J, Dong Y, Xie R, Gu X, Song L, Li L, Zhou B (2020) Deep blind Video quality assessment for user generated videos. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 156–159

  44. Tu Z, Wang Y, Birkbeck N, Adsumilli B, Bovik AC (2021) UGC-VQA: benchmarking blind video quality assessment for user generated content. IEEE Trans Image Process 30:4449–4464

  45. Video Quality Experts Group (VQEG) (2000) Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II

  46. Wainwright MJ, Simoncelli E (1999) Scale mixtures of Gaussians and the statistics of natural images. Adv Neural Inf Process Syst 12

  47. Xu J, Ye P, Liu Y, Doermann D (2014) No-reference video quality assessment via feature learning. In: 2014 IEEE international conference on image processing (ICIP). IEEE, pp 491–495

  48. Xue W, Mou X, Zhang L, Bovik AC, Feng X (2014) Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. IEEE Trans Image Process 23(11):4850–4862

  49. Yi F, Chen M, Sun W, Min X, Tian Y, Zhai G (2021) Attention based network for no-reference UGC Video quality assessment. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 1414–1418

  50. Ying Z, Mandal M, Ghadiyaram D, Bovik A (2021) Patch-VQ: 'Patching Up' the video quality problem. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14019–14029

  51. Zhou W, Chen Z (2020) Deep local and global spatiotemporal feature aggregation for blind video quality assessment. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 338–341

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61374022) and by the Zhejiang Provincial Basic Public Welfare Research Project of China (Grant Nos. LGF22F030001 and LGG19F03001).

Author information

Corresponding author

Correspondence to Junfeng Li.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Li, J., Li, X. Study on no-reference video quality assessment method incorporating dual deep learning networks. Multimed Tools Appl 82, 3081–3100 (2023). https://doi.org/10.1007/s11042-022-13383-0

