Abstract
Assessing the quality of user-generated content (UGC) videos is challenging. Unlike synthetic videos, UGC videos are susceptible to various distortions introduced by the external environment during the generation process. This paper proposes a video quality assessment (VQA) method that incorporates a dual deep network architecture. First, global average pooling and global standard deviation pooling are applied to features extracted by the InceptionV3 and ResNet50 networks to ensure the diversity of the captured video information, and frame-level quality scores are obtained with a bidirectional GRU network. Second, in the spatial-temporal domain, a temporal memory block is constructed by exploiting the human temporal memory effect and content-dependency effects to obtain the components of video quality. In addition, a Gaussian distribution is applied in the spatial domain to reduce the effect of content variation. Finally, extensive experiments are conducted on the KoNViD-1k and LIVE-VQC databases. The results show that the overall Spearman's rank-order correlation coefficient (SROCC) and Pearson's linear correlation coefficient (PLCC) are 0.7786 and 0.7759, which are 2.87% and 0.52% higher than those of Tang et al. (2020), respectively. These results support the validity of the model. Cross-validation experiments further show that the model has strong generalization ability.
Data availability
All data used in the experiments are from public databases. The datasets generated during the current study are available from the corresponding author on reasonable request.
Code availability
The code generated during the current study is available from the corresponding author on reasonable request.
References
Ahn S, Lee S (2018) Deep blind video quality assessment based on temporal human perception. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 619–623
Bampis CG, Li Z, Bovik AC (2017) Continuous prediction of streaming video QoE using dynamic networks. IEEE Signal Process Lett 24(7):1083–1087
Bampis CG, Li Z, Moorthy AK, Katsavounidis I, Aaron A, Bovik AC (2017) Study of temporal effects on subjective video quality of experience. IEEE Trans Image Process 26(11):5217–5231
Bampis CG, Gupta P, Soundararajan R, Bovik AC (2017) SpEED-QA: spatial efficient entropic differencing for image and video quality. IEEE Signal Process Lett 24(9):1333–1337
Bampis CG, Li Z, Katsavounidis I, Bovik AC (2018) Recurrent and dynamic models for predicting streaming video quality of experience. IEEE Trans Image Process 27(7):3316–3331
Bampis CG, Li Z, Bovik AC (2018) Spatiotemporal feature integration and model fusion for full reference video quality assessment. IEEE Trans Circ Syst Video Technol 29(8):2256–2270
Chen B, Zhu L, Li G, Lu F, Fan H, Wang S (2021) Learning generalized spatial-temporal deep feature representation for no-reference video quality assessment. IEEE Trans Circ Syst Video Technol 32:1903–1916
Chikkerur S, Sundaram V, Reisslein M, Karam LJ (2011) Objective video quality assessment methods: a classification, review, and performance comparison. IEEE Trans Broadcast 57(2):165–182
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Dendi SVR, Channappayya SS (2020) No-reference video quality assessment using natural spatiotemporal scene statistics. IEEE Trans Image Process 29:5612–5624
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Ebenezer JP, Shang Z, Wu Y, Wei H, Sethuraman S, Bovik AC (2021) ChipQA: no-reference video quality prediction via space-time chips. IEEE Trans Image Process 30:8059–8074
Fan Q, Luo W, Xia Y, Li G, He D (2019) Metrics and methods of video quality assessment: a brief review. Multimed Tools Appl 78(22):31019–31033
Fu H, Pan D, Shi P (2021) Full-reference video quality assessment based on spatiotemporal visual sensitivity. In: 2021 international conference on Culture-oriented Science & Technology (ICCST). IEEE, pp 305–309
Ghadiyaram D, Bovik AC (2017) Perceptual quality prediction on authentically distorted images using a bag of features approach. J Vis 17(1):32–32
Götz-Hahn F, Hosu V, Lin H, Saupe D (2021) KonVid-150k: a dataset for no-reference video quality assessment of videos in-the-wild. IEEE Access 9:72139–72160
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hosu V, Hahn F, Jenadeleh M, Lin H, Men H, Szirányi T, … Saupe D (2017) The Konstanz natural video database (KoNViD-1k). In: 2017 ninth international conference on quality of multimedia experience (QoMEX). IEEE, pp 1–6
Kim W, Kim J, Ahn S, Kim J, Lee S (2018) Deep video quality assessor: from spatio-temporal visual sensitivity to a convolutional neural aggregation network. In: Proceedings of the European conference on computer vision (ECCV), pp 219–234
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Korhonen J (2019) Two-level approach for no-reference consumer video quality assessment. IEEE Trans Image Process 28(12):5923–5938
Kundu D, Ghadiyaram D, Bovik AC, Evans BL (2017) No-reference quality assessment of tone-mapped HDR pictures. IEEE Trans Image Process 26(6):2957–2971
Li Z, Aaron A, Katsavounidis I, Moorthy A, Manohara M (2016) Toward a practical perceptual video quality metric. The Netflix tech blog, 6(2). http://techblog.netflix.com/2016/06/toward-practical-perceptualvideo.html
Li X, Guo Q, Lu X (2016) Spatiotemporal statistics for video quality assessment. IEEE Trans Image Process 25(7):3329–3342
Li D, Jiang T, Jiang M (2019) Quality assessment of in-the-wild videos. In: Proceedings of the 27th ACM international conference on multimedia, pp 2351–2359
Li D, Jiang T, Jiang M (2021) Unified quality assessment of in-the-wild videos with mixed datasets training. Int J Comput Vis 129(4):1238–1257
Li MW, Xu DY, Geng J, Hong WC (2022) A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm. Nonlinear Dyn:1–21
Li B, Zhang W, Tian M, Zhai G, Wang X (2022) Blindly assess quality of in-the-wild videos via quality-aware pre-training and motion perception. IEEE Trans Circ Syst Video Technol:1
Liu Y, Wu J, Li A, Li L, Dong W, Shi G, Lin W (2021) Video quality assessment with serial dependence modeling. IEEE Trans Multimedia:1
Manasa K, Channappayya SS (2016) An optical flow-based full reference video quality assessment algorithm. IEEE Trans Image Process 25(6):2480–2492
Min X, Zhai G, Zhou J, Farias MC, Bovik AC (2020) Study of subjective and objective quality assessment of audio-visual signals. IEEE Trans Image Process 29:6054–6068
Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
Mittal A, Saad MA, Bovik AC (2015) A completely blind video integrity oracle. IEEE Trans Image Process 25(1):289–300
Pandremmenou K, Shahid M, Kondi LP, Lövström B (2015) A no-reference bitstream-based perceptual model for video quality estimation of videos affected by coding artifacts and packet losses. In: Human vision and electronic imaging XX, vol 9394. SPIE, pp 486–497
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lerer A (2017) Automatic differentiation in PyTorch
Saad M, Bovik AC, Charrier C (2013) Blind prediction of natural video quality and H.264 applications. In: Seventh international workshop on video processing and quality metrics for consumer electronics (VPQM), pp 47–51
Saad MA, Bovik AC, Charrier C (2014) Blind prediction of natural video quality. IEEE Trans Image Process 23(3):1352–1365
Seshadrinathan K, Soundararajan R, Bovik AC, Cormack LK (2010) Study of subjective and objective quality assessment of video. IEEE Trans Image Process 19(6):1427–1441
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Sinno Z, Bovik AC (2018) Large-scale study of perceptual video quality. IEEE Trans Image Process 28(2):612–627
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence
Tang J, Dong Y, Xie R, Gu X, Song L, Li L, Zhou B (2020) Deep blind video quality assessment for user generated videos. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 156–159
Tu Z, Wang Y, Birkbeck N, Adsumilli B, Bovik AC (2021) UGC-VQA: benchmarking blind video quality assessment for user generated content. IEEE Trans Image Process 30:4449–4464
VQEG (2000) Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II
Wainwright MJ, Simoncelli E (1999) Scale mixtures of Gaussians and the statistics of natural images. Adv Neural Inf Proces Syst 12
Xu J, Ye P, Liu Y, Doermann D (2014) No-reference video quality assessment via feature learning. In: 2014 IEEE international conference on image processing (ICIP). IEEE, pp 491–495
Xue W, Mou X, Zhang L, Bovik AC, Feng X (2014) Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. IEEE Trans Image Process 23(11):4850–4862
Yi F, Chen M, Sun W, Min X, Tian Y, Zhai G (2021) Attention based network for no-reference UGC video quality assessment. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 1414–1418
Ying Z, Mandal M, Ghadiyaram D, Bovik A (2021) Patch-VQ: 'Patching Up' the video quality problem. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14019–14029
Zhou W, Chen Z (2020) Deep local and global spatiotemporal feature aggregation for blind video quality assessment. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 338–341
Funding
This work was supported by the National Natural Science Foundation of China (Grant No: 61374022) and by the Zhejiang Provincial Basic Public Welfare Research Project of China (Grant Nos: LGF22F030001 and LGG19F03001).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, J., Li, X. Study on no-reference video quality assessment method incorporating dual deep learning networks. Multimed Tools Appl 82, 3081–3100 (2023). https://doi.org/10.1007/s11042-022-13383-0