Study on no-reference video quality assessment method incorporating dual deep learning networks

Li, Junfeng; Li, Xiao

doi:10.1007/s11042-022-13383-0

Track 1: General Multimedia Topics
Published: 30 June 2022

Volume 82, pages 3081–3100, (2023)

Multimedia Tools and Applications

Abstract

Quality assessment of user-generated content (UGC) videos is challenging. Unlike synthetic videos, UGC videos are susceptible to various distortions caused by the external environment during generation. This paper proposes a no-reference video quality assessment (VQA) method that incorporates a dual deep network architecture. First, to preserve the diversity of the information extracted from the video, features from the InceptionV3 and ResNet50 networks are aggregated by global average pooling and global standard deviation pooling, and frame-level quality scores are obtained with a bidirectional GRU network. Second, in the spatial-temporal domain, a temporal memory block is constructed by exploiting human temporal memory and content-dependent effects to obtain components of video quality. Meanwhile, a Gaussian distribution is added in the spatial domain to reduce the effect of content variation. Finally, extensive experiments are conducted on the KoNViD-1k and LIVE-VQC databases. The experimental results show that, in overall performance, Spearman’s rank-order correlation coefficient (SROCC) and Pearson’s linear correlation coefficient (PLCC) are 0.7786 and 0.7759, which are 2.87% and 0.52% higher than those of Tang et al., respectively. This verifies the validity of the model. In addition, cross-validation experiments show that the model also has strong generalization ability.
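
The two evaluation metrics named in the abstract can be illustrated with a short sketch. The code below is not the authors' evaluation script: it is a minimal pure-Python illustration of the standard definitions, in which PLCC is the Pearson correlation between predicted scores and subjective scores, and SROCC is the Pearson correlation between their ranks (tied values share an average rank). The function names and the sample scores are hypothetical and chosen only for illustration. Note also that VQA studies often fit a nonlinear mapping to the predictions before computing PLCC; that step is omitted here.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient (PLCC) of two sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted scores and subjective scores for five videos.
predicted = [2.1, 3.4, 3.0, 4.2, 1.5]
subjective = [2.0, 3.1, 3.3, 4.5, 1.2]

plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
print(f"PLCC = {plcc:.4f}, SROCC = {srocc:.4f}")
```

SROCC depends only on the ordering of the scores, so it measures prediction monotonicity, while PLCC measures how close the relationship is to linear. This is why the two values are usually reported together.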

Data availability

All data used in the experiments are from public databases. The datasets generated during the current study are available from the corresponding author on reasonable request.

Code availability

The code generated during the current study is available from the corresponding author on reasonable request.

References

Ahn S, Lee S (2018) Deep blind video quality assessment based on temporal human perception. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 619–623
Bampis CG, Li Z, Bovik AC (2017) Continuous prediction of streaming video QoE using dynamic networks. IEEE Signal Process Lett 24(7):1083–1087
Bampis CG, Li Z, Moorthy AK, Katsavounidis I, Aaron A, Bovik AC (2017) Study of temporal effects on subjective video quality of experience. IEEE Trans Image Process 26(11):5217–5231
Bampis CG, Gupta P, Soundararajan R, Bovik AC (2017) SpEED-QA: spatial efficient entropic differencing for image and video quality. IEEE Signal Process Lett 24(9):1333–1337
Bampis CG, Li Z, Katsavounidis I, Bovik AC (2018) Recurrent and dynamic models for predicting streaming video quality of experience. IEEE Trans Image Process 27(7):3316–3331
Bampis CG, Li Z, Bovik AC (2018) Spatiotemporal feature integration and model fusion for full reference video quality assessment. IEEE Trans Circ Syst Video Technol 29(8):2256–2270
Chen B, Zhu L, Li G, Lu F, Fan H, Wang S (2021) Learning generalized spatial-temporal deep feature representation for no-reference video quality assessment. IEEE Trans Circ Syst Video Technol 32:1903–1916
Chikkerur S, Sundaram V, Reisslein M, Karam LJ (2011) Objective video quality assessment methods: a classification, review, and performance comparison. IEEE Trans Broadcast 57(2):165–182
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Dendi SVR, Channappayya SS (2020) No-reference video quality assessment using natural spatiotemporal scene statistics. IEEE Trans Image Process 29:5612–5624
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Ebenezer JP, Shang Z, Wu Y, Wei H, Sethuraman S, Bovik AC (2021) ChipQA: no-reference video quality prediction via space-time chips. IEEE Trans Image Process 30:8059–8074
Fan Q, Luo W, Xia Y, Li G, He D (2019) Metrics and methods of video quality assessment: a brief review. Multimed Tools Appl 78(22):31019–31033
Fu H, Pan D, Shi P (2021) Full-reference video quality assessment based on spatiotemporal visual sensitivity. In: 2021 international conference on culture-oriented science & technology (ICCST). IEEE, pp 305–309
Ghadiyaram D, Bovik AC (2017) Perceptual quality prediction on authentically distorted images using a bag of features approach. J Vis 17(1):32–32
Götz-Hahn F, Hosu V, Lin H, Saupe D (2021) KonVid-150k: a dataset for no-reference video quality assessment of videos in-the-wild. IEEE Access 9:72139–72160
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hosu V, Hahn F, Jenadeleh M, Lin H, Men H, Szirányi T, … Saupe D (2017) The Konstanz natural video database (KoNViD-1k). In: 2017 ninth international conference on quality of multimedia experience (QoMEX). IEEE, pp 1–6
Kim W, Kim J, Ahn S, Kim J, Lee S (2018) Deep video quality assessor: from spatio-temporal visual sensitivity to a convolutional neural aggregation network. In: Proceedings of the European conference on computer vision (ECCV), pp 219–234
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Korhonen J (2019) Two-level approach for no-reference consumer video quality assessment. IEEE Trans Image Process 28(12):5923–5938
Kundu D, Ghadiyaram D, Bovik AC, Evans BL (2017) No-reference quality assessment of tone-mapped HDR pictures. IEEE Trans Image Process 26(6):2957–2971
Li Z, Aaron A, Katsavounidis I, Moorthy A, Manohara M (2016) Toward a practical perceptual video quality metric. The Netflix tech blog, 6(2). http://techblog.netflix.com/2016/06/toward-practical-perceptualvideo.html
Li X, Guo Q, Lu X (2016) Spatiotemporal statistics for video quality assessment. IEEE Trans Image Process 25(7):3329–3342
Li D, Jiang T, Jiang M (2019) Quality assessment of in-the-wild videos. In: Proceedings of the 27th ACM international conference on multimedia, pp 2351–2359
Li D, Jiang T, Jiang M (2021) Unified quality assessment of in-the-wild videos with mixed datasets training. Int J Comput Vis 129(4):1238–1257
Li MW, Xu DY, Geng J, Hong WC (2022) A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm. Nonlinear Dyn:1–21
Li B, Zhang W, Tian M, Zhai G, Wang X (2022) Blindly assess quality of in-the-wild videos via quality-aware pre-training and motion perception. IEEE Trans Circ Syst Video Technol:1
Liu Y, Wu J, Li A, Li L, Dong W, Shi G, Lin W (2021) Video quality assessment with serial dependence modeling. IEEE Trans Multimedia:1
Manasa K, Channappayya SS (2016) An optical flow-based full reference video quality assessment algorithm. IEEE Trans Image Process 25(6):2480–2492
Min X, Zhai G, Zhou J, Farias MC, Bovik AC (2020) Study of subjective and objective quality assessment of audio-visual signals. IEEE Trans Image Process 29:6054–6068
Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
Mittal A, Saad MA, Bovik AC (2015) A completely blind video integrity oracle. IEEE Trans Image Process 25(1):289–300
Pandremmenou K, Shahid M, Kondi LP, Lövström B (2015) A no-reference bitstream-based perceptual model for video quality estimation of videos affected by coding artifacts and packet losses. In: Human vision and electronic imaging XX, vol 9394. SPIE, pp 486–497
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lerer A (2017) Automatic differentiation in pytorch
Saad M, Bovik AC, Charrier C (2013) Blind prediction of natural video quality and H.264 applications. In: Seventh international workshop on video processing and quality metrics for consumer electronics (VPQM), pp 47–51
Saad MA, Bovik AC, Charrier C (2014) Blind prediction of natural video quality. IEEE Trans Image Process 23(3):1352–1365
Seshadrinathan K, Soundararajan R, Bovik AC, Cormack LK (2010) Study of subjective and objective quality assessment of video. IEEE Trans Image Process 19(6):1427–1441
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Sinno Z, Bovik AC (2018) Large-scale study of perceptual video quality. IEEE Trans Image Process 28(2):612–627
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence
Tang J, Dong Y, Xie R, Gu X, Song L, Li L, Zhou B (2020) Deep blind video quality assessment for user generated videos. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 156–159
Tu Z, Wang Y, Birkbeck N, Adsumilli B, Bovik AC (2021) UGC-VQA: benchmarking blind video quality assessment for user generated content. IEEE Trans Image Process 30:4449–4464
Video Quality Experts Group (2000) Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, phase II. VQEG
Wainwright MJ, Simoncelli E (1999) Scale mixtures of Gaussians and the statistics of natural images. Adv Neural Inf Proces Syst 12
Xu J, Ye P, Liu Y, Doermann D (2014) No-reference video quality assessment via feature learning. In: 2014 IEEE international conference on image processing (ICIP). IEEE, pp 491–495
Xue W, Mou X, Zhang L, Bovik AC, Feng X (2014) Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. IEEE Trans Image Process 23(11):4850–4862
Yi F, Chen M, Sun W, Min X, Tian Y, Zhai G (2021) Attention based network for no-reference UGC video quality assessment. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 1414–1418
Ying Z, Mandal M, Ghadiyaram D, Bovik A (2021) Patch-VQ: 'Patching Up' the video quality problem. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14019–14029
Zhou W, Chen Z (2020) Deep local and global spatiotemporal feature aggregation for blind video quality assessment. In: 2020 IEEE international conference on visual communications and image processing (VCIP). IEEE, pp 338–341

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61374022) and by the Zhejiang Provincial Basic Public Welfare Research Project of China (Grant Nos. LGF22F030001 and LGG19F03001).

Author information

Authors and Affiliations

Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou, China
Junfeng Li & Xiao Li

Corresponding author

Correspondence to Junfeng Li.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, J., Li, X. Study on no-reference video quality assessment method incorporating dual deep learning networks. Multimed Tools Appl 82, 3081–3100 (2023). https://doi.org/10.1007/s11042-022-13383-0

Received: 03 March 2022
Revised: 10 May 2022
Accepted: 17 June 2022
Published: 30 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13383-0
