Deep video quality assessment using constrained multi-task regression and Spatio-temporal feature fusion

Wen, Mingyang; Liu, Lixiong; Sang, Qingbing; Zhang, Yongmei

doi:10.1007/s11042-023-14652-2

Published: 15 February 2023

Volume 82, pages 28067–28086, (2023)

Multimedia Tools and Applications

Mingyang Wen¹,
Lixiong Liu ORCID: orcid.org/0000-0001-8357-1113¹,
Qingbing Sang² &
Yongmei Zhang³

Abstract

Many popular video quality assessment (VQA) methods build models that simulate human visual perception and adopt a simple regression strategy to predict video quality scores. However, these methods either pay too little attention to the regression stage, which is prone to misprediction, or fail to accurately interpret video content containing changing or sudden motion. To remedy these shortcomings, we propose a full-reference (FR) VQA model that integrates multi-task learning regression with spatio-temporal feature analysis to predict video quality. First, the model partitions each frame of the reference and distorted videos into patches and computes their entropy values to guide patch selection. A 2D Siamese network is then applied to the selected patches to learn spatial information. To capture temporal distortions more effectively, a multi-frame difference map is computed for each distorted video. These maps are also partitioned into patches, and the half with the highest entropy values is selected as temporal features. Additionally, we incorporate the temporal masking effect to optimize the spatial error and temporal features, and adopt a 3D convolutional neural network (CNN) for spatio-temporal feature fusion. Following recent evidence on quality classification and quality regression, we design a constrained multi-task learning regression model to aggregate the quality score, using a quality classification subtask to constrain and optimize the main quality regression task. Finally, the video quality score is predicted through the regression branch. We evaluated our algorithm on five public VQA databases. The experimental results show that the proposed algorithm achieves superior performance compared with existing VQA methods.
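
The entropy-guided selection of temporal patches described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the entropy measure, the patch size, or the multi-frame difference map, so the sketch assumes Shannon entropy of a 256-bin intensity histogram, non-overlapping 32×32 patches, and a difference map formed as the mean absolute difference between consecutive frames. All function names are hypothetical.

```python
import numpy as np


def patch_entropy(patch, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of one patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def split_into_patches(frame, size):
    """Cut a 2D array into non-overlapping size x size patches (row-major order).

    Rows and columns that do not fill a whole patch are dropped.
    """
    h, w = frame.shape
    rows, cols = h // size, w // size
    trimmed = frame[:rows * size, :cols * size]
    return (trimmed.reshape(rows, size, cols, size)
                   .swapaxes(1, 2)
                   .reshape(-1, size, size))


def select_high_entropy(patches, keep_ratio=0.5):
    """Return sorted indices of the highest-entropy patches, plus all entropies."""
    ent = np.array([patch_entropy(p) for p in patches])
    k = max(1, int(round(len(patches) * keep_ratio)))
    order = np.argsort(-ent, kind="stable")
    return np.sort(order[:k]), ent


def multi_frame_difference(frames):
    """ASSUMED definition: mean absolute difference of consecutive frames."""
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)


# Toy clip: 4 grayscale frames of 64 x 64 pixels. The left half is static and
# flat, so only the right half carries temporal change.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 64, 64)).astype(np.float64)
frames[:, :, :32] = 128.0

diff_map = multi_frame_difference(frames)
patches = split_into_patches(diff_map, 32)
selected, entropies = select_high_entropy(patches, keep_ratio=0.5)
print(selected, entropies)
```

On this toy clip the two patches covering the static left half have zero entropy, so the selection keeps the two right-hand patches. Under the same assumptions, the same helper functions could be applied to individual frames for the spatial branch.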

Data availability

Not applicable.

Code availability

Not applicable.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61672095 and 61371143, and by the National Key Research and Development Program Project of China under Grant No. 2020YFC0811004.

Author information

Authors and Affiliations

Beijing Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Mingyang Wen & Lixiong Liu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Qingbing Sang
School of Information Science and Technology, North China University of Technology, Beijing, 100144, China
Yongmei Zhang

Contributions

Not applicable.

Corresponding author

Correspondence to Lixiong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wen, M., Liu, L., Sang, Q. et al. Deep video quality assessment using constrained multi-task regression and Spatio-temporal feature fusion. Multimed Tools Appl 82, 28067–28086 (2023). https://doi.org/10.1007/s11042-023-14652-2

Received: 15 December 2021
Revised: 20 June 2022
Accepted: 03 February 2023
Published: 15 February 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11042-023-14652-2
