Abstract
Many popular video quality assessment (VQA) methods build models by simulating human visual perception and adopt a simple regression strategy to predict video quality scores. However, these methods either pay too little attention to the regression stage, which is prone to misprediction, or fail to accurately interpret video content with changing or sudden motion. To remedy these problems, we propose a full-reference (FR) video quality assessment model that integrates multi-task learning regression with the analysis of spatio-temporal features to predict video quality. First, the model partitions each frame of the reference and distorted videos into patches and calculates their entropy values to guide the selection of frame patches. A 2D Siamese network is then applied to the selected patches to learn spatial information. To capture temporal distortions more effectively, a multi-frame difference map is computed for each distorted video. The multi-frame difference maps are also partitioned into patches, and the half with the highest entropy values is selected as temporal features. Additionally, we incorporate the temporal masking effect to optimize the spatial error and temporal features, and adopt a 3D convolutional neural network (CNN) for spatio-temporal feature fusion. Following recent evidence on quality classification and quality regression, a constrained multi-task learning regression model is designed to aggregate the quality score, using a quality classification subtask to constrain and optimize the main quality regression task. Finally, the video quality score is predicted through the regression branch. We evaluated our algorithm on five public VQA databases. The experimental results show that the proposed algorithm achieves superior performance compared with existing VQA methods.
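The entropy-guided patch selection and the multi-frame difference map described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the non-overlapping patch grid, the 256-bin histogram entropy, and the definition of the difference map as the mean absolute difference between consecutive frames are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def patch_entropy(patch, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of a uint8 patch."""
    hist = np.bincount(patch.ravel(), minlength=bins).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so that log2 is defined
    return float(-(p * np.log2(p)).sum())


def split_into_patches(frame, size):
    """Cut a 2D frame into non-overlapping size x size patches (assumed layout).

    Rows and columns that do not fill a whole patch are dropped.
    Patches are returned in row-major order with shape (N, size, size).
    """
    h, w = frame.shape
    rows, cols = h // size, w // size
    trimmed = frame[: rows * size, : cols * size]
    grid = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return grid.reshape(-1, size, size)


def select_high_entropy(patches, keep_ratio=0.5):
    """Return the sorted indices of the highest-entropy patches and all entropies."""
    entropies = np.array([patch_entropy(p) for p in patches])
    k = max(1, int(len(patches) * keep_ratio))
    top = np.argsort(entropies)[::-1][:k]
    return np.sort(top), entropies


def multi_frame_difference(frames):
    """Assumed definition: mean absolute difference of consecutive frames.

    frames has shape (T, H, W) and dtype uint8, with T >= 2.
    """
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    return np.clip(np.rint(diffs.mean(axis=0)), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(5, 64, 64), dtype=np.uint8)  # toy video
    diff_map = multi_frame_difference(clip)
    patches = split_into_patches(diff_map, size=32)
    kept, entropies = select_high_entropy(patches, keep_ratio=0.5)
    print("kept patch indices:", kept)
```

With `keep_ratio=0.5` the selection keeps half of the patches, matching the ratio the abstract states for the temporal features. The same helper could be applied to the frames themselves for the spatial branch, although the abstract does not give the selection ratio used there.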
Data availability
Not applicable.
Code availability
Not applicable.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61672095 and 61371143, and by the National Key Research and Development Program Project of China under Grant No. 2020YFC0811004.
Author information
Contributions
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Wen, M., Liu, L., Sang, Q. et al. Deep video quality assessment using constrained multi-task regression and Spatio-temporal feature fusion. Multimed Tools Appl 82, 28067–28086 (2023). https://doi.org/10.1007/s11042-023-14652-2
DOI: https://doi.org/10.1007/s11042-023-14652-2