Deep video quality assessment using constrained multi-task regression and Spatio-temporal feature fusion

Abstract

Many popular video quality assessment (VQA) methods build models that simulate human visual perception and then adopt a simple regression strategy to predict video quality scores. However, these methods either pay too little attention to the regression stage, which is prone to misprediction, or fail to accurately capture video content that contains motion changes or sudden movements. To remedy these shortcomings, we propose a full-reference (FR) video quality assessment model that integrates multi-task learning regression with spatio-temporal feature analysis to predict video quality. First, the model partitions each frame of the reference and distorted videos into patches and computes their entropy values to guide patch selection. A 2D Siamese network is then applied to the selected patches to learn spatial information. To capture temporal distortions more effectively, a multi-frame difference map is computed for each distorted video. The computed multi-frame difference maps are also partitioned into patches, and the half with the highest entropy values is selected as temporal features. Additionally, we incorporate the temporal masking effect to optimize the spatial error and temporal features, and adopt a 3D convolutional neural network (CNN) for spatio-temporal feature fusion. Following recent evidence on combining quality classification with quality regression, a constrained multi-task learning regression model is designed to aggregate the quality score, in which a quality classification subtask constrains and optimizes the main quality regression task. Finally, the video quality score is predicted through the regression branch. We have evaluated our algorithm on five public VQA databases, and the experimental results show that the proposed algorithm achieves superior performance compared with existing VQA methods.
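
As a rough illustration of the entropy-guided patch selection and multi-frame difference steps described above, the following Python sketch shows one possible implementation. It assumes grayscale frames supplied as NumPy arrays with values in [0, 255]; the patch size, histogram bin count, and helper names (patch_entropy, select_patches_by_entropy, multi_frame_difference_map) are illustrative assumptions rather than the paper's exact settings, apart from keeping the highest-entropy half of the patches, which the abstract states.

```python
import numpy as np

def patch_entropy(patch, bins=256):
    """Shannon entropy (in bits) of a grayscale patch with values in [0, 255]."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255))
    p = hist.astype(np.float64) / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_patches_by_entropy(frame, patch_size=64, keep_ratio=0.5):
    """Split a frame into non-overlapping patches and keep the highest-entropy fraction."""
    h, w = frame.shape[:2]
    patches = [frame[y:y + patch_size, x:x + patch_size]
               for y in range(0, h - patch_size + 1, patch_size)
               for x in range(0, w - patch_size + 1, patch_size)]
    scores = np.array([patch_entropy(p) for p in patches])
    keep = np.argsort(scores)[::-1][:max(1, int(len(patches) * keep_ratio))]
    return [patches[i] for i in keep]

def multi_frame_difference_map(frames):
    """Average absolute difference between consecutive frames of one video clip."""
    frames = np.asarray(frames, dtype=np.float32)  # shape: (T, H, W)
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(8, 256, 256))           # toy 8-frame clip
    spatial_patches = select_patches_by_entropy(clip[0])       # spatial-branch input
    temporal_patches = select_patches_by_entropy(multi_frame_difference_map(clip))
    print(len(spatial_patches), len(temporal_patches))
```

In the model described above, the selected frame patches would feed the 2D Siamese network and the selected difference-map patches the temporal branch before 3D CNN fusion; this toy script only prints the patch counts.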

Data availability

Not applicable.

Code availability

Not applicable.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61672095, 61371143 and National Key Research and Development Program Project of China under Grant No. 2020YFC0811004.

Author information

Contributions

Not applicable.

Corresponding author

Correspondence to Lixiong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wen, M., Liu, L., Sang, Q. et al. Deep video quality assessment using constrained multi-task regression and Spatio-temporal feature fusion. Multimed Tools Appl 82, 28067–28086 (2023). https://doi.org/10.1007/s11042-023-14652-2

  • DOI: https://doi.org/10.1007/s11042-023-14652-2
