End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks

Published: 15 October 2018

Abstract

Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach: a feature extraction stage that typically computes hand-crafted spatial and/or temporal features, and a regression stage that works in the feature space to predict the perceptual quality of the video. Unlike traditional BVQA methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, where the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process is composed of two steps. In the first step, the early convolutional layers are pre-trained on the codec classification subtask to extract spatiotemporal quality-related features. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized on the two subtasks together. An additional critical step is the adoption of 3D convolutional layers, which create novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods. The source code of V-MEON is available at https://ece.uwaterloo.ca/~zduanmu/acmmm2018bvqa.
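
To make the two-subtask design concrete, the following PyTorch sketch pairs a shared 3D-convolutional feature extractor with a codec-classification head and a quality-regression head, and combines their losses as in the joint-optimization step. It is only a minimal illustration under assumed settings: all layer sizes, kernel shapes, the number of codec classes, and the loss weighting are hypothetical and are not the architecture or hyperparameters reported in the paper.

```python
# Minimal, hypothetical sketch of a V-MEON-style multi-task network in PyTorch.
# Layer sizes, kernel shapes, and codec classes below are illustrative assumptions.
import torch
import torch.nn as nn


class VMEONSketch(nn.Module):
    """Shared 3D-conv feature extractor with two heads:
    (1) probabilistic codec-type classification, (2) quality regression."""

    def __init__(self, num_codecs: int = 3):
        super().__init__()
        # Spatiotemporal feature extractor (3D convolutions over T x H x W).
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global pooling -> fixed-length feature
        )
        self.codec_head = nn.Linear(16, num_codecs)  # subtask 1: codec type
        self.quality_head = nn.Linear(16, 1)         # subtask 2: quality score

    def forward(self, clip: torch.Tensor):
        # clip: (batch, channels, frames, height, width)
        feat = self.features(clip).flatten(1)
        return self.codec_head(feat), self.quality_head(feat).squeeze(-1)


if __name__ == "__main__":
    model = VMEONSketch()
    clips = torch.randn(2, 3, 8, 64, 64)   # two short video clips
    codec_labels = torch.tensor([0, 2])     # codec-type labels (cheap to obtain)
    mos = torch.tensor([3.5, 4.2])          # subjective quality scores

    logits, pred_quality = model(clips)
    # Step 1 (pre-training) would use only the classification loss;
    # step 2 jointly optimizes both subtasks, sketched here with equal weights.
    loss = nn.CrossEntropyLoss()(logits, codec_labels) + nn.MSELoss()(pred_quality, mos)
    loss.backward()
    print(f"joint loss: {loss.item():.4f}")
```

In this sketch, pre-training would optimize only the classification term to initialize the shared 3D-conv features, after which both heads are trained together; the relative weighting of the two losses is an assumption here, not a value from the paper.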

      Published In

      MM '18: Proceedings of the 26th ACM international conference on Multimedia
      October 2018
      2167 pages
      ISBN:9781450356657
      DOI:10.1145/3240508

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. blind video quality assessment
      2. convolutional neural network
      3. multi-task learning

      Qualifiers

      • Research-article

      Conference

      MM '18
      MM '18: ACM Multimedia Conference
      October 22 - 26, 2018
      Seoul, Republic of Korea

      Acceptance Rates

MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%


      Cited By

• (2025) Luminance decomposition and reconstruction for high dynamic range Video Quality Assessment. Pattern Recognition, Vol. 158, 111011. DOI: 10.1016/j.patcog.2024.111011
• (2024) Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, 11, 7056-7071. DOI: 10.1109/TPAMI.2024.3385364
• (2024) Automatic Evaluation of Instructional Videos Based on Video Features and Student Watching Experience. IEEE Transactions on Learning Technologies, Vol. 17, 54-62. DOI: 10.1109/TLT.2023.3299359
• (2024) Subjective and Objective Analysis of Streamed Gaming Videos. IEEE Transactions on Games, Vol. 16, 2, 445-458. DOI: 10.1109/TG.2023.3293093
• (2024) Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, 12, 13441-13451. DOI: 10.1109/TCSVT.2024.3450085
• (2024) A Blind Video Quality Assessment Method via Spatiotemporal Pyramid Attention. IEEE Transactions on Broadcasting, Vol. 70, 1, 251-264. DOI: 10.1109/TBC.2023.3340031
• (2024) Integrates Spatiotemporal Visual Stimuli for Video Quality Assessment. IEEE Transactions on Broadcasting, Vol. 70, 1, 223-237. DOI: 10.1109/TBC.2023.3312932
• (2024) ZE-FESG: A Zero-Shot Feature Extraction Method Based on Semantic Guidance for No-Reference Video Quality Assessment. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 3640-3644. DOI: 10.1109/ICASSP48485.2024.10448422
• (2024) KVQ: Kwai Video Quality Assessment for Short-form Videos. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 25963-25973. DOI: 10.1109/CVPR52733.2024.02453
• (2024) Efficient Computational Cost Saving in Video Processing for QoE Estimation. IEEE Access, Vol. 12, 34846-34862. DOI: 10.1109/ACCESS.2024.3373193
