End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks

Published: 15 October 2018

Abstract

Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach: a feature extraction stage that typically computes hand-crafted spatial and/or temporal features, and a regression stage that works in the feature space to predict the perceptual quality of the video. Unlike traditional BVQA methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, where the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process is composed of two steps. In the first step, the early convolutional layers are pre-trained on the codec classification subtask to extract spatiotemporal quality-related features. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized on the two subtasks together. An additional critical step is the adoption of 3D convolutional layers, which create novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods. The source code of V-MEON is available at https://ece.uwaterloo.ca/~zduanmu/acmmm2018bvqa.
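
To make the two-subtask design concrete, the following PyTorch sketch pairs a shared 3D-convolutional feature extractor with a codec-classification head and a quality-regression head, and combines their losses as in the joint-optimization step. It is only a minimal illustration under assumed settings: all layer sizes, kernel shapes, the number of codec classes, and the loss weighting are hypothetical and are not the architecture or hyperparameters reported in the paper.

```python
# Minimal, hypothetical sketch of a V-MEON-style multi-task network in PyTorch.
# Layer sizes, kernel shapes, and codec classes below are illustrative assumptions.
import torch
import torch.nn as nn


class VMEONSketch(nn.Module):
    """Shared 3D-conv feature extractor with two heads:
    (1) probabilistic codec-type classification, (2) quality regression."""

    def __init__(self, num_codecs: int = 3):
        super().__init__()
        # Spatiotemporal feature extractor (3D convolutions over T x H x W).
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global pooling -> fixed-length feature
        )
        self.codec_head = nn.Linear(16, num_codecs)  # subtask 1: codec type
        self.quality_head = nn.Linear(16, 1)         # subtask 2: quality score

    def forward(self, clip: torch.Tensor):
        # clip: (batch, channels, frames, height, width)
        feat = self.features(clip).flatten(1)
        return self.codec_head(feat), self.quality_head(feat).squeeze(-1)


if __name__ == "__main__":
    model = VMEONSketch()
    clips = torch.randn(2, 3, 8, 64, 64)   # two short video clips
    codec_labels = torch.tensor([0, 2])     # codec-type labels (cheap to obtain)
    mos = torch.tensor([3.5, 4.2])          # subjective quality scores

    logits, pred_quality = model(clips)
    # Step 1 (pre-training) would use only the classification loss;
    # step 2 jointly optimizes both subtasks, sketched here with equal weights.
    loss = nn.CrossEntropyLoss()(logits, codec_labels) + nn.MSELoss()(pred_quality, mos)
    loss.backward()
    print(f"joint loss: {loss.item():.4f}")
```

In this sketch, pre-training would optimize only the classification term to initialize the shared 3D-conv features, after which both heads are trained together; the relative weighting of the two losses is an assumption here, not a value from the paper.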

      Published In

      MM '18: Proceedings of the 26th ACM international conference on Multimedia
      October 2018
      2167 pages
      ISBN:9781450356657
      DOI:10.1145/3240508

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. blind video quality assessment
      2. convolutional neural network
      3. multi-task learning

      Qualifiers

      • Research-article

      Conference

      MM '18
      MM '18: ACM Multimedia Conference
      October 22 - 26, 2018
      Seoul, Republic of Korea

      Acceptance Rates

MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%


      Cited By

• (2025) Luminance decomposition and reconstruction for high dynamic range Video Quality Assessment. Pattern Recognition, Vol. 158, 111011. DOI: 10.1016/j.patcog.2024.111011
• (2024) Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, 11, 7056-7071. DOI: 10.1109/TPAMI.2024.3385364
• (2024) Automatic Evaluation of Instructional Videos Based on Video Features and Student Watching Experience. IEEE Transactions on Learning Technologies, Vol. 17, 54-62. DOI: 10.1109/TLT.2023.3299359
• (2024) Subjective and Objective Analysis of Streamed Gaming Videos. IEEE Transactions on Games, Vol. 16, 2, 445-458. DOI: 10.1109/TG.2023.3293093
• (2024) Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, 12, 13441-13451. DOI: 10.1109/TCSVT.2024.3450085
• (2024) A Blind Video Quality Assessment Method via Spatiotemporal Pyramid Attention. IEEE Transactions on Broadcasting, Vol. 70, 1, 251-264. DOI: 10.1109/TBC.2023.3340031
• (2024) Integrates Spatiotemporal Visual Stimuli for Video Quality Assessment. IEEE Transactions on Broadcasting, Vol. 70, 1, 223-237. DOI: 10.1109/TBC.2023.3312932
• (2024) ZE-FESG: A Zero-Shot Feature Extraction Method Based on Semantic Guidance for No-Reference Video Quality Assessment. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 3640-3644. DOI: 10.1109/ICASSP48485.2024.10448422
• (2024) KVQ: Kwai Video Quality Assessment for Short-form Videos. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 25963-25973. DOI: 10.1109/CVPR52733.2024.02453
• (2024) Efficient Computational Cost Saving in Video Processing for QoE Estimation. IEEE Access, Vol. 12, 34846-34862. DOI: 10.1109/ACCESS.2024.3373193
