Perceptual Quality Assessment of Internet Videos

ABSTRACT
With the rapid proliferation of online video sites and social media platforms, user-generated, professionally generated, and occupationally generated content (UGC, PGC, OGC) videos are streamed and shared explosively over the Internet. Consequently, it is urgent to monitor the content quality of these Internet videos to guarantee the user experience. However, most existing video quality assessment (VQA) databases contain only UGC videos and cannot meet the demands of the other kinds of Internet videos with real-world distortions. To this end, we collect 1,072 videos from Youku, a leading Chinese video hosting service, to establish an Internet video quality assessment database (Youku-V1K). A dedicated sampling method based on several quality indicators is adopted to maximize content and distortion diversity within a database of limited size, and a probabilistic graphical model is applied to recover reliable labels from noisy crowdsourced annotations. Based on the properties of the Internet videos originating from Youku, we propose a spatio-temporal distortion-aware model (STDAM). First, the model works blindly, i.e., it requires no pristine reference video. Second, the model is familiarized with diverse content by pre-training on large-scale image quality assessment databases. Third, to measure spatial and temporal distortions, we introduce graph convolution and attention modules to extract and enhance the features of the input video. In addition, we leverage motion information and integrate frame-level features into video-level features via a bidirectional long short-term memory (Bi-LSTM) network. Experimental results on the self-built database and on public VQA databases demonstrate that our model outperforms state-of-the-art methods and exhibits promising generalization ability.
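The temporal pooling step described above, integrating per-frame features into a single video-level representation with a bidirectional recurrent network, can be sketched as follows. This is an illustrative toy rather than the paper's STDAM: it substitutes a plain tanh RNN cell for the LSTM cells, uses small random untrained weights, and the names `rnn_step`, `bidirectional_pool`, and all dimensions are hypothetical choices for the sketch.

```python
import math
import random

def rnn_step(x, h, Wx, Wh, b):
    # One tanh recurrent step (a simplified stand-in for an LSTM cell):
    # new hidden state is tanh(Wx @ x + Wh @ h + b), computed per unit.
    return [math.tanh(sum(wx * xi for wx, xi in zip(Wx[j], x))
                      + sum(wh * hi for wh, hi in zip(Wh[j], h))
                      + b[j]) for j in range(len(h))]

def bidirectional_pool(frame_feats, hidden=4, seed=0):
    """Aggregate per-frame feature vectors into one video-level vector by
    running a forward and a backward recurrent pass over the frame sequence
    and concatenating the two final hidden states."""
    rng = random.Random(seed)
    d = len(frame_feats[0])

    def params():
        # Random, untrained weights purely for illustration.
        Wx = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
        Wh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
        b = [0.0] * hidden
        return Wx, Wh, b

    fwd, bwd = params(), params()
    h = [0.0] * hidden
    for x in frame_feats:            # forward pass over time
        h = rnn_step(x, h, *fwd)
    g = [0.0] * hidden
    for x in reversed(frame_feats):  # backward pass over time
        g = rnn_step(x, g, *bwd)
    return h + g                     # concatenated video-level feature

# Toy example: 5 frames, each with a 3-dimensional feature vector.
video = [[0.2 * t, 0.1, -0.3] for t in range(5)]
vec = bidirectional_pool(video)
print(len(vec))  # 8: forward + backward hidden states
```

In a full model, a regression head would map this pooled vector to a quality score; the bidirectional design lets each frame's contribution be conditioned on both past and future context, which is why Bi-LSTMs are a common choice for frame-to-video aggregation.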
Index Terms: Perceptual Quality Assessment of Internet Videos