DOI: 10.1145/3394171.3413717

Research Article

RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment

Published: 12 October 2020

Abstract

Video quality assessment (VQA), which automatically predicts the perceptual quality of a video, especially when no reference information is available, has become a major concern for video service providers due to the growing demand by end users for video quality of experience (QoE). While recent deep learning techniques have achieved significant advances, they often produce misleading results in VQA tasks because they describe 3D spatio-temporal regularities at only a single, fixed temporal frequency. Partially inspired by psychophysical and vision science studies revealing the speed-tuning property of neurons in the visual cortex during motion perception (i.e., sensitivity to different temporal frequencies), we propose a novel no-reference (NR) VQA framework named Recurrent-In-Recurrent Network (RIRNet), which incorporates this characteristic to promote an accurate representation of motion perception in the VQA task. By efficiently fusing motion information derived from different temporal frequencies, the resulting temporal modeling scheme quantifies the temporal motion effect via a hierarchical distortion description. The proposed framework agrees more closely with the quality perception of distorted videos because it integrates concepts from motion perception in the human visual system (HVS), manifested in a network structure composed of low- and high-level processing. A holistic validation of our method on four challenging video quality databases demonstrates superior performance over state-of-the-art methods.
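The recurrent-in-recurrent idea in the abstract, inner recurrences over the frame sequence sampled at several temporal frequencies, and an outer recurrence fusing the per-frequency summaries into one quality score, can be illustrated with a minimal NumPy sketch. It assumes GRU cells (the kind of recurrent unit commonly used for such temporal models) and pre-extracted per-frame features; the layer sizes, subsampling strides, and linear scoring head are illustrative assumptions, not the authors' exact RIRNet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell operating on 1-D feature vectors."""
    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        # stacked weights for the update (z), reset (r) and candidate gates
        self.W = rng.uniform(-s, s, (3, hid_dim, in_dim))
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))
        self.b = np.zeros((3, hid_dim))
        self.hid_dim = hid_dim

    def run(self, xs):
        h = np.zeros(self.hid_dim)
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1 - z) * h + z * h_cand
        return h  # final hidden state summarises the sequence

class RecurrentInRecurrent:
    """Inner GRUs each summarise the frame-feature sequence subsampled at one
    temporal frequency; an outer GRU fuses the per-frequency summaries (coarse
    to fine) into a single clip descriptor, mapped to a scalar quality score."""
    def __init__(self, feat_dim, hid_dim, strides=(4, 2, 1)):
        self.strides = strides  # hypothetical temporal subsampling rates
        self.inner = [GRUCell(feat_dim, hid_dim) for _ in strides]
        self.outer = GRUCell(hid_dim, hid_dim)
        self.head = rng.uniform(-0.1, 0.1, hid_dim)  # linear scoring head

    def predict(self, frame_feats):
        # one summary vector per temporal frequency
        summaries = [cell.run(frame_feats[::st])
                     for st, cell in zip(self.strides, self.inner)]
        fused = self.outer.run(summaries)  # outer recurrence over frequencies
        return float(self.head @ fused)    # scalar quality estimate

# toy usage: 32 frames of 16-D (hypothetical) frame features
feats = rng.standard_normal((32, 16))
model = RecurrentInRecurrent(feat_dim=16, hid_dim=8)
score = model.predict(feats)
print(isinstance(score, float))  # prints True
```

The weights here are random, so the score is meaningless until trained; the sketch only shows how the two recurrence levels nest and how multiple temporal frequencies feed one prediction.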

Supplementary Material

MP4 File (3394171.3413717.mp4)
This is a video presentation of the oral paper "RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment" at the MM '20 conference. It gives a brief background introduction and framework description; for more details, please refer to the paper.





Published In

cover image ACM Conferences
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. motion perception
  2. speed tuning
  3. temporal frequency
  4. video quality assessment

Qualifiers

  • Research-article

Funding Sources

  • Key Project of Shanxi Provincial Department of Education (Collaborative Innovation Center)
  • Science and Technology Plan of Xi'an
  • National Natural Science Foundation of China
  • Natural Science Foundation of Jiangsu Province
  • Six Talent Peaks High-level Talents in Jiangsu Province

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 3
Reflects downloads up to 16 Feb 2025

Cited By
  • (2025) Subjective and Objective Quality Assessment of Colonoscopy Videos. IEEE Transactions on Medical Imaging, Vol. 44, 2 (2025), 841--854. DOI: 10.1109/TMI.2024.3461737
  • (2025) A no-reference video quality assessment method with bidirectional hierarchical semantic representation. Signal Processing, Vol. 230 (2025), 109819. DOI: 10.1016/j.sigpro.2024.109819
  • (2025) Luminance decomposition and reconstruction for high dynamic range Video Quality Assessment. Pattern Recognition, Vol. 158 (2025), 111011. DOI: 10.1016/j.patcog.2024.111011
  • (2024) Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. In Proc. 32nd ACM International Conference on Multimedia, 9913--9922. DOI: 10.1145/3664647.3680907
  • (2024) Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. In Proc. 32nd ACM International Conference on Multimedia, 9970--9979. DOI: 10.1145/3664647.3680598
  • (2024) Quantizing Neural Networks with Knowledge Distillation for Efficient Video Quality Assessment. In Proc. IEEE International Conference on Visual Communications and Image Processing (VCIP), 1--5. DOI: 10.1109/VCIP63160.2024.10849844
  • (2024) Blind Video Quality Prediction by Uncovering Human Video Perceptual Representation. IEEE Transactions on Image Processing, Vol. 33 (2024), 4998--5013. DOI: 10.1109/TIP.2024.3445738
  • (2024) A Spatial-Temporal Video Quality Assessment Method via Comprehensive HVS Simulation. IEEE Transactions on Cybernetics, Vol. 54, 8 (2024), 4749--4762. DOI: 10.1109/TCYB.2023.3338615
  • (2024) Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, 12 (2024), 13441--13451. DOI: 10.1109/TCSVT.2024.3450085
  • (2024) Deep Learning Approach for No-Reference Screen Content Video Quality Assessment. IEEE Transactions on Broadcasting, Vol. 70, 2 (2024), 555--569. DOI: 10.1109/TBC.2024.3374042
