Abstract
The projection distortion introduced when \(360^{\circ }\) video is produced degrades the performance of most quality assessment algorithms designed for 2D video. In this paper, we propose a full-reference \(360^{\circ }\) video quality assessment method that uses saliency to guide viewport extraction and thereby eliminate the projection distortion. Specifically, we first predict the visual saliency of each frame with a \(360^{\circ }\) saliency prediction network and then select the viewport that best represents the frame through the optimal viewport positioning module (OVPM). We further propose an attention-based three-dimensional convolutional neural network (3D CNN) for quality assessment, in which the 3D convolution and attention modules better capture the quality degradation of distorted viewports. Experimental results show that our method achieves superior performance in \(360^{\circ }\) video quality assessment tasks.
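The saliency-guided viewport idea can be illustrated with a small sketch. It is not the paper's OVPM. It assumes the frame and its saliency map are stored in equirectangular (ERP) layout as nested lists, picks the viewport center as the cell with the highest cosine-latitude-weighted saliency, and samples a rectilinear (gnomonic) viewport with nearest-neighbour lookup. The function names, the weighting, the `radius` parameter, and the field of view are all hypothetical choices made for illustration.

```python
import math

def pick_viewport_center(sal, radius=1):
    """Hypothetical stand-in for a viewport positioning step: return (lon, lat)
    in radians of the ERP cell whose neighbourhood has the highest saliency,
    weighted by cos(latitude) to offset ERP oversampling near the poles."""
    h, w = len(sal), len(sal[0])
    best, best_rc = -1.0, (0, 0)
    for r in range(h):
        for c in range(w):
            score = 0.0
            for dr in range(-radius, radius + 1):
                rr = r + dr
                if not 0 <= rr < h:
                    continue  # this sketch does not wrap across the poles
                lat = (0.5 - (rr + 0.5) / h) * math.pi
                for dc in range(-radius, radius + 1):
                    # columns wrap around because longitude is periodic
                    score += sal[rr][(c + dc) % w] * math.cos(lat)
            if score > best:
                best, best_rc = score, (r, c)
    r, c = best_rc
    lon = ((c + 0.5) / w - 0.5) * 2.0 * math.pi
    lat = (0.5 - (r + 0.5) / h) * math.pi
    return lon, lat

def viewport_pixel_to_erp(u, v, lon0, lat0, fov_deg, size, erp_w, erp_h):
    """Map viewport pixel (u, v) to an ERP (row, col) by inverse gnomonic projection."""
    half = math.tan(math.radians(fov_deg) / 2.0)
    # Tangent-plane coordinates; the viewing ray is (x, y, 1) before rotation.
    x = (2.0 * (u + 0.5) / size - 1.0) * half
    y = (1.0 - 2.0 * (v + 0.5) / size) * half
    # Pitch the ray up by lat0, then yaw it by lon0.
    y_r = y * math.cos(lat0) + math.sin(lat0)
    z_r = -y * math.sin(lat0) + math.cos(lat0)
    norm = math.sqrt(x * x + y * y + 1.0)
    lat = math.asin(max(-1.0, min(1.0, y_r / norm)))
    lon = lon0 + math.atan2(x, z_r)
    col = math.floor((lon / (2.0 * math.pi) + 0.5) * erp_w) % erp_w
    row = min(erp_h - 1, max(0, math.floor((0.5 - lat / math.pi) * erp_h)))
    return row, col

def extract_viewport(frame, lon0, lat0, fov_deg=90.0, size=5):
    """Sample a size x size rectilinear viewport from an ERP frame (nearest neighbour)."""
    erp_h, erp_w = len(frame), len(frame[0])
    out = []
    for v in range(size):
        row = []
        for u in range(size):
            r, c = viewport_pixel_to_erp(u, v, lon0, lat0, fov_deg, size, erp_w, erp_h)
            row.append(frame[r][c])
        out.append(row)
    return out
```

In a full-reference setting, the same center would be used to cut matching viewports from the reference and distorted frames, so that the two patches cover the same region of the sphere before they are compared.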
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
Available: https://github.com/Samsung/360tools.
Acknowledgements
This work was supported in part by the NSFC under Grants 62371279, 62171002, 61901252, 62071287, 62020106011, and 62371278, and by the Science and Technology Commission of Shanghai Municipality under Grant 22ZR1424300.
Author information
Contributions
Fanxi Yang: contributed to the conception of the study, performed the experiment, and wrote the manuscript text. Chao Yang: contributed significantly to the analysis and wrote the manuscript text. Ping An: helped perform the analysis with constructive discussions, and reviewed the manuscript. Xinpeng Huang: helped perform the analysis with constructive discussions, and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Q. Shen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, F., Yang, C., An, P. et al. 360° video quality assessment based on saliency-guided viewport extraction. Multimedia Systems 30, 89 (2024). https://doi.org/10.1007/s00530-024-01285-0
DOI: https://doi.org/10.1007/s00530-024-01285-0