Pseudo-global strategy-based visual comfort assessment considering attention mechanism

Li, Sumei; Zhang, Huilin; Zhou, Mingyue

doi:10.1007/s00530-024-01570-y

Regular Paper
Published: 22 November 2024

Multimedia Systems, Volume 30, article number 356 (2024)

Sumei Li¹, Huilin Zhang¹ & Mingyue Zhou¹

Abstract

Assessing the comfort of stereo images contributes significantly to crafting immersive stereo scenes, thereby enriching the viewer’s perceptual experience. However, deep learning-based visual comfort assessment (VCA) has encountered challenges due to data deficiency. To address this problem and maximize the potential of deep learning in the VCA task, this paper proposes a pseudo-global strategy-based convolutional neural network (CNN), considering the attention mechanism. Our data augmentation method utilizes random cropping and permutation, coupled with a pseudo-global strategy that fuses multi-region local features as pseudo-global features to substitute global features, effectively expanding databases while aligning input patches and labels during training. We also introduce attention mechanisms to focus on the different impacts of disparities in various regions on the overall comfort of a stereo image. Specifically, dilated spatial attention and channel self-attention are designed in the local and pseudo-global feature extraction stages, respectively, simulating the saliency of human perception. Experimental results show that the proposed method is superior to the state-of-the-art VCA approaches and has excellent generalization ability.
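
The augmentation idea in the abstract (random cropping, permutation, and fusing several local features into a pseudo-global feature that keeps the image-level label) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the patch size, the number of patches, the use of a single-channel disparity map as input, the mean/std descriptor standing in for the CNN's local branch, and fusion by concatenation. The attention modules are not modeled.

```python
import numpy as np


def sample_patch_group(image, num_patches, patch_size, rng):
    """Randomly crop `num_patches` square patches and shuffle their order."""
    height, width = image.shape[:2]
    if patch_size > min(height, width):
        raise ValueError("patch_size exceeds image dimensions")
    # Top-left corners are drawn uniformly so every patch lies inside the image.
    tops = rng.integers(0, height - patch_size + 1, size=num_patches)
    lefts = rng.integers(0, width - patch_size + 1, size=num_patches)
    patches = np.stack([
        image[t:t + patch_size, l:l + patch_size] for t, l in zip(tops, lefts)
    ])
    # Random permutation of the patch order (the "permutation" step).
    return patches[rng.permutation(num_patches)]


def local_features(patch):
    """Hypothetical local descriptor (mean, std); a stand-in for a CNN branch."""
    return np.array([patch.mean(), patch.std()])


def pseudo_global_feature(patches):
    """Fuse per-patch features into one vector (concatenation is an assumption)."""
    return np.concatenate([local_features(p) for p in patches])


rng = np.random.default_rng(0)
disparity_map = rng.normal(size=(64, 96))  # synthetic stand-in for real data
image_label = 3.7                          # hypothetical image-level comfort score

# Each random draw yields a new training sample that reuses the image-level
# label, because the fused multi-region feature substitutes for a global one.
samples = [
    (pseudo_global_feature(sample_patch_group(disparity_map, 4, 16, rng)), image_label)
    for _ in range(5)
]
```

In this sketch one image produces five distinct training samples with the same label, which is the sense in which such a scheme expands a small database while keeping inputs and labels aligned.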

Data availability

No datasets were generated or analysed during the current study.

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin, 300072, China
Sumei Li, Huilin Zhang & Mingyue Zhou

Contributions

Sumei Li: Methodology, Resources, Supervision, Reviewing & Editing. Huilin Zhang: Conceptualization, Implementation of the code, Conducting the research and investigation process, Writing (original draft preparation). Mingyue Zhou: Validation, Visualization.

Corresponding author

Correspondence to Huilin Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Zhang, H. & Zhou, M. Pseudo-global strategy-based visual comfort assessment considering attention mechanism. Multimedia Systems 30, 356 (2024). https://doi.org/10.1007/s00530-024-01570-y

Received: 03 May 2024
Accepted: 07 November 2024
Published: 22 November 2024
DOI: https://doi.org/10.1007/s00530-024-01570-y
