
Pseudo-global strategy-based visual comfort assessment considering attention mechanism

  • Regular Paper
  • Published:
Multimedia Systems

Abstract

Assessing the visual comfort of stereo images contributes significantly to crafting immersive stereo scenes and thereby enriches the viewer’s perceptual experience. However, deep learning-based visual comfort assessment (VCA) has been hindered by a shortage of labeled data. To address this problem and realize the potential of deep learning for the VCA task, this paper proposes a pseudo-global strategy-based convolutional neural network (CNN) that incorporates attention mechanisms. Our data augmentation method combines random cropping and permutation with a pseudo-global strategy that fuses multi-region local features into pseudo-global features that substitute for global features, effectively expanding the databases while keeping input patches aligned with their labels during training. We also introduce attention mechanisms to capture how disparities in different regions contribute differently to the overall comfort of a stereo image. Specifically, dilated spatial attention and channel self-attention are designed for the local and pseudo-global feature extraction stages, respectively, simulating the saliency of human perception. Experimental results show that the proposed method outperforms state-of-the-art VCA approaches and exhibits excellent generalization ability.
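For readers who want a concrete picture of the pipeline outlined above, the snippet below is a minimal PyTorch sketch, not the authors' implementation: it assumes a CBAM-style dilated spatial attention applied to each cropped patch, a simple mean fusion of per-patch features into a pseudo-global feature map refined by a parameter-free channel self-attention, and a single stereo view as a stand-in input. The module designs, layer widths, patch count, and fusion rule are illustrative assumptions only; the paper's stereo-specific inputs are omitted.

```python
import torch
import torch.nn as nn


class DilatedSpatialAttention(nn.Module):
    """CBAM-like spatial attention computed with a dilated convolution (assumed design)."""

    def __init__(self, dilation=2):
        super().__init__()
        # Padding equals the dilation so the 3x3 dilated kernel preserves spatial size.
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=dilation, dilation=dilation)

    def forward(self, x):                       # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)       # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)      # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                         # reweight local features spatially


class ChannelSelfAttention(nn.Module):
    """Parameter-free channel self-attention on the fused feature map (assumed design)."""

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)              # each channel described by its H*W responses
        attn = torch.softmax(flat @ flat.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        return x + (attn @ flat).view(b, c, h, w)    # residual channel refinement


class PseudoGlobalVCA(nn.Module):
    """Fuses multi-region local features into a pseudo-global feature and regresses comfort."""

    def __init__(self):
        super().__init__()
        self.local = nn.Sequential(             # shared local feature extractor per patch
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            DilatedSpatialAttention(),
        )
        self.channel_attn = ChannelSelfAttention()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, patches):                 # patches: (B, N, 3, H, W) cropped regions
        b, n = patches.shape[:2]
        feats = self.local(patches.flatten(0, 1))             # (B*N, 64, h, w)
        feats = feats.view(b, n, *feats.shape[1:])
        pseudo_global = feats.mean(dim=1)                      # fuse multi-region local features
        pseudo_global = self.channel_attn(pseudo_global)       # pseudo-global refinement
        return self.head(pseudo_global).squeeze(-1)            # predicted comfort score


def random_patches(img, num_patches=4, size=64):
    """Randomly crop and permute patches from one image (data-augmentation sketch)."""
    _, h, w = img.shape
    crops = []
    for _ in range(num_patches):
        y = torch.randint(0, h - size + 1, (1,)).item()
        x = torch.randint(0, w - size + 1, (1,)).item()
        crops.append(img[:, y:y + size, x:x + size])
    perm = torch.randperm(num_patches)          # random permutation of patch order
    return torch.stack(crops)[perm]


if __name__ == "__main__":
    view = torch.rand(3, 256, 256)              # a single view as a stand-in input
    batch = random_patches(view).unsqueeze(0)   # (1, 4, 3, 64, 64)
    print(PseudoGlobalVCA()(batch).shape)       # torch.Size([1])
```

The `random_patches` helper mirrors, in spirit, the augmentation described in the abstract: each image contributes several randomly cropped and permuted patches that all share that image's comfort label, so the database is expanded without breaking the patch-label alignment.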


Data availability

No datasets were generated or analysed during the current study.


Author information


Contributions

Sumei Li: Methodology, Resources, Supervision, Review and Editing. Huilin Zhang: Conceptualization, Code implementation, Research and investigation, Writing – original draft. Mingyue Zhou: Validation, Visualization.

Corresponding author

Correspondence to Huilin Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, S., Zhang, H. & Zhou, M. Pseudo-global strategy-based visual comfort assessment considering attention mechanism. Multimedia Systems 30, 356 (2024). https://doi.org/10.1007/s00530-024-01570-y


Keywords