Abstract
Assessing the visual comfort of stereo images helps in crafting immersive stereo scenes and thereby enriches the viewer’s perceptual experience. However, deep learning-based visual comfort assessment (VCA) has been held back by a shortage of training data. To address this problem and make fuller use of deep learning in the VCA task, this paper proposes a convolutional neural network (CNN) based on a pseudo-global strategy and attention mechanisms. Our data augmentation method combines random cropping and permutation with a pseudo-global strategy that fuses multi-region local features into pseudo-global features, which substitute for global features. This effectively expands the databases while keeping input patches aligned with their labels during training. We also introduce attention mechanisms to account for the different impacts that disparities in various regions have on the overall comfort of a stereo image. Specifically, dilated spatial attention and channel self-attention are designed for the local and pseudo-global feature extraction stages, respectively, to simulate the saliency of human perception. Experimental results show that the proposed method outperforms state-of-the-art VCA approaches and generalizes well.
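The cropping-and-permutation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 2×2 region grid, the patch size, the function names `sample_patch_group` and `fuse_local_features`, and the use of plain concatenation as the fusion step are all assumptions made for illustration. The abstract describes learned feature extraction with attention at the fusion stage, for which concatenation is only a stand-in.

```python
import numpy as np


def sample_patch_group(image, grid=(2, 2), patch_size=32, rng=None):
    """Crop one random patch per grid region, then shuffle the patch order.

    Hypothetical helper: the grid layout and patch size are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = grid
    height, width = image.shape[:2]
    region_h, region_w = height // rows, width // cols
    if patch_size > region_h or patch_size > region_w:
        raise ValueError("patch_size must fit inside one grid region")

    patches = []
    for r in range(rows):
        for c in range(cols):
            # Random top-left corner that keeps the patch inside region (r, c).
            top = r * region_h + int(rng.integers(0, region_h - patch_size + 1))
            left = c * region_w + int(rng.integers(0, region_w - patch_size + 1))
            patches.append(image[top:top + patch_size, left:left + patch_size])

    group = np.stack(patches)  # shape: (rows * cols, patch_size, patch_size, ...)
    # Random permutation of the patch order; the group as a whole would keep
    # the image-level comfort label.
    return group[rng.permutation(len(group))]


def fuse_local_features(local_features):
    """Stand-in fusion: concatenate per-patch feature vectors into one vector."""
    return np.asarray(local_features).reshape(-1)
```

Each call yields a different patch group for the same image, so one labeled image can supply many training samples. Because every group spans all regions, the image-level label can reasonably be assigned to the fused representation.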





Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
Sumei Li: methodology, resources, supervision, review and editing. Huilin Zhang: conceptualization, code implementation, research and investigation, writing (original draft preparation). Mingyue Zhou: validation, visualization.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Li, S., Zhang, H. & Zhou, M. Pseudo-global strategy-based visual comfort assessment considering attention mechanism. Multimedia Systems 30, 356 (2024). https://doi.org/10.1007/s00530-024-01570-y
DOI: https://doi.org/10.1007/s00530-024-01570-y