
Toward Visual Behavior and Attention Understanding for Augmented 360 Degree Videos

Published: 17 February 2023

Abstract

Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimation of users' visual fixations and head movements can enhance the quality of experience by allocating more computational resources to analysis, rendering, and 3D registration in the areas of interest. However, little research has addressed how users visually explore scenes in an AR system or how to model visual attention in AR. To bridge the gap between saliency prediction on real-world scenes and on scenes augmented with virtual information, we construct the ARVR saliency dataset. Virtual reality (VR) is employed to simulate the real world, and object recognition and tracking annotations are blended into omnidirectional videos as augmented content. Saliency annotations of head and eye movements are collected for both the original and the augmented videos, and together they constitute the ARVR dataset. We also design a model for saliency prediction in AR. Local block images are extracted to simulate the viewport and offset projection distortion. Conspicuous visual cues in the local block images are extracted to form the spatial features, and optical flow is estimated as an important temporal feature. We also consider the interplay between virtual information and reality: the composition of the augmentation information is distinguished, and the joint effects of adversarial and complementary augmentation are estimated. A Markov chain is constructed with the block images as graph nodes, and both viewing-behavior characteristics and visual saliency mechanisms are considered in determining the edge weights. The order of importance of the block images is estimated from the equilibrium state of the Markov chain. Extensive experiments demonstrate the effectiveness of the proposed method.
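To make the ranking step concrete, the following is a minimal Python sketch, not the authors' implementation, of how block-image importance can be read off the equilibrium state of a Markov chain whose edge weights combine a saliency cue with a viewing-behavior prior. The function name `block_importance`, the Gaussian proximity prior, and the mixing parameters `alpha` and `sigma` are illustrative assumptions rather than the paper's actual formulation.

```python
"""Sketch of Markov-chain ranking of block images (illustrative assumptions only)."""
import numpy as np


def block_importance(saliency, positions, alpha=0.7, sigma=0.5):
    """Rank block images by the stationary distribution of a Markov chain.

    saliency  : (N,) array of per-block saliency scores (assumed precomputed
                from spatial/temporal/augmentation cues).
    positions : (N, 2) array of block centers in normalized coordinates,
                standing in for a viewing-behavior prior (viewers tend to
                move between nearby viewports).
    alpha     : assumed mixing weight between saliency attraction and proximity.
    """
    n = len(saliency)
    # Proximity term: transitions are more likely between nearby blocks.
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    proximity = np.exp(-d**2 / (2 * sigma**2))
    # Attraction term: transitions favor blocks with higher saliency.
    attraction = np.tile(saliency, (n, 1))
    w = alpha * attraction + (1 - alpha) * proximity
    np.fill_diagonal(w, 0.0)
    # Row-normalize to obtain the transition matrix of the Markov chain.
    p = w / w.sum(axis=1, keepdims=True)
    # Equilibrium state: left eigenvector of P with eigenvalue 1.
    vals, vecs = np.linalg.eig(p.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = np.abs(pi) / np.abs(pi).sum()
    return np.argsort(-pi), pi  # blocks ordered by estimated importance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sal = rng.random(8)           # toy saliency scores for 8 blocks
    pos = rng.random((8, 2))      # toy block centers
    order, weights = block_importance(sal, pos)
    print("block order by importance:", order)
```

Because the off-diagonal weights are strictly positive, the chain has a unique equilibrium distribution, which is what the ranking reads out; how the paper actually defines the edge weights from viewing behaviors and saliency mechanisms is described in the full text.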



    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
      April 2023, 545 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3572861
      • Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2023
      • Online AM: 29 September 2022
      • Accepted: 25 September 2022
      • Revised: 23 September 2022
      • Received: 30 March 2022
      Published in TOMM, Volume 19, Issue 2s
