Abstract
Augmented reality (AR) overlays digital content onto the real world. In an AR system, accurate and precise estimates of users' visual fixations and head movements can enhance the quality of experience by allocating more computational resources to the analysis, rendering, and 3D registration of the areas of interest. However, little research has been devoted to understanding how users visually explore AR systems or to modeling visual attention in AR. To bridge the gap between saliency prediction on real-world scenes and on scenes augmented with virtual information, we construct the ARVR saliency dataset. Virtual reality (VR) technology is employed to simulate the real world, and object recognition and tracking annotations are blended into omnidirectional videos as the augmented content. Saliency annotations of head and eye movements for both the original and the augmented videos are collected and together constitute the ARVR dataset. We also design a model for saliency prediction in AR. Local block images are extracted to simulate the viewport and offset the projection distortion; conspicuous visual cues in these blocks form the spatial features, and optical flow is estimated as an important temporal feature. We further consider the interplay between virtual information and reality: the composition of the augmented information is distinguished, and the joint effects of adversarial augmentation and complementary augmentation are estimated. A Markov chain is constructed with the block images as graph nodes; the edge weights account for both the characteristics of viewing behavior and visual saliency mechanisms. The order of importance of the block images is then estimated from the equilibrium state of the Markov chain. Extensive experiments demonstrate the effectiveness of the proposed method.
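To make the last step concrete, below is a minimal sketch (not the authors' implementation) of how the equilibrium state of such a Markov chain could be computed by power iteration, assuming the non-negative edge weights between block images have already been derived from the spatial and temporal features; the names `block_importance` and `weights` are hypothetical.

```python
import numpy as np

def block_importance(weights: np.ndarray, tol: float = 1e-8, max_iter: int = 1000) -> np.ndarray:
    """Rank block images by the stationary distribution of a Markov chain.

    weights[i, j] is a hypothetical non-negative edge weight between block
    images i and j (e.g., combining saliency cues and viewing-behavior
    characteristics); rows are normalized into transition probabilities.
    """
    # Row-normalize the weight matrix into a stochastic transition matrix.
    transition = weights / weights.sum(axis=1, keepdims=True)
    n = transition.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ transition  # one step of the chain: pi' = pi P
        if np.abs(nxt - pi).sum() < tol:
            return nxt  # equilibrium reached: pi = pi P
        pi = nxt
    return pi

# Example: four block images with random positive edge weights.
rng = np.random.default_rng(0)
w = rng.random((4, 4)) + 1e-3
print(np.argsort(-block_importance(w)))  # blocks ordered by importance
```

With strictly positive weights the chain is irreducible and aperiodic, so by the Perron-Frobenius theorem the iteration converges to a unique stationary distribution; blocks carrying more stationary mass are ranked as more important.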