ABSTRACT
This paper presents a real-time framework for computationally tracking the objects visually attended by the user while navigating an interactive virtual environment. In addition to conventional bottom-up (stimulus-driven) features, the framework uses top-down (goal-directed) contexts to predict the human gaze. It first builds feature maps from preattentive features such as luminance, hue, depth, size, and motion, and then integrates them into a single saliency map using the center-surround difference operation. This pixel-level bottom-up saliency map is converted into an object-level saliency map using the item buffer. Finally, top-down contexts, inferred from the user's spatial and temporal behaviors during interactive navigation, are used to select the most plausibly attended object among the candidates in the object saliency map. The framework was implemented on the GPU and exhibited very fast performance (5.68 msec for a 256×256 saliency map), substantiating its adequacy for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the visual attention tracking framework against actual human gaze data. The attained accuracy, particularly the gain from adding top-down contextual information, accords well with theories of human cognition on visually identifying single and multiple attentive targets. The framework can be used effectively for perceptually based rendering without an expensive eye tracker, for example, to provide depth-of-field effects and to manage level-of-detail in virtual environments.
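To make the pipeline's two middle stages concrete, the sketch below shows a minimal CPU version of (a) center-surround contrast on a single feature map and (b) item-buffer aggregation from pixel-level to object-level saliency. This is an illustrative reconstruction, not the paper's GPU implementation: `box_blur` is a cheap stand-in for the Gaussian pyramid levels the literature typically uses, and the function names (`box_blur`, `center_surround`, `object_saliency`) are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur via running sums; a cheap stand-in for
    the blurred pyramid levels used in saliency models."""
    k = 2 * radius + 1
    pad = np.pad(img, radius, mode="edge")
    cs = np.cumsum(pad, axis=1)
    h = (cs[:, k - 1:] -
         np.concatenate([np.zeros((cs.shape[0], 1)), cs[:, :-k]], axis=1)) / k
    cs = np.cumsum(h, axis=0)
    return (cs[k - 1:, :] -
            np.concatenate([np.zeros((1, cs.shape[1])), cs[:-k, :]], axis=0)) / k

def center_surround(feature, center_radius=1, surround_radius=8):
    """Center-surround difference: |fine-scale - coarse-scale| contrast
    for one feature map (e.g. luminance, hue, depth)."""
    return np.abs(box_blur(feature, center_radius) -
                  box_blur(feature, surround_radius))

def object_saliency(saliency, item_buffer):
    """Convert pixel-level saliency to object-level saliency: average
    the saliency of all pixels covered by each object id recorded in
    the item buffer (the per-pixel object-id image)."""
    return {int(i): float(saliency[item_buffer == i].mean())
            for i in np.unique(item_buffer)}
```

In a full model, one such contrast map per feature would be normalized and summed into the single saliency map before the item-buffer step; the top-down stage would then rescore the resulting per-object candidates.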
REFERENCES
- Awh, E., and Pashler, H. 2000. Evidence for split attentional foci. Journal of Experimental Psychology 26, 2, 834--846.
- Backer, G., Mertsching, B., and Bollmann, M. 2001. Data- and model-driven gaze control for an active-vision system. IEEE Trans. Pattern Analysis and Machine Intelligence 23, 12, 1415--1429.
- Beeharee, A. K., West, A. J., and Hubbold, R. J. 2003. Visual attention based information culling for distributed virtual environments. In Proceedings of ACM Symposium on Virtual Reality Software and Technology, 213--222.
- Brown, R., Cooper, L., and Pham, B. 2003. Visual attention-based polygon level of detail management. In Proceedings of GRAPHITE 2003, 55--62.
- Burns, D., and Osfield, R. 2006. OpenSceneGraph. http://www.openscenegraph.org.
- Cater, K., Chalmers, A., and Ledda, P. 2002. Selective quality rendering by exploiting human inattentional blindness: looking but not seeing. In Proceedings of ACM Symposium on Virtual Reality Software and Technology, 17--24.
- Connor, C., Egeth, H., and Yantis, S. 2004. Visual attention: Bottom-up vs. top-down. Current Biology 14, 19, 850--852.
- Culhane, S. M., and Tsotsos, J. K. 1992. An attentional prototype for early vision. In Proceedings of European Conference on Computer Vision 1992, 551--560.
- Engel, S., Zhang, X., and Wandell, B. 1997. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature 388, 6637, 68--71.
- Enns, J. T. 1990. Three-dimensional features that pop out in visual search. In Visual Search. Taylor and Francis, New York, 37--45.
- Haber, J., Myszkowski, K., Yamauchi, H., and Seidel, H.-P. 2001. Perceptually guided corrective splatting. Computer Graphics Forum 20, 3.
- Henderson, J. M. 2003. Human gaze control during real-world scene perception. Trends in Cognitive Sciences 7, 11, 498--504.
- Itti, L., Koch, C., and Niebur, E. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis and Machine Intelligence 20, 11, 1254--1259.
- Itti, L. 2000. Models of Bottom-Up and Top-Down Visual Attention. PhD thesis, California Institute of Technology, Pasadena, California.
- Jobson, D. J., ur Rahman, Z., and Woodell, G. A. 1997. Properties and performance of a center/surround retinex. IEEE Trans. on Image Processing 6, 3, 451--462.
- Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Trans. ASME, Journal of Basic Engineering 82, 34--45.
- Kessenich, J., Baldwin, D., and Rost, R. 2004. The OpenGL Shading Language, Version 1.10.59. 3Dlabs, Inc. Ltd. http://developer.3dlabs.com/documents/index.htm.
- Koch, C., and Ullman, S. 1985. Shifts in selective visual attention. Human Neurobiology 4, 219--227.
- Kuipers, B. 1978. Modeling spatial knowledge. Cognitive Science 2, 129--153.
- Lee, C. H., Varshney, A., and Jacobs, D. W. 2005. Mesh saliency. In Proceedings of SIGGRAPH 2005, 659--666.
- Loftus, G. R., and Mackworth, N. H. 1978. Cognitive determinants of fixation duration during picture viewing. Journal of Experimental Psychology 4, 565--572.
- Longhurst, P., Debattista, K., and Chalmers, A. 2006. A GPU based saliency map for high-fidelity selective rendering. In Proceedings of AFRIGRAPH 2006, 21--29.
- Ma, Y.-F., Hua, X.-S., Lu, L., and Zhang, H. 2005. A generic framework of user attention model and its application in video summarization. IEEE Trans. on Multimedia 7, 5, 907--919.
- Marshall, J., Burbeck, C., Ariely, D., Rolland, J., and Martin, K. 1996. Occlusion edge blur: a cue to relative visual depth. Journal of the Optical Society of America 13, 681--688.
- Mather, G. 1997. The use of image blur as a depth cue. Perception 26, 1147--1158.
- Mozer, M. C., and Sitton, M. 1998. Computational modeling of spatial attention. In Attention, H. Pashler, Ed. UCL Press, London, 341--393.
- Nagy, A. L., and Sanchez, R. R. 1990. Critical color differences determined with a visual search task. Journal of the Optical Society of America 7, 7, 1209--1217.
- Nakayama, K., and Silverman, G. 1986. Serial and parallel processing of visual feature conjunctions. Nature 320, 264--265.
- O'Craven, K. M., Downing, P. E., and Kanwisher, N. 1999. fMRI evidence for objects as the units of attentional selection. Nature 401, 6753, 584--587.
- OpenCV. 2006. http://sourceforge.net/projects/opencvlibrary/.
- Ouerhani, N., Bracamonte, J., Hugli, H., Ansorge, M., and Pellandini, F. 2001. Adaptive color image compression based on visual attention. In Proceedings of ICIAP, 416--421.
- Ouerhani, N., von Wartburg, R., and Hugli, H. 2004. Empirical validation of the saliency-based model of visual attention. Electronic Letters on Computer Vision and Image Analysis 3, 1, 13--24.
- Rutishauser, U., Walther, D., Koch, C., and Perona, P. 2004. Is bottom-up attention useful for object recognition? In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2004, 37--44.
- Santella, A., and DeCarlo, D. 2004. Visual interest and NPR: an evaluation and manifesto. In Proceedings of Symposium on Non-Photorealistic Animation and Rendering, 71--150.
- Sears, C., and Pylyshyn, Z. 2000. Multiple object tracking and attentional processing. Journal of Experimental Psychology 54, 1, 1--14.
- Siegel, A. W., and White, S. H. 1975. The development of spatial representations of large-scale environments. In Advances in Child Development and Behavior, H. Reese, Ed., vol. 10. Academic Press, New York, 10--55.
- Speed, F. M., Hocking, R. R., and Hackney, O. P. 1978. Methods of analysis of linear models with unbalanced data. Journal of the American Statistical Association 73, 361, 105--112.
- Treisman, A. M., and Gelade, G. 1980. A feature-integration theory of attention. Cognitive Psychology 12, 97--136.
- Vishton, P., and Cutting, J. 1995. Wayfinding, displacements, and mental maps: velocity fields are not typically used to determine one's aimpoint. Journal of Experimental Psychology 21, 978--995.
- Watson, B., Walker, N., Hodges, L. F., and Worden, A. 1997. Managing level of detail through peripheral degradation: Effects on search performance with a head-mounted display. ACM Trans. on Computer-Human Interaction 4, 4, 323--346.
- Weghorst, H., Hooper, G., and Greenberg, D. P. 1984. Improved computational methods for ray tracing. ACM Trans. on Graphics 3, 1, 52--69.
- Welch, G., and Bishop, G. 1995. An introduction to the Kalman filter. Tech. Rep. 95-041, Univ. of North Carolina at Chapel Hill.
- Wolfe, J. M. 1993. Guided search 2.0. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 1295--1299.
- Yee, H., Pattanaik, S., and Greenberg, D. P. 2001. Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments. ACM Trans. on Graphics 20, 39--65.