Abstract
Mobile robots must deal with an enormous amount of visual data containing static and dynamic stimuli. Depending on the task, only small portions of a scene are relevant. Artificial attention systems filter this information at early processing stages. Among the various methods proposed to implement such systems, the region-based approach has proven to be robust and especially well suited for integrating top-down influences. This concept was recently transferred to the spatiotemporal domain to obtain motion saliency. Here, a full-featured integration of the spatial and spatiotemporal systems is presented. We propose a biologically inspired two-stream system that makes it possible to use different spatial and temporal resolutions and to extract spatiotemporal saliency at early stages. We compare the output to classic models and demonstrate the flexibility of the integrated approach in different experiments, including online processing of continuous input, a task similar to thumbnail extraction, and a top-down task of selecting specific moving and non-moving objects.
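The two-stream idea sketched in the abstract can be illustrated with a minimal, generic example: a spatial stream computing feature contrast at full resolution and a temporal stream computing motion saliency from frame differencing at a reduced resolution, combined into one map. This is a simplified stand-in for the region-based system described in the paper, not the authors' actual implementation; the contrast and differencing operators here are placeholder choices.

```python
import numpy as np

def spatial_saliency(frame):
    # Generic feature contrast: deviation of each pixel from the global mean.
    # (A placeholder for the paper's region-based feature contrast.)
    return np.abs(frame - frame.mean())

def temporal_saliency(prev, curr, scale=2):
    # Motion saliency via frame differencing at a reduced resolution,
    # mirroring the idea that the temporal stream may run at a coarser scale.
    diff = np.abs(curr[::scale, ::scale] - prev[::scale, ::scale])
    # Upsample back to full resolution by pixel repetition.
    return np.repeat(np.repeat(diff, scale, axis=0), scale, axis=1)

def combined_saliency(prev, curr):
    # Fuse the two streams; pixel-wise maximum is one simple fusion choice.
    return np.maximum(spatial_saliency(curr), temporal_saliency(prev, curr))

# A single bright pixel appearing between two frames is both a spatial
# outlier and a motion event, so it dominates the combined map.
prev = np.zeros((8, 8))
curr = np.zeros((8, 8))
curr[4, 4] = 1.0
sal = combined_saliency(prev, curr)
```

With different resolutions per stream (`scale`), the temporal map stays cheap to compute while the spatial map retains full detail, which is the core flexibility the two-stream design aims for.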
Tünnermann, J., Mertsching, B. Region-Based Artificial Visual Attention in Space and Time. Cogn Comput 6, 125–143 (2014). https://doi.org/10.1007/s12559-013-9220-5