Abstract
In this work, we study how information provided by foveated images sampled according to the log-polar transformation can be integrated over time to build accurate world representations and accomplish visual search tasks efficiently. We focus on a specific visual information modality, depth, and on how to store it in a flexible memory structure. We propose a probabilistic observational model for a stereo system that relies on the Unscented Transform to propagate uncertainty in stereo matching, due to spatial quantization in the retina, to the 3D Cartesian domain. Probabilistic depth measurements are integrated in a novel Sensory Ego-Sphere whose topology can be biased with foveal-like distributions, according to the autonomous agent's short-term tasks and goals. Furthermore, we investigate an Upper Confidence Bound algorithm for the task of simultaneously finding the closest object to the observer (visual search) and learning the 3D map of the surrounding environment (mapping). Task performance is assessed with both a foveated log-polar sensor and a classical uniform one. The advantages of foveal vision and custom ego-sphere representations are illustrated in a series of experiments with a realistic simulator.
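As a rough illustration of the uncertainty propagation step, the sketch below pushes a Gaussian over image measurements (u, v, disparity) through a stereo triangulation function using the standard Unscented Transform. It is a minimal sketch under stated assumptions, not the observational model of the paper: the rectified pinhole geometry, the function names `unscented_transform` and `triangulate`, the camera parameters, and the Gaussian pixel covariance are all hypothetical choices made for the example.

```python
import numpy as np

def unscented_transform(mean, cov, fn, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function fn."""
    n = mean.size
    # Columns of the scaled Cholesky factor give the sigma-point offsets.
    offsets = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mean, mean + offsets.T, mean - offsets.T])  # (2n+1, n)
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    # Transform every sigma point, then recover the weighted mean and covariance.
    ys = np.array([fn(s) for s in sigma])
    y_mean = weights @ ys
    diff = ys - y_mean
    y_cov = (weights[:, None] * diff).T @ diff
    return y_mean, y_cov

def triangulate(p, focal=500.0, baseline=0.1, cx=320.0, cy=240.0):
    """Assumed rectified pinhole stereo model: (u, v, disparity) -> (X, Y, Z)."""
    u, v, d = p
    z = focal * baseline / d
    return np.array([(u - cx) * z / focal, (v - cy) * z / focal, z])

# Hypothetical measurement: pixel (400, 260) with a disparity of 10 px.
pixel_mean = np.array([400.0, 260.0, 10.0])
# Assumed noise: larger variance on disparity, standing in for coarse sampling.
pixel_cov = np.diag([0.25, 0.25, 1.0])
xyz_mean, xyz_cov = unscented_transform(pixel_mean, pixel_cov, triangulate)
print(xyz_mean)
print(xyz_cov)
```

In a foveated sensor, the input covariance would presumably grow with eccentricity, since peripheral receptive fields are larger; here it is fixed for simplicity.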









Notes
Receptive fields are the fundamental visual processing units. Each corresponds to a specific region in the retina (image) and is represented by the average value of the photo-receptors (pixels) within it (e.g. average color). For more details, we refer the interested reader to Edelman (1995).
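The averaging described in this note can be sketched as follows. This is a simplified illustration that assumes a grayscale image, hard-edged non-overlapping fields arranged in log-spaced rings and equal-angle wedges, and a small unsampled disc at the center; the function name `logpolar_receptive_fields` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def logpolar_receptive_fields(image, n_rings=8, n_wedges=16, r_min=1.0):
    """Average the pixels that fall inside each log-polar receptive field."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    radius = np.hypot(xs - cx, ys - cy)
    angle = np.arctan2(ys - cy, xs - cx)  # in [-pi, pi]
    r_max = min(cx, cy)
    # Ring index grows with log(radius), so outer fields cover more pixels.
    log_r = np.log(np.maximum(radius, r_min) / r_min)
    ring = np.floor(n_rings * log_r / np.log(r_max / r_min)).astype(int)
    wedge = np.floor((angle + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge %= n_wedges
    # Skip the central disc (radius < r_min) and pixels beyond r_max.
    valid = (radius >= r_min) & (ring < n_rings)
    flat = ring[valid] * n_wedges + wedge[valid]
    n_fields = n_rings * n_wedges
    sums = np.bincount(flat, weights=image[valid], minlength=n_fields)
    counts = np.bincount(flat, minlength=n_fields)
    # Fields that contain no pixel centers are left as NaN.
    means = np.full(n_fields, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means.reshape(n_rings, n_wedges), counts.reshape(n_rings, n_wedges)

# Hypothetical input: a constant 64x64 grayscale image.
image = np.full((64, 64), 7.0)
means, counts = logpolar_receptive_fields(image)
print(counts.sum(axis=1))  # pixels per ring, from fovea to periphery
```

The per-ring pixel counts show the space-variant resolution: peripheral fields pool many pixels, while foveal fields pool few or none at this image size.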
References
Agarwal, A., & Blake, A. (2010). Dense stereo matching over the panum band. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 416–430.
Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054–1078.
Ahmad, S., & Yu, A. J. (2013). Active sensing as Bayes-optimal sequential decision making. CoRR, abs/1305.6650. http://arxiv.org/abs/1305.6650.
Audibert, J.-Y., & Bubeck, S. (2010). Best arm identification in multi-armed bandits. In COLT 2010: 23rd conference on learning theory (13 pp.).
Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256.
Avelino, J. A., Figueiredo, R., Moreno, P., & Bernardino, A. (2016). On the perceptual advantages of visual suppression mechanisms for dynamic robot systems. In International conference on biologically inspired cognitive architectures (BICA).
Begum, M., & Karray, F. (2011). Visual attention for robotic cognition: A survey. IEEE Transactions on Autonomous Mental Development, 3(1), 92–105.
Bernardino, A., & Santos-Victor, J. (2002). A binocular stereo algorithm for log-polar foveated systems. In H. Bülthoff, C. Wallraven, S.-W. Lee, & T. Poggio (Eds.), Biologically motivated computer vision, Lecture notes in computer science (Vol. 2525, pp. 127–136). Berlin: Springer.
Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 185–207.
Butko, N. J., & Movellan, J. R. (2010). Infomax control of eye movements. IEEE Transactions on Autonomous Mental Development, 2(2), 91–107.
Carrasco, M. (2011). Visual attention: The past 25 years. Vision Research, 51(13), 1484–1525. (Vision Research 50th Anniversary Issue: Part 2). http://www.sciencedirect.com/science/article/pii/S0042698911001544.
Colombo, C., Rucci, M., & Dario, P. (1996). Integrating selective attention and space-variant sensing in machine vision. In J. L. C. Sanz (Ed.), Image technology: Advances in image processing, multimedia and machine vision (pp. 109–127). Springer Berlin Heidelberg.
Cox, D. D., & John, S. (1992). SDO: A statistical method for global optimization. In IEEE international conference on systems, man and cybernetics (pp. 1241–1246). IEEE.
Crawford, L. E., Landy, D., & Presson, A. N. (2014). Bias in spatial memory: Prototypes or relational categories. In Poster presented at the 36th annual conference of the cognitive science Society, Quebec.
Edelman, S. (1995). Receptive fields for vision: From hyperacuity to object recognition. http://cogprints.org/570/.
Ferreira, J., Bessière, P., Mekhnacha, K., Lobo, J., Dias, J., & Laugier, C. (2008). Bayesian models for multimodal perception of 3D structure and motion. In International conference on cognitive systems (CogSys 2008), Karlsruhe, Germany. https://hal.archives-ouvertes.fr/hal-00338800.
Fleming, K. A., Peters, R. A., & Bodenheimer, R. E. (2006). Image mapping and visual attention on a sensory ego-sphere. In 2006 IEEE/RSJ international conference on intelligent robots and systems, IROS 2006, Beijing, China (pp. 241–246). October 9–15, 2006. doi:10.1109/IROS.2006.281688.
Friston, K., Adams, R., & Montague, R. (2012). What is value accumulated reward or evidence? Frontiers in Neurorobotics. doi:10.3389/fnbot.2012.00011.
Hirose, M., Furuhashi, H., Miyasaka, T., & Araki, K. (2002). Reconstruction of range data by means of geodesic dome type data structure. The Journal of the Institute of Image Electronics Engineers of Japan, 31(3), 388–395.
Hirschmüller, H. (2008). Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 328–341.
Hoffman, M. D., Brochu, E., & de Freitas, N. (2011). Portfolio allocation for Bayesian optimization. Citeseer.
Hornung, A., Wurm, K. M., Bennewitz, M., Stachniss, C., & Burgard, W. (2013). OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots. http://octomap.github.com.
Huang, D., Allen, T. T., Notz, W. I., & Zeng, N. (2006). Global optimization of stochastic black-box systems via sequential kriging meta-models. Journal of Global Optimization, 34(3), 441–466.
Itti, L., & Baldi, P. F. (2006). Bayesian surprise attracts human attention. In Advances in neural information processing systems (NIPS*2005) (Vol. 19, pp. 547–554). Cambridge, MA: MIT Press.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
Julier, S., & Uhlmann, J. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401–422.
Koch, C., & Ullman, S. (1987). Shifts in selective visual attention: Towards the underlying neural circuitry. In L. M. Vaina (Ed.), Matters of intelligence: Conceptual structures in cognitive neuroscience (pp. 115–141). Dordrecht: Springer Netherlands.
Kriegman, D. J., Triendl, E., & Binford, T. O. (1989). Stereo vision and navigation in buildings for mobile robots. IEEE Transactions on Robotics and Automation, 5(6), 792–803.
Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Fluids Engineering, 86(1), 97–106.
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.
Lizotte, D., Wang, T., Bowling, M., & Schuurmans, D. (2007). Automatic gait optimization with Gaussian process regression. In Proceedings of the IJCAI (pp. 944–949).
Mockus, J. (1974). On Bayesian methods for seeking the extremum. In Proceedings of the IFIP technical conference (pp. 400–404). London, UK: Springer. http://dl.acm.org/citation.cfm?id=646296.687872.
Moreno, P., Nunes, R., Figueiredo, R., Ferreira, R., Bernardino, A., Santos-Victor, J., Beira, R., Vargas, L., Aragão, D., & Aragão, M. (2015). Vizzy: A humanoid on wheels for assistive robotics. In Robot 2015: Second Iberian robotics conference (pp. 17–28). Springer International Publishing 2016.
Muller, M. E. (1959). A note on a method for generating points uniformly on n-dimensional spheres. Communications of the ACM, 2(4), 19–20. doi:10.1145/377939.377946.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434(7031), 387–391.
Pamplona, D., & Bernardino, A. (2009). Smooth foveal vision with Gaussian receptive fields. In 9th IEEE-RAS international conference on humanoid robots, humanoids 2009, Paris, France (pp. 223–229). December 7–10, 2009. http://dx.doi.org/10.1109/ICHR.2009.5379575.
Perrollaz, M., Spalanzani, A., & Aubert, D. (2010). Probabilistic representation of the uncertainty of stereo-vision and application to obstacle detection. In Intelligent vehicles symposium (IV), 2010 IEEE (pp. 313–318). June 2010.
Peters, R. A., Hambuchen, K. A., & Bodenheimer, R. E. (2009). The sensory ego-sphere: A mediating interface between sensors and cognition. Autonomous Robots, 26(1), 1–19. doi:10.1007/s10514-008-9098-3.
Posner, M. (2012). Cognitive neuroscience of attention. Guilford Press. http://books.google.pt/books?id=8yjEjoS7EQsC.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
Ruesch, J., Lopes, M., Bernardino, A., Hornstein, J., Santos-Victor, J., & Pfeifer, R. (2008). Multimodal saliency-based bottom-up attention: A framework for the humanoid robot iCub. In IEEE international conference on robotics and automation, 2008. ICRA 2008 (pp. 962–967). May 2008.
Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning (1st ed.). Cambridge, MA: MIT Press.
Tippetts, B., Lee, D. J., Lillywhite, K., & Archibald, J. (2016). Review of stereo vision algorithms and their suitability for resource-limited systems. Journal of Real-Time Image Processing, 11(1), 5–25.
Vijayakumar, S., Conradt, J., Shibata, T., & Schaal, S. (2001). Overt visual attention for a humanoid robot. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, 2001 (Vol. 4, pp. 2332–2337). IEEE.
von Helmholtz, H., & König, A. (1896). Handbuch der physiologischen Optik (Vol. 1). L. Voss. https://books.google.pt/books?id=Lb4KAAAAIAAJ.
Wang, J., & Liu, Y. (2007). A closed-form solution of reconstruction from nonparallel stereo geometry used in image guided system for surgery. In N. Sebe, Y. Liu, Y. Zhuang, & T. Huang (Eds.), Multimedia content analysis and mining, Lecture notes in computer science (Vol. 4577, pp. 371–380). Berlin Heidelberg: Springer.
Weiman, C. F. R. (1995). Binocular stereo via log-polar retinas. In SPIE, Ed.
Acknowledgements
This work has been partially supported by the Portuguese Foundation for Science and Technology (FCT) Project [UID/EEA/50009/2013]. Rui Figueiredo is funded by FCT Ph.D. Grant PD/BD/105779/2014. Helder Araújo would like to thank FCT (Portuguese Foundation for Science and Technology) grant UID-EEA-0048-2013.
Additional information
This is one of several papers published in Autonomous Robots comprising the Special Issue on Active Perception.
About this article
Cite this article
de Figueiredo, R.P., Bernardino, A., Santos-Victor, J. et al. On the advantages of foveal mechanisms for active stereo systems in visual search tasks. Auton Robot 42, 459–476 (2018). https://doi.org/10.1007/s10514-017-9617-1
DOI: https://doi.org/10.1007/s10514-017-9617-1