Abstract
This short paper outlines my position on a future direction for computational research on visual motion understanding. The direction combines motion perception, visual attention, action representation and computational vision. Due to the breadth of literature in these areas, the paper cannot present a comprehensive review of any one topic. The review is selective, and the selection is intended to support particular points. I claim that task-directed attentive processing is a largely unexplored dimension in the computational motion field. In the context of motion understanding, I revisit an earlier argument that attention must be one component of any strategy for making vision systems general. No matter how sophisticated the methods for extracting motion information from image sequences become, it will not be possible to achieve human-like performance without integrating the processing optimization that attention provides. Virtually all past surveys of computational models of motion processing completely ignore attention. However, the concept has crept into work over the years in a variety of ways. A second claim is that the biology of attention offers some interesting insights to guide future development. Many computational authors have previously commented that too little is known about how biological vision systems use task-directed attention in motion processing; this is no longer true. Here, I briefly summarize biological evidence that attentive processing affects all aspects of visual perception, including motion, and again emphasize that this paper does not do justice to the breadth and depth of the field. New findings provide a critical link between the perception of visual actions and their execution. Together these findings point to a strategy for motion understanding closely related to one presented more than two decades ago.
Cite this article
Tsotsos, J.K. Motion Understanding: Task-Directed Attention and Representations that Link Perception with Action. International Journal of Computer Vision 45, 265–280 (2001). https://doi.org/10.1023/A:1013666302043