Motion Understanding: Task-Directed Attention and Representations that Link Perception with Action

International Journal of Computer Vision

Abstract

This short paper outlines my position on a future direction for computational research on visual motion understanding. The direction combines motion perception, visual attention, action representation and computational vision. Due to the breadth of literature in these areas, the paper cannot present a comprehensive review of any one topic. The review is therefore selective, chosen to make some particular points. I claim that task-directed attentive processing is a largely unexplored dimension in the computational motion field. I recount, in the context of motion understanding, a past argument that attention is one of the components of any strategy for making vision systems general. No matter how sophisticated the methods for extracting motion information from image sequences become, it will not be possible to achieve the goal of human-like performance without integrating the optimization of processing that attention provides. Virtually all past surveys of computational models of motion processing completely ignore attention. However, the concept has crept into work over the years in a variety of ways. A second claim is that the biology of attention offers some interesting insights to guide future development. Many computational authors have previously commented that too little is known about how biological vision systems use task-directed attention in motion processing; this is no longer true. Here, I briefly summarize biological evidence that attentive processing affects all aspects of visual perception, including motion, and again emphasize that this paper does not do justice to the breadth and depth of the field. New findings provide a critical link between the perception of visual actions and their execution. Together these findings point to a strategy for motion understanding closely related to that presented more than two decades ago.

Cite this article

Tsotsos, J.K. Motion Understanding: Task-Directed Attention and Representations that Link Perception with Action. International Journal of Computer Vision 45, 265–280 (2001). https://doi.org/10.1023/A:1013666302043
