Action Recognition Using a Bio-Inspired Feedforward Spiking Network

International Journal of Computer Vision

Abstract

We propose a bio-inspired feedforward spiking network that models two brain areas dedicated to motion processing (V1 and MT), and we show how its spiking output can be exploited in a computer vision application: action recognition. To analyze the spike trains, we consider two characteristics of the neural code: the mean firing rate of each neuron and the synchrony between neurons. We show that both carry information relevant to action recognition. We compare our results to Jhuang et al. (Proceedings of the 11th international conference on computer vision, pp. 1–8, 2007) on the Weizmann database. In conclusion, we are convinced that spiking networks represent a powerful alternative framework for real vision applications, one that will benefit from recent advances in computational neuroscience.
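
To make the two spike-train characteristics named in the abstract concrete, the following Python sketch computes a mean firing rate and a simple coincidence-based synchrony score for a pair of spike trains. It is only an illustration: the abstract does not define the measures used in the article, so the function names, the coincidence window, and the sample spike times below are all assumptions.

```python
from bisect import bisect_left, bisect_right


def mean_firing_rate(spike_times, duration):
    """Return the mean firing rate (spikes per second) over a recording
    of the given duration in seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(spike_times) / duration


def coincidence_synchrony(train_a, train_b, window):
    """Return the fraction of spikes (over both trains) that have at least
    one spike in the other train within +/- window seconds.

    This is a simple illustrative score in [0, 1], not the synchrony
    measure of the article.
    """
    total = len(train_a) + len(train_b)
    if total == 0:
        return 0.0
    a, b = sorted(train_a), sorted(train_b)

    def matched(src, dst):
        # Count spikes in src that have a partner in dst inside the window.
        count = 0
        for t in src:
            lo = bisect_left(dst, t - window)
            hi = bisect_right(dst, t + window)
            if hi > lo:
                count += 1
        return count

    return (matched(a, b) + matched(b, a)) / total


if __name__ == "__main__":
    # Hypothetical spike times in seconds for two neurons.
    neuron_1 = [0.010, 0.050, 0.120]
    neuron_2 = [0.012, 0.080, 0.121]
    print(mean_firing_rate(neuron_1, duration=0.5))
    print(coincidence_synchrony(neuron_1, neuron_2, window=0.005))
```

A rate-based description would collect one such rate per neuron into a feature vector, while a synchrony-based description would collect pairwise scores; how the article actually builds and compares these descriptors is not stated in the abstract.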

References

  • Adelson, E., & Bergen, J. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2, 284–299.

  • Bayerl, P., & Neumann, H. (2007). Disambiguating visual motion by form–motion interaction—a computational model. International Journal of Computer Vision, 72(1), 27–45.

  • Beintema, J., & Lappe, M. (2002). Perception of biological motion without local image motion. Proceedings of the National Academy of Sciences of the USA, 99(8), 5661–5663.

  • Berzhanskaya, J., Grossberg, S., & Mingolla, E. (2007). Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spatial Vision, 20(4), 337–395.

  • Biederlack, J., Castelo-Branco, M., Neuenschwander, S., Wheeler, D. W., Singer, W., & Nikolić, D. (2006). Brightness induction: rate enhancement and neuronal synchronization as complementary codes. Neuron, 52(6), 1073–1083.

  • Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review of Psychology, 58, 12.1–12.27.

  • Blank, M., Gorelick, L., Shechtman, E., Irani, M., & Basri, R. (2005). Actions as space-time shapes. In Proceedings of the 10th international conference on computer vision (Vol. 2, pp. 1395–1402).

  • Bobick, A., & Davis, J. (2001). The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 257–267.

  • Born, R. T. (2000). Center-surround interactions in the middle temporal visual area of the owl monkey. Journal of Neurophysiology, 84, 2658–2669.

  • Born, R., & Bradley, D. (2005). Structure and function of visual area MT. Annual Review of Neuroscience, 28, 157–189.

  • Buracas, G. T., & Albright, T. D. (1996). Contribution of area MT to perception of three-dimensional shape: a computational study. Vision Research, 36(6), 869–87.

  • Casile, A., & Giese, M. (2003). Roles of motion and form in biological motion recognition. In Lecture notes in computer science: Vol. 2714. Artificial networks and neural information processing (pp. 854–862). Berlin: Springer.

  • Casile, A., & Giese, M. (2005). Critical features for the recognition of biological motion. Journal of Vision, 5, 348–360.

  • Cessac, B., Rostro-Gonzalez, H., Vasquez, J., & Vieville, T. (2008). To which extend is the “neural code” a metric? In Deuxième conférence française de neurosciences computationnelles.

  • Collins, R., Gross, R., & Shi, J. (2002). Silhouette-based human identification from body shape and gait. In 5th intl. conf. on automatic face and gesture recognition (p. 366).

  • Conway, B., & Livingstone, M. (2003). Space-time maps and two-bar interactions of different classes of direction-selective cells in macaque V1. Journal of Neurophysiology, 89, 2726–2742.

  • Cutler, R., & Davis, L. (2000). Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8).

  • Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge: MIT Press.

  • De Valois, R., & Cottaris, N. (2000). Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vision Research, 40, 3685–3702.

  • Destexhe, A., Rudolph, M., & Paré, D. (2003). The high-conductance state of neocortical neurons in vivo. Nature Reviews Neuroscience, 4, 739–751.

  • Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In VS-PETS (pp. 65–72).

  • Efros, A., Berg, A., Mori, G., & Malik, J. (2003). Recognizing action at a distance. In Proceedings of the 9th international conference on computer vision (Vol. 2, pp. 726–734).

  • Escobar, M. J., & Kornprobst, P. (2008). Action recognition with a bio-inspired feedforward motion processing model: The richness of center-surround interactions. In Lecture notes in computer science. Proceedings of the 10th European conference on computer vision. Berlin: Springer.

  • Escobar, M. J., Wohrer, A., Kornprobst, P., & Vieville, T. (2006). Biological motion recognition using an MT-like model. In Proceedings of 3rd Latin American robotic symposium.

  • Felleman, D., & Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.

  • Fellous, J. M., Tiesinga, P. H. E., Thomas, P. J., & Sejnowski, T. J. (2004). Discovering spike patterns in neural responses. The Journal of Neuroscience, 24(12), 2989–3001.

  • Fries, P., Neuenschwander, S., Engel, A. K., Goebel, R., & Singer, W. (2001). Rapid feature selective neuronal synchronization through correlated latency shifting. Nature Neuroscience, 4(2), 194–200.

  • Gautrais, J., & Thorpe, S. (1998). Rate coding vs temporal order coding: a theoretical approach. Biosystems, 48, 57–65.

  • Gavrila, D. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1), 82–98.

  • Gavrila, D., & Davis, L. (1996). 3-D model-based tracking of humans in action: a multi-view approach. In Proceedings of the international conference on computer vision and pattern recognition. San Francisco: IEEE.

  • Gerstner, W., & Kistler, W. (2002). Spiking neuron models. Cambridge: Cambridge University Press.

  • Giese, M., & Poggio, T. (2003). Neural mechanisms for the recognition of biological movements and actions. Nature Reviews Neuroscience, 4, 179–192.

  • Gollisch, T., & Meister, M. (2008). Rapid neural coding in the retina with relative spike latencies. Science, 319, 1108–1111.

  • Goncalves, L., DiBernardo, E., Ursella, E., & Perona, P. (1995). Monocular tracking of the human arm in 3D. In Proceedings of the 5th international conference on computer vision (pp. 764–770).

  • Grzywacz, N., & Yuille, A. (1990). A model for the estimate of local image velocity by cells on the visual cortex. Proceedings of the Royal Society London B: Biological Sciences, 239(1295), 129–161.

  • Hiris, E., Humphrey, D., & Stout, A. (2005). Temporal properties in masking biological motion. Perception and Psychophysics, 67(3), 435–443.

  • Hogg, D. (1983). Model-based vision: a paradigm to see a walking person. Image and Vision Computing, 1(1), 5–20.

  • Hubel, D., & Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat visual cortex. Journal of Physiology, 160, 106–154.

  • Izhikevich, E. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070.

  • Jhuang, H., Serre, T., Wolf, L., & Poggio, T. (2007). A biologically inspired system for action recognition. In Proceedings of the 11th international conference on computer vision (pp. 1–8).

  • Kreuz, T., Haas, J. S., Morelli, A., Abarbanel, H. D., & Politi, A. (2007). Measuring spike train synchrony. Journal of Neuroscience Methods, 165, 151–161.

  • Laptev, I., Caputo, B., Schüldt, C., & Lindeberg, T. (2007). Local velocity-adapted motion events for spatio-temporal recognition. Computer Vision and Image Understanding, 108(3), 207–229.

  • Lui, L. L., Bourne, J. A., & Rosa, M. G. P. (2007). Spatial summation, end inhibition and side inhibition in the middle temporal visual area MT. Journal of Neurophysiology, 97(2), 1135.

  • Maldonado, P., Babul, C., Singer, W., Rodriguez, E., Berger, D., & Grün, S. (2008). Synchronization of neuronal responses in primary visual cortex of monkeys viewing natural images. Journal of Neurophysiology, 100, 1523–1532.

  • Mestre, D. R., Masson, G. S., & Stone, L. S. (2001). Spatial scale of motion segmentation from speed cues. Vision Research, 41(21), 2697–2713.

  • Michels, L., Lappe, M., & Vaina, L. (2005). Visual areas involved in the perception of human movement from dynamic analysis. Brain Imaging, 16(10), 1037–1041.

  • Mokhber, A., Achard, C., & Milgram, M. (2008). Recognition of human behavior by space-time silhouette characterization. Pattern Recognition Letters, 29(1), 81–89.

  • Mutch, J., & Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. In Proceedings of the international conference on computer vision and pattern recognition (pp. 11–18).

  • Neuenschwander, S., Castelo-Branco, M., & Singer, W. (1999). Synchronous oscillations in the cat retina. Vision Research, 39(15), 2485–2497.

  • Niebles, J. C., Wang, H., & Fei-Fei, L. (2006). Unsupervised learning of human action categories using spatial-temporal words. In British machine vision conference.

  • Nowak, L., & Bullier, J. (1997). The timing of information transfer in the visual system. In Cerebral cortex (Vol. 12, pp. 205–241). New York: Plenum Press. Chap. 5.

  • Nowlan, S., & Sejnowski, T. (1995). A selection model for motion processing in area MT of primates. Journal of Neuroscience, 15, 1195–1214.

  • Pack, C. C., Hunter, J. N., & Born, R. T. (2005). Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology, 93(3), 1809–1815.

  • Perge, J., Borghuis, B., Bours, R., Lankheet, M., & van Wezel, R. (2005). Temporal dynamics of direction tuning in motion-sensitive macaque area MT. Journal of Neurophysiology, 93, 2104–2116.

  • Perkel, D. H., & Bullock, T. H. (1968). Neural coding. Neurosciences Research Program Bulletin, 6, 221–348.

  • Pinto, N., Cox, D. D., & DiCarlo, J. J. (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1), e27.

  • Polana, R., & Nelson, R. (1997). Detection and recognition of periodic, non-rigid motion. International Journal of Computer Vision, 23(3), 261–282.

  • Riehle, A., Grün, S., Diesmann, M., & Aertsen, A. (1997). Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278, 1950–1953.

  • Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge: Bradford Books.

  • Robson, J. (1966). Spatial and temporal contrast-sensitivity functions of the visual system. Journal of the Optical Society of America, 56, 1141–1142.

  • Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (2004). Synchrony and covariation of firing rates in the primary visual cortex during contour grouping. Nature Neuroscience, 7(9), 982–991.

  • Rohr, K. (1994). Toward model-based recognition of human movements in image sequences. CVGIP: Image Understanding, 59(1), 94–115.

  • Rust, N., Mante, V., Simoncelli, E., & Movshon, J. (2006). How MT cells analyze the motion of visual patterns. Nature Neuroscience, 9(11), 1421–1431.

  • Saul, A., Carras, P., & Humphrey, A. (2005). Temporal properties of inputs to direction-selective neurons in monkey V1. Journal of Neurophysiology, 94, 282–294.

  • Seitz, S., & Dyer, C. (1997). View-invariant analysis of cyclic motion. The International Journal of Computer Vision, 25(3), 231–251.

  • Sereno, M. E., & Sereno, M. I. (1999). 2-D center-surround effects on 3-D structure-from-motion. Journal of Experimental Psychology: Human Perception and Performance, 25(6), 1834–1854.

  • Serre, T. (2006). Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

  • Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In Proceedings of the international conference on computer vision and pattern recognition (pp. 994–1000).

  • Shah, M., & Jain, R. (1997). Motion-based recognition. Computational imaging and vision series. Dordrecht: Kluwer Academic.

  • Sigala, R., Serre, T., Poggio, T., & Giese, M. (2005). Learning features of intermediate complexity for the recognition of biological motion. In LNCS: Vol. 3696. ICANN 2005 (pp. 241–246). Berlin: Springer.

  • Simoncelli, E. P., & Heeger, D. (1998). A model of neuronal responses in visual area MT. Vision Research, 38, 743–761.

  • Smith, M., Majaj, N., & Movshon, A. (2005). Dynamics of motion signaling by neurons in macaque area MT. Nature Neuroscience, 8(2), 220–228.

  • Snowden, R. J., Treue, S., Erickson, R. G., & Andersen, R. A. (1991). The response of area MT and V1 neurons to transparent motion. The Journal of Neuroscience, 11(9), 2768–2785.

  • Thorpe, S. (1990). Spike arrival times: A highly efficient coding scheme for neural networks. In Parallel processing in neural systems and computers (pp. 91–94).

  • Thorpe, S. (2002). Ultra-rapid scene categorization with a wave of spikes. In Lecture notes in computer science: Vol. 2525. Biologically motivated computer vision (pp. 1–15). Berlin: Springer.

  • Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.

  • Topsoe, F. (2000). Some inequalities for information divergence and related measures of discrimination. IEEE Transactions on Information Theory, 46(4), 1602–1609.

  • Tsotsos, J., Liu, Y., Martinez-Trujillo, J., Pomplun, M., Simine, E., & Zhou, K. (2005). Attending to visual motion. Computer Vision and Image Understanding, 100, 3–40.

  • VanRullen, R., & Thorpe, S. J. (2002). Surfing a spike wave down the ventral stream. Vision Research, 42, 2593–2615.

  • Victor, J., & Purpura, K. (1996). Nature and precision of temporal coding in visual cortex: a metric-space analysis. Journal of Neurophysiology, 76, 1310–1326.

  • Wang, L., & Suter, D. (2007). Recognizing human activities from silhouettes: Motion subspace and factorial discriminative graphical model. In Proceedings CVPR.

  • Wang, D. L., & Terman, D. (1995). Locally excitatory globally inhibitory oscillator networks. IEEE Transactions on Neural Networks, 6, 283–286.

  • Watson, A., & Ahumada, A. (1983). A look at motion in the frequency domain (NASA Tech. Memo).

  • Wielaard, D. J., Shelley, M., McLaughlin, D., & Shapley, R. (2001). How simple cells are made in a nonlinear network model of the visual cortex. The Journal of Neuroscience, 21(14), 5203–5211.

  • Wohrer, A., & Kornprobst, P. (2008). Virtual Retina: A biological retina model and simulator, with contrast gain control. Journal of Computational Neuroscience. doi:10.1007/s10827-008-0108-4.

  • Wong, S. F., Kim, T. K., & Cipolla, R. (2007). Learning motion categories using both semantic and structural information. In Proceedings of the international conference on computer vision and pattern recognition (pp. 1–6).

  • Xiao, D., Raiguel, S., Marcar, V., Koenderink, J., & Orban, G. A. (1995). Spatial heterogeneity of inhibitory surrounds in the middle temporal visual area. Proceedings of the National Academy of Sciences, 92(24), 11303–11306.

  • Xiao, D. K., Raiguel, S., Marcar, V., & Orban, G. A. (1997). The spatial distribution of the antagonistic surround of MT/V5 neurons. Cerebral Cortex, 7(7), 662–677.

  • Zelnik-Manor, L., & Irani, M. (2001). Event-based analysis of video. In Proceedings of CVPR’01 (Vol. 2, pp. 123–128).

Author information

Corresponding author

Correspondence to Maria-Jose Escobar.

About this article

Cite this article

Escobar, MJ., Masson, G.S., Vieville, T. et al. Action Recognition Using a Bio-Inspired Feedforward Spiking Network. Int J Comput Vis 82, 284–301 (2009). https://doi.org/10.1007/s11263-008-0201-1

  • DOI: https://doi.org/10.1007/s11263-008-0201-1
