Seeing the Objects Behind the Dots: Recognition in Videos from a Moving Camera

Ommer, Björn; Mader, Theodor; Buhmann, Joachim M.

doi:10.1007/s11263-009-0211-7

Published: 12 February 2009

Volume 83, pages 57–71, (2009)

International Journal of Computer Vision

Björn Ommer¹,
Theodor Mader² &
Joachim M. Buhmann²

Abstract

Category-level object recognition, segmentation, and tracking in videos becomes highly challenging when applied to sequences from a hand-held camera that features extensive motion and zooming. An additional challenge is then to develop a fully automatic video analysis system that works without manual initialization of a tracker or other human intervention, both during training and during recognition, despite background clutter and other distracting objects. Moreover, our working hypothesis states that category-level recognition is possible based only on an erratic, flickering pattern of interest point locations without extracting additional features. Compositions of these points are then tracked individually by estimating a parametric motion model. Groups of compositions segment a video frame into the various objects that are present and into background clutter. Objects can then be recognized and tracked based on the motion of their compositions and on the shape they form. Finally, the combination of this flow-based representation with an appearance-based one is investigated. Besides evaluating the approach on a challenging video categorization database with significant camera motion and clutter, we also demonstrate that it generalizes to action recognition in a natural way.
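To make the tracking step concrete, the following is a minimal sketch of fitting a parametric motion model to one group of interest points between two frames. The abstract says only that "a parametric motion model" is estimated; the affine form, the plain least-squares fit, and the names `fit_affine_motion` and `solve3` are assumptions chosen for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    a = [list(row) + [r] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def fit_affine_motion(
    src: Sequence[Point], dst: Sequence[Point]
) -> Tuple[List[float], List[float]]:
    """Least-squares fit of an (assumed) affine motion model.

    Models u = a11*x + a12*y + b1 and v = a21*x + a22*y + b2 for points
    (x, y) in one frame and their positions (u, v) in the next frame.
    Returns ([a11, a12, b1], [a21, a22, b2]).
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matching point pairs")
    # Accumulate the normal equations (M^T M) w = M^T t with rows M_i = [x, y, 1].
    mtm = [[0.0] * 3 for _ in range(3)]
    mtu = [0.0] * 3
    mtv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                mtm[i][j] += row[i] * row[j]
            mtu[i] += row[i] * u
            mtv[i] += row[i] * v
    return solve3(mtm, mtu), solve3(mtm, mtv)


if __name__ == "__main__":
    # Hypothetical points of one composition in frame t and frame t+1.
    frame_t = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    frame_t1 = [(1.1 * x + 0.1 * y + 2.0, -0.1 * x + 0.9 * y - 1.0)
                for x, y in frame_t]
    print(fit_affine_motion(frame_t, frame_t1))
```

Under these assumptions, the residual between a point's observed position and the position predicted by the fitted model gives one plausible cue for deciding whether that point moves with the group or belongs to a different object or to the background.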

Author information

Authors and Affiliations

Department of EECS, University of California, Berkeley, USA
Björn Ommer
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Theodor Mader & Joachim M. Buhmann

Corresponding author

Correspondence to Björn Ommer.

Additional information

This work was supported in part by the Swiss National Science Foundation under contract no. 200021-107636.

About this article

Cite this article

Ommer, B., Mader, T. & Buhmann, J.M. Seeing the Objects Behind the Dots: Recognition in Videos from a Moving Camera. Int J Comput Vis 83, 57–71 (2009). https://doi.org/10.1007/s11263-009-0211-7

Received: 03 June 2008
Accepted: 09 January 2009
Published: 12 February 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11263-009-0211-7
