Sequential Learning of Layered Models from Video

  • Chapter
Toward Category-Level Object Recognition

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4170)

Abstract

A popular framework for the interpretation of image sequences is the layers or sprite model; see e.g. [15], [6]. Jojic and Frey [8] provide a generative probabilistic model framework for this task, but their algorithm is slow because it needs to search over discretized transformations (e.g. translations or affine transformations) for each layer simultaneously. Exact computation with this model scales exponentially with the number of objects, so Jojic and Frey used an approximate variational algorithm to speed up inference. Williams and Titsias [16] proposed an alternative sequential algorithm that extracts objects one at a time using a robust statistical method, thus avoiding the combinatorial explosion.
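The scaling argument above can be made concrete with a small counting sketch. It is illustrative only and is not taken from the chapter: the function names and the figures (100 discretized transformations, 3 layers) are hypothetical, and it assumes the joint search enumerates one transformation per layer while the sequential scheme searches the same set once per extracted object.

```python
def joint_search_size(num_transforms: int, num_layers: int) -> int:
    """Configurations visited when every layer's transformation is searched jointly."""
    # One transformation is chosen per layer, so the choices multiply.
    return num_transforms ** num_layers


def sequential_search_size(num_transforms: int, num_layers: int) -> int:
    """Configurations visited when objects are extracted one at a time."""
    # Each object is handled in its own pass, so the choices add up.
    return num_transforms * num_layers


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the gap between the two counts.
    for layers in (1, 2, 3):
        print(layers, joint_search_size(100, layers), sequential_search_size(100, layers))
```

Under these assumptions the joint count grows as a power of the number of layers, while the sequential count grows linearly, which is the gap the sequential approach is meant to exploit.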

In this chapter we elaborate on our sequential algorithm in two ways. First, we describe a method to speed up the computation of the transformations based on approximate tracking of the multiple objects in the scene. Second, for sequences where the motion of an object is large, so that different views (or aspects) of the object are visible at different times in the sequence, we learn appearance models of the different aspects. We demonstrate our method on four video sequences, including a sequence where we learn articulated parts of a human body.

References

  1. Allan, M., Titsias, M.K., Williams, C.K.I.: Fast Learning of Sprites using Invariant Features. In: Proceedings of the British Machine Vision Conference 2005, pp. 40–49 (2005)

  2. Black, M.J., Jepson, A.D.: EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. In: Proc. ECCV, pp. 329–342 (1996)

  3. Darrell, T., Pentland, A.P.: Cooperative Robust Estimation Using Layers of Support. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5), 474–487 (1995)

  4. Fitzgibbon, A., Zisserman, A.: On Affine Invariant Clustering and Automatic Cast Listing in Movies. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 304–320. Springer, Heidelberg (2002)

  5. Frey, B.J., Jojic, N.: Transformation Invariant Clustering Using the EM Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1), 1–17 (2003)

  6. Irani, M., Rousso, B., Peleg, S.: Computing Occluding and Transparent Motions. International Journal of Computer Vision 12(1), 5–16 (1994)

  7. Jepson, A.D., Fleet, D.J., Black, M.J.: A Layered Motion Representation with Occlusion and Compact Spatial Support. In: ECCV 2002. LNCS, vol. 2353, pp. 692–706. Springer, Heidelberg (2002)

  8. Jojic, N., Frey, B.J.: Learning Flexible Sprites in Video Layers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2001. IEEE Computer Society Press, Kauai (2001)

  9. Kumar, M.P., Torr, P.H.S., Zisserman, A.: Learning layered pictorial structures from video. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, pp. 158–163 (2004)

  10. Sawhney, H.S., Ayer, S.: Compact Representations of Videos Through Dominant and Multiple Motion Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 814–830 (1996)

  11. Tao, H., Sawhney, H.S., Kumar, R.: Dynamic Layer Representation with Applications to Tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. II: 134–141 (2000)

  12. Titsias, M.K., Williams, C.K.I.: Fast unsupervised greedy learning of multiple objects and parts from video. In: Proc. Generative-Model Based Vision Workshop (2004)

  13. Titsias, M.K.: Unsupervised Learning of Multiple Objects in Images. Ph.D. thesis, School of Informatics, University of Edinburgh (2005)

  14. Torr, P.H.S.: Geometric motion segmentation and model selection. Phil. Trans. Roy. Soc. Lond. A 356, 1321–1340 (1998)

  15. Wang, J.Y.A., Adelson, E.H.: Representing Moving Images with Layers. IEEE Transactions on Image Processing 3(5), 625–638 (1994)

  16. Williams, C.K.I., Titsias, M.K.: Greedy Learning of Multiple Objects in Images using Robust Statistics and Factorial Learning. Neural Computation 16(5), 1039–1062 (2004)

  17. Wills, J., Agarwal, S., Belongie, S.: What Went Where. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2003, pp. I: 37–44 (2003)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Titsias, M.K., Williams, C.K.I. (2006). Sequential Learning of Layered Models from Video. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_29

  • DOI: https://doi.org/10.1007/11957959_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68794-8

  • Online ISBN: 978-3-540-68795-5

  • eBook Packages: Computer Science, Computer Science (R0)
