Learning Articulated Structure and Motion

International Journal of Computer Vision

Abstract

Humans demonstrate a remarkable ability to parse complicated motion sequences into their constituent structures and motions. We investigate this problem, attempting to learn the structure of one or more articulated objects, given a time series of two-dimensional feature positions. We model the observed sequence in terms of “stick figure” objects, under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed. The problem is formulated as a single probabilistic model that includes multiple sub-components: associating the features with particular sticks, determining the proper number of sticks, and finding which sticks are physically joined. We test the algorithm on challenging datasets of 2D projections of optical human motion capture and feature trajectories from real videos.
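The modelling assumption stated in the abstract (fixed stick lengths and connectivity, time-varying joint angles) can be illustrated with a minimal sketch. This is not the authors' probabilistic model or learning algorithm; it only generates synthetic 2D feature trajectories for a hypothetical two-stick chain. The function names, the angle schedules, and the feature offsets are all assumptions chosen for illustration.

```python
import math
import random


def simulate_two_stick_chain(num_frames=50, lengths=(1.0, 0.8),
                             feats_per_stick=3, seed=0):
    """Generate 2D feature trajectories on two sticks joined end to end.

    Hypothetical toy generator (not from the paper). Returns a list of
    frames, each a list of (x, y) feature positions, plus the stick
    index each feature is attached to.
    """
    rng = random.Random(seed)
    # Fractional position of each feature along its stick; fixed over time.
    offsets = [[rng.uniform(0.1, 0.9) for _ in range(feats_per_stick)]
               for _ in range(2)]
    assignment = [0] * feats_per_stick + [1] * feats_per_stick

    frames = []
    for t in range(num_frames):
        phase = t / (num_frames - 1)
        theta0 = phase * math.pi / 2                        # stick 0 rotates
        joint_angle = 0.5 * math.sin(2 * math.pi * phase)   # relative angle changes
        theta1 = theta0 + joint_angle
        d0 = (math.cos(theta0), math.sin(theta0))
        d1 = (math.cos(theta1), math.sin(theta1))
        # The far end of stick 0 is the joint where stick 1 begins.
        joint = (lengths[0] * d0[0], lengths[0] * d0[1])

        frame = [(a * lengths[0] * d0[0], a * lengths[0] * d0[1])
                 for a in offsets[0]]
        frame += [(joint[0] + b * lengths[1] * d1[0],
                   joint[1] + b * lengths[1] * d1[1])
                  for b in offsets[1]]
        frames.append(frame)
    return frames, assignment


def pairwise_distance_over_time(frames, i, j):
    """Distance between features i and j in every frame."""
    return [math.dist(frame[i], frame[j]) for frame in frames]


if __name__ == "__main__":
    frames, assignment = simulate_two_stick_chain()
    same = pairwise_distance_over_time(frames, 0, 1)   # both on stick 0
    cross = pairwise_distance_over_time(frames, 0, 3)  # stick 0 vs stick 1
    print("same-stick distance range:", max(same) - min(same))
    print("cross-stick distance range:", max(cross) - min(cross))
```

In this toy setting, features on the same stick keep a constant separation while features on different sticks do not. That rigidity cue is what makes it plausible to associate features with sticks from trajectories alone.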

Author information

Corresponding author

Correspondence to David A. Ross.

About this article

Cite this article

Ross, D.A., Tarlow, D. & Zemel, R.S. Learning Articulated Structure and Motion. Int J Comput Vis 88, 214–237 (2010). https://doi.org/10.1007/s11263-010-0325-y

  • DOI: https://doi.org/10.1007/s11263-010-0325-y
