Combining Generative and Discriminative Models in a Framework for Articulated Pose Estimation

International Journal of Computer Vision

Abstract

We develop a method for the estimation of articulated pose, such as that of the human body or the human hand, from a single (monocular) image. Pose estimation is formulated as a statistical inference problem, where the goal is to find a posterior probability distribution over poses as well as a maximum a posteriori (MAP) estimate. The method combines two modeling approaches, one discriminative and the other generative. The discriminative model consists of a set of mapping functions that are constructed automatically from a labeled training set of body poses and their respective image features. The discriminative formulation allows for modeling ambiguous, one-to-many mappings (through the use of multi-modal distributions) that may yield multiple valid articulated pose hypotheses from a single image. The generative model is defined in terms of a computer graphics rendering of poses. While the generative model offers an accurate way to relate observed (image features) and hidden (body pose) random variables, it is difficult to use it directly in pose estimation, since inference is computationally intractable. In contrast, inference with the discriminative model is tractable, but considerably less accurate for the problem of interest. A combined discriminative/generative formulation is derived that leverages the complementary strengths of both models in a principled framework for articulated pose inference. Two efficient MAP pose estimation algorithms are derived from this formulation; the first is deterministic and the second non-deterministic. Performance of the framework is quantitatively evaluated in estimating articulated pose of both the human hand and human body.
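The interplay described in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a one-dimensional "pose", a made-up forward model `render`, two hypothetical mapping functions, a Gaussian feature likelihood with an arbitrary `sigma`, and a uniform prior over hypotheses (so the MAP choice reduces to the maximum-likelihood one). It only shows the general idea of letting cheap discriminative mappings propose pose hypotheses and letting the generative model score them.

```python
import math

def render(theta):
    """Toy generative (forward) model: pose -> image features.
    Stands in for a graphics rendering; purely illustrative."""
    return (theta ** 2, theta ** 3)

# Hypothetical discriminative mapping functions. Each one inverts only the
# first feature, so the mapping from features to pose is one-to-many.
def map_positive(x):
    return math.sqrt(max(x[0], 0.0))

def map_negative(x):
    return -math.sqrt(max(x[0], 0.0))

MAPPINGS = (map_positive, map_negative)

def log_likelihood(x, theta, sigma=0.1):
    """Gaussian log-likelihood (up to a constant) of the observed features
    given a pose, computed through the forward model. sigma is assumed."""
    rendered = render(theta)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, rendered))
    return -sq_err / (2.0 * sigma ** 2)

def estimate_pose(x):
    """Propose one hypothesis per mapping function, then keep the one that
    the generative model explains best (uniform prior over hypotheses)."""
    hypotheses = [phi(x) for phi in MAPPINGS]
    return max(hypotheses, key=lambda theta: log_likelihood(x, theta))

if __name__ == "__main__":
    observed = render(-1.5)          # features produced by a known pose
    print(estimate_pose(observed))   # the negative-branch hypothesis wins
```

In this toy setting inference never searches the pose space directly: only as many forward-model evaluations are needed as there are mapping functions, which is the kind of tractability argument the abstract makes for the combined formulation.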

Author information

Correspondence to Rómer Rosales.

Additional information

Most of this work was done while the first author was with Boston University.

About this article

Cite this article

Rosales, R., Sclaroff, S. Combining Generative and Discriminative Models in a Framework for Articulated Pose Estimation. Int J Comput Vision 67, 251–276 (2006). https://doi.org/10.1007/s11263-006-5165-4

  • DOI: https://doi.org/10.1007/s11263-006-5165-4