Abstract
Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties. Since a reasonable model of a person requires at least nine segments, it is not possible to inspect every group, due to the huge combinatorial complexity.
We propose two approaches to this problem. In one, the search can be pruned by using projected versions of a classifier that accepts groups corresponding to people. We describe an efficient projection algorithm for one popular classifier, and demonstrate that our approach can be used to determine whether images of real scenes contain people.
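The pruning idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the classifier or projection algorithm of the paper: the toy classifier accepts an assembly when the distance between each constrained pair of segment centres falls in an interval, and `partial_ok` is a relaxed stand-in for a projected classifier that tests only the constraints whose segments are already assigned. Because a rejected partial assembly cannot be completed into an accepted one, discarding it is safe.

```python
import math

def partial_ok(assembly, constraints):
    """Check only the constraints whose two labels are already assigned.

    `constraints` maps a label pair (i, j) to an allowed distance
    interval (lo, hi); this format is hypothetical.
    """
    k = len(assembly)
    for (i, j), (lo, hi) in constraints.items():
        if i < k and j < k:
            if not lo <= math.dist(assembly[i], assembly[j]) <= hi:
                return False
    return True

def find_assemblies(candidates, n_labels, constraints):
    """Depth-first search over labels, pruning rejected partial assemblies.

    Returns the accepted full assemblies and the number of search nodes
    visited, so the effect of pruning can be inspected.
    """
    results = []
    visited = 0

    def extend(assembly):
        nonlocal visited
        visited += 1
        if len(assembly) == n_labels:
            results.append(tuple(assembly))
            return
        for seg in candidates:
            if seg in assembly:
                continue  # each candidate segment is used at most once
            assembly.append(seg)
            if partial_ok(assembly, constraints):
                extend(assembly)
            assembly.pop()

    extend([])
    return results, visited

# Hypothetical data: four segment centres, three labels in a chain.
candidates = [(0, 0), (1, 0), (2, 0), (10, 10)]
constraints = {(0, 1): (0.5, 1.5), (1, 2): (0.5, 1.5)}
accepted, visited = find_assemblies(candidates, 3, constraints)
print(accepted, visited)
```

The unpruned search tree over these four candidates and three labels has 41 nodes; the pruned search visits fewer because partial assemblies that already violate a constraint are never extended.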
The second approach employs a probabilistic framework in which assemblies are sampled with probability proportional to their likelihood, so that person-like assemblies are drawn more often than non-person ones. The main limit on performance is the segmentation of images, but the overall results of both approaches on real images of people are encouraging.
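Sampling in proportion to likelihood can be sketched as follows. The likelihood used here is a hypothetical stand-in (a product of Gaussian terms on the gaps between consecutive segment centres), and the sketch enumerates every assembly of a tiny candidate set, which would not scale to nine segments; it is meant only to show what drawing assemblies with probability proportional to likelihood means.

```python
import math
import random
from itertools import permutations

def likelihood(assembly, expected=1.0, sigma=0.3):
    """Hypothetical score: product of Gaussian terms on consecutive gaps."""
    score = 1.0
    for a, b in zip(assembly, assembly[1:]):
        gap = math.dist(a, b)
        score *= math.exp(-0.5 * ((gap - expected) / sigma) ** 2)
    return score

def sample_assemblies(candidates, n_labels, n_samples, seed=0):
    """Draw assemblies with probability proportional to their likelihood.

    Enumerates all ordered groups, which is feasible only for toy inputs.
    """
    assemblies = list(permutations(candidates, n_labels))
    weights = [likelihood(a) for a in assemblies]
    rng = random.Random(seed)
    return rng.choices(assemblies, weights=weights, k=n_samples)

# Hypothetical data: the two evenly spaced chains should dominate the draws.
candidates = [(0, 0), (1, 0), (2, 0), (10, 10)]
samples = sample_assemblies(candidates, n_labels=3, n_samples=500)
chains = {((0, 0), (1, 0), (2, 0)), ((2, 0), (1, 0), (0, 0))}
print(sum(s in chains for s in samples) / len(samples))
```

In this toy setting the well-spaced chains carry almost all of the probability mass, so they account for nearly every draw, while assemblies that include the distant outlier are effectively never sampled.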
Cite this article
Ioffe, S., Forsyth, D. Probabilistic Methods for Finding People. International Journal of Computer Vision 43, 45–68 (2001). https://doi.org/10.1023/A:1011179004708
DOI: https://doi.org/10.1023/A:1011179004708