Abstract
We describe a top-down object detection and segmentation approach that uses a skeleton-based shape model and works directly on real images. The approach has three components. First, we propose a fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments. The model can generate a wide variety of shapes as instances of a given object category. Second, we develop a progressive selection mechanism that searches among the generated shapes for the category instances present in the image. The search begins with a large pool of candidates identified by a dynamic programming (DP) algorithm and progressively reduces the pool by applying a series of criteria, namely a local minimum criterion, the extent of shape overlap, and thresholding of the objective function, to select the final object candidates. Third, we propose the Partitioned Chamfer Matching (PCM) measure to capture the support of image edges for a hypothesized shape. This measure overcomes the shortcomings of Oriented Chamfer Matching and is robust against spurious edges, missing edges, and accidental alignment between the image edges and the shape boundary contour. We have evaluated our approach on the ETHZ dataset and found it to perform well in both object detection and object segmentation tasks.
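The progressive selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the local minimum test or the shape overlap is computed, so everything concrete here is an assumption made for illustration: the `Candidate` class, the precomputed `is_local_min` flag, the use of bounding-box intersection-over-union as a stand-in for shape overlap, the convention that a lower cost is better, and the parameter names `max_overlap` and `max_cost`.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box (x_min, y_min, x_max, y_max); a stand-in for shape extent.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate shape produced by the initial DP search."""
    cost: float         # objective value; lower is better (assumption)
    box: Box            # bounding box used here in place of the true shape
    is_local_min: bool  # assumed to be precomputed in the search space


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def progressive_selection(
    pool: List[Candidate],
    max_overlap: float = 0.5,
    max_cost: float = 1.0,
) -> List[Candidate]:
    """Shrink the candidate pool in three stages, cheapest test first."""
    # Stage 1: keep only candidates flagged as local minima of the objective.
    local_minima = [c for c in pool if c.is_local_min]

    # Stage 2: greedy overlap suppression, visiting the best candidates first
    # and dropping any candidate that overlaps an already kept one too much.
    kept: List[Candidate] = []
    for cand in sorted(local_minima, key=lambda c: c.cost):
        if all(overlap_ratio(cand.box, k.box) <= max_overlap for k in kept):
            kept.append(cand)

    # Stage 3: threshold the objective to obtain the final detections.
    return [c for c in kept if c.cost <= max_cost]
```

Ordering the stages from cheapest to most selective mirrors the idea of reducing a large pool progressively; the actual criteria and their parameters in the paper may differ from this sketch.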
Additional information
The authors gratefully acknowledge the support of US National Science Foundation Grant NSF 0957045.
Cite this article
Trinh, N.H., Kimia, B.B. Skeleton Search: Category-Specific Object Recognition and Segmentation Using a Skeletal Shape Model. Int J Comput Vis 94, 215–240 (2011). https://doi.org/10.1007/s11263-010-0412-0
DOI: https://doi.org/10.1007/s11263-010-0412-0