Abstract
The figure-ground segmentation of humans in images captured in natural environments is an outstanding open problem due to the presence of complex backgrounds, articulation, varying body proportions, partial views and viewpoint changes. In this work we propose class-specific segmentation models that leverage parametric max-flow image segmentation and a large dataset of human shapes. Our contributions are as follows: (1) formulation of a sub-modular energy model that combines class-specific structural constraints and data-driven shape priors, within a parametric max-flow optimization methodology that systematically computes all breakpoints of the model in polynomial time; (2) design of a data-driven class-specific fusion methodology, based on matching against a large training set of exemplar human shapes (100,000 in our experiments), that allows the shape prior to be constructed on-the-fly, for arbitrary viewpoints and partial views.
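To make the notion of "breakpoints" in contribution (1) concrete, the following is a minimal sketch, not the authors' model or solver. It assumes a toy energy E_λ(x) = Σ_i (u_i + λ) x_i + Σ_(i,j) w_ij |x_i − x_j| over binary labels on a four-pixel chain, where the unary costs u_i are hypothetical stand-ins for the combined data and shape-prior terms and λ is a per-pixel foreground cost. Brute-force enumeration replaces the parametric max-flow solver, which is only feasible at this toy size. The names `energy_terms`, `parametric_solutions`, `UNARY` and `PAIRWISE` are illustrative.

```python
from fractions import Fraction
from itertools import product


def energy_terms(labels, unary, pairwise):
    """Return (a, k) such that E_lambda(labels) = a + lambda * k."""
    # Lambda-independent part: unary costs of foreground pixels ...
    a = sum(u for u, x in zip(unary, labels) if x)
    # ... plus a smoothness penalty for every edge whose endpoints disagree.
    a += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    k = sum(labels)  # number of foreground pixels = slope in lambda
    return a, k


def parametric_solutions(unary, pairwise):
    """Enumerate all breakpoints of the toy energy by brute force.

    Returns (breakpoints, solutions): solutions[0] is optimal below
    breakpoints[0], solutions[i + 1] is optimal just above breakpoints[i].
    """
    n = len(unary)
    # For each foreground size k, keep the cheapest labeling (one line a + lambda*k).
    best = {}
    for labels in product((0, 1), repeat=n):
        a, k = energy_terms(labels, unary, pairwise)
        if k not in best or a < best[k][0]:
            best[k] = (a, labels)

    # Walk the lower envelope of these lines from lambda = -infinity upward.
    # For very negative lambda the all-foreground labeling (slope n) is optimal.
    current = n
    breakpoints = []
    solutions = [best[n][1]]
    while current > 0:
        # The next optimal line is the one of smaller slope that intersects
        # the current line first; ties go to the smallest slope.
        lam, k = min(
            (Fraction(best[k][0] - best[current][0], current - k), k)
            for k in range(current)
        )
        breakpoints.append(lam)
        solutions.append(best[k][1])
        current = k
    return breakpoints, solutions


# Hypothetical toy instance: negative unary cost favours foreground.
UNARY = [-5, -4, 1, -3]
PAIRWISE = {(0, 1): 2, (1, 2): 2, (2, 3): 2}

breakpoints, solutions = parametric_solutions(UNARY, PAIRWISE)
for lam, labels in zip(["-inf"] + [str(b) for b in breakpoints], solutions):
    print(f"lambda > {lam}: foreground mask {labels}")
```

In this instance the optimal foreground shrinks monotonically as λ grows, so the solutions form a nested sequence. A parametric max-flow solver recovers the same sequence of breakpoints without enumerating all labelings.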
Notes
1. Note, however, that the methodology we propose also applies to categories other than people. Here we focus on humans because, for now, large training sets of segmented shapes with structural annotations are available only for them, through Human3.6M [7]. As large datasets for other object categories emerge, we expect our methodology to generalize well. In this respect, our results on a challenging visual category, humans, are indicative of the performance bounds one can expect.
References
Urtasun, R., Darrell, T.: Sparse probabilistic regression for activity-independent human pose inference. In: CVPR (2008)
Ionescu, C., Li, F., Sminchisescu, C.: Latent structured models for human pose estimation. In: ICCV (2011)
Ionescu, C., Carreira, J., Sminchisescu, C.: Iterated second-order label sensitive pooling for 3D human pose estimation. In: CVPR (2014)
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR (2011)
Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. PAMI 35, 2878–2890 (2013)
Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting people using mutually consistent poselet activations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 168–181. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15567-3_13
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. PAMI 36(7), 1325–1339 (2014)
Gallo, G., Grigoriadis, M.D., Tarjan, R.E.: A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 18, 30–55 (1989)
Kolmogorov, V., Boykov, Y., Rother, C.: Applications of parametric maxflow in computer vision. In: ICCV (2007)
Carreira, J., Sminchisescu, C.: CPMC: automatic object segmentation using constrained parametric min-cuts. PAMI (2012)
Ladicky, L., Torr, P.H.S., Zisserman, A.: Human pose estimation using a joint pixel-wise and part-wise formulation. In: CVPR (2013)
Wang, H., Koller, D.: Multi-level inference by relaxed dual decomposition for human pose segmentation. In: CVPR (2011)
Ghiasi, G., Yang, Y., Ramanan, D., Fowlkes, C.C.: Parsing occluded people. In: CVPR (2014)
Xia, W., Song, Z., Feng, J., Cheong, L.-F., Yan, S.: Segmentation over detection by coupled global and local sparse representations. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 662–675. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33715-4_48
Ferrari, V., Marin, M., Zisserman, A.: Pose search: retrieving people using their pose. In: CVPR (2009)
Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: people detection and articulated pose estimation. In: CVPR (2009)
Zuffi, S., Freifeld, O., Black, M.J.: From pictorial structures to deformable structures. In: CVPR (2012)
Zuffi, S., Romero, J., Schmid, C., Black, M.J.: Estimating human pose with flowing puppets. In: ICCV (2013)
Boussaid, H., Kokkinos, I.: Fast and exact: ADMM-based discriminative shape segmentation with loopy part models. In: CVPR (2014)
Alpert, S., Galun, M., Basri, R., Brandt, A.: Image segmentation by probabilistic bottom-up aggregation and cue integration. In: CVPR (2007)
Kumar, M.P., Torr, P., Zisserman, A.: OBJCUT: efficient segmentation using top-down and bottom-up cues. PAMI 32, 530–545 (2010)
Leibe, B., Leonardis, A., Schiele, B.: Robust object detection with interleaved categorization and segmentation. IJCV 77, 259–289 (2008)
Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: CVPR (2013)
Flohr, F., Gavrila, D.M.: PedCut: an iterative framework for pedestrian segmentation combining shape models and multiple data cues. In: BMVC (2013)
Russell, B.C., Efros, A., Sivic, J., Freeman, W.T., Zisserman, A.: Segmenting scenes by matching image composites. In: NIPS (2009)
Rosenfeld, A., Weinshall, D.: Extracting foreground masks towards object recognition. In: ICCV (2011)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
Lin, G., Shen, C., van den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: CVPR (2016)
Kuettel, D., Ferrari, V.: Figure-ground segmentation by transferring window masks. In: CVPR (2012)
Gu, C., Arbeláez, P., Lin, Y., Yu, K., Malik, J.: Multi-component models for object detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 445–458. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_32
Lempitsky, V., Blake, A., Rother, C.: Image segmentation by branch-and-mincut. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 15–29. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88693-8_2
Ren, X., Malik, J.: Learning a classification model for segmentation. In: ICCV (2003)
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. PAMI (2010)
Malisiewicz, T., Efros, A.: Improving spatial support for objects via multiple segmentations. In: BMVC (2007)
van de Sande, K.E., Uijlings, J.R., Gevers, T., Smeulders, A.W.: Segmentation as selective search for object recognition. In: ICCV (2011)
Brox, T., Bourdev, L., Maji, S., Malik, J.: Object segmentation by alignment of poselet activations to image contours. In: CVPR (2011)
Endres, I., Hoiem, D.: Category independent object proposals. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 575–588. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15555-0_42
Kim, J., Grauman, K.: Shape sharing for object segmentation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7578, pp. 444–458. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33786-4_33
Levinshtein, A., Sminchisescu, C., Dickinson, S.: Optimal contour closure by superpixel grouping. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 480–493. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15552-9_35
Maire, M., Yu, S.X., Perona, P.: Object detection and segmentation from joint embedding of parts and pixels. In: ICCV (2011)
Dong, J., Chen, Q., Yan, S., Yuille, A.: Towards unified object detection and semantic segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 299–314. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10602-1_20
Maire, M., Arbelaez, P., Fowlkes, C., Malik, J.: Using contours to detect and localize junctions in natural images. In: CVPR (2008)
Leordeanu, M., Sukthankar, R., Sminchisescu, C.: Efficient closed-form solution to generalized boundary detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 516–529. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_37
Carreira, J., Caseiro, R., Batista, J., Sminchisescu, C.: Semantic segmentation with second-order pooling. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7578, pp. 430–443. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33786-4_32
Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24, 509–522 (2002)
Ryabko, B.Y., Stognienko, V., Shokin, Y.I.: A new test for randomness and its application to some cryptographic problems. J. Stat. Plan. Infer. 123, 365–376 (2004)
Bourdev, L., Malik, J.: Poselets: body part detectors trained using 3D human pose annotations. In: ICCV (2009)
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR (2014)
Acknowledgments
This work was supported in part by CNCS-UEFISCDI under PCE-2011-3-0438, JRP-RO-FR-2014-16, and NVIDIA through a GPU card donation.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Popa, AI., Sminchisescu, C. (2017). Parametric Image Segmentation of Humans with Structural Shape Priors. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_5
DOI: https://doi.org/10.1007/978-3-319-54184-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6
eBook Packages: Computer Science (R0)