Decomposed human localization from social photo album

  • Special Issue Paper
Multimedia Systems

Abstract

In recent years, there has been tremendous progress in human detection. However, most detectors consider only upright poses, whereas body poses in daily life vary widely. In this paper, we focus on localizing highly deformable persons, which commonly appear in personal photo albums. Localizing such persons is extremely challenging because the large pose variance defeats traditional part-based template detectors. To address the infeasibility of template-based person models, we propose a decomposition-based human localization model, based on the observation that highly deformable persons usually have one distinct body part (the upper body) with a rigid, highly detectable structure, while the remaining parts are discriminative yet dependent on the upper body. The model handles highly deformable persons in three steps: first, it detects a stable upper body; second, it extends a set of larger bounding boxes from that detection; third, it selects the most appropriate box with a discriminative Whole Person Model (WPModel). Experimental results show that the decomposition-based model localizes deformable persons well, improving average precision by 10 % over state-of-the-art person detectors. Furthermore, the Similar Pose Feature (SPF) demonstrates the feasibility of projecting persons with similar poses into the same clusters, which facilitates a novel pose-based photo album browsing functionality.
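
The three steps described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the names `expand_candidates`, `localize_persons`, `detect_upper_bodies` and `score_box` are hypothetical, the expansion grid `factors` is invented for the example, and the upper-body detector and the WPModel scorer are passed in as plain callables because the abstract does not specify them.

```python
from itertools import product
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_candidates(upper: Box, image_size: Tuple[int, int],
                      factors: Tuple[float, ...] = (0.0, 1.0, 2.0)) -> List[Box]:
    """Step 2: grow larger boxes around an upper-body box.

    Each side is pushed outward by a multiple of the upper-body width or
    height, then clipped to the image. The factor grid is an illustrative
    assumption, not a setting taken from the paper.
    """
    x0, y0, x1, y1 = upper
    w, h = x1 - x0, y1 - y0
    img_w, img_h = image_size
    boxes = set()
    for fl, ft, fr, fb in product(factors, repeat=4):
        box = (max(0.0, x0 - fl * w), max(0.0, y0 - ft * h),
               min(float(img_w), x1 + fr * w), min(float(img_h), y1 + fb * h))
        if box != upper:  # keep only boxes that actually extend the upper body
            boxes.add(box)
    return sorted(boxes)


def localize_persons(image: Any, image_size: Tuple[int, int],
                     detect_upper_bodies: Callable[[Any], Iterable[Box]],
                     score_box: Callable[[Any, Box], float]) -> List[Box]:
    """Run the three steps once per detected upper body.

    `detect_upper_bodies` stands in for step 1 and `score_box` stands in
    for the whole-person scorer of step 3; both are hypothetical callables.
    """
    persons = []
    for upper in detect_upper_bodies(image):                # step 1
        candidates = expand_candidates(upper, image_size)   # step 2
        if candidates:
            best = max(candidates, key=lambda b: score_box(image, b))  # step 3
            persons.append(best)
    return persons
```

In this sketch the scorer sees every candidate box, so the quality of the result depends entirely on how well the supplied `score_box` separates whole-person boxes from partial or oversized ones; a denser or asymmetric expansion grid would trade more scoring calls for finer localization.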

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61422210, No. 61373076, and No. 61202143), the Natural Science Foundation of Fujian Province of China (No. 2013J05100, No. 2010J01345, and No. 2011J01367), the Fundamental Research Funds for the Central Universities (No. 2013121026 and No. 2011121052), the Xiamen University 985 Project, the Research Fund for the Doctoral Program of Higher Education of China (No. 201101211120024), and the Special Fund for Developing Shenzhen's Strategic Emerging Industries (No. JCYJ20120614164600201). It is also partly supported by the Hunan Provincial Natural Science Foundation of China (No. 12JJ2040) and the Research Foundation of Education Committee of Hunan Province, China (No. 09A046).

Author information

Corresponding author

Correspondence to Rongrong Ji.

About this article

Cite this article

Li, S., Zhang, M., Su, S. et al. Decomposed human localization from social photo album. Multimedia Systems 22, 137–148 (2016). https://doi.org/10.1007/s00530-014-0422-9

  • DOI: https://doi.org/10.1007/s00530-014-0422-9
