Directional geometric histogram feature extraction and applications

Multimedia Tools and Applications

Abstract

Image features have long been an active research topic in computer vision, with direct impact on detection, recognition, image retrieval, and pose estimation. In this paper, we propose a novel image feature, the Directional Geometric Histogram (DGH), which adopts the directional geometric approximation of the geometric Bandelet transform to make descriptions of monocular images more distinctive and selective. In particular, it reworks the histogram of geometric regularity to characterize local image context containing human objects. Unlike image geometry defined only over edges, our approach preserves the patterns both inside and outside contours under strict geometry. We compare the proposed method with classic global features in comprehensive experiments on human detection, pose estimation, and scene recognition across various datasets. The evaluation shows that the DGH feature can be reduced to less than half of its original dimensionality and is sparse, while remaining competitively discriminative and distinctive in these visual tasks. Given its modest computational requirements and established theoretical basis, the method is also promising for applications such as video surveillance and pattern identification.
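
The abstract does not spell out how the histogram is computed, so the following is only a rough illustration of the general idea of a block-wise histogram of local geometric directions. It is not the authors' DGH implementation: the Bandelet geometric flow is replaced here by a simpler structure-tensor estimate of each block's dominant direction, and the function name `directional_histogram`, the block size, and the number of bins are all assumptions made for this sketch.

```python
import numpy as np

def directional_histogram(image, block=8, n_bins=8):
    """Block-wise histogram of dominant local directions (illustrative only).

    Assumption: a structure tensor stands in for the Bandelet geometric
    flow used in the paper; this is not the published DGH algorithm.
    """
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    h, w = img.shape
    hist = np.zeros(n_bins)
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            bx = gx[r:r + block, c:c + block]
            by = gy[r:r + block, c:c + block]
            jxx, jyy, jxy = (bx * bx).sum(), (by * by).sum(), (bx * by).sum()
            energy = jxx + jyy
            if energy < 1e-12:
                continue  # flat block: no geometric direction to record
            # Orientation of the dominant gradient, rotated by 90 degrees
            # to get the direction along which the block is regular.
            grad_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
            flow_angle = (grad_angle + np.pi / 2.0) % np.pi
            # Coherence in [0, 1]: how strongly one direction dominates.
            coherence = np.hypot(jxx - jyy, 2.0 * jxy) / energy
            b = int(flow_angle / np.pi * n_bins) % n_bins
            hist[b] += coherence
    total = hist.sum()
    return hist / total if total > 0 else hist

# Vertical stripes: intensity changes along x, so the regular direction is vertical.
stripes = np.tile(np.sin(np.arange(32) * 0.5), (32, 1))
print(directional_histogram(stripes))
```

Weighting each vote by coherence means that blocks with a single clear direction contribute more than textured or noisy blocks, which is one simple way to favor regions of strong geometric regularity.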

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61075041, 61105016).

Author information

Corresponding author

Correspondence to Hong Han.

Cite this article

Han, H., Gou, J. Directional geometric histogram feature extraction and applications. Multimed Tools Appl 76, 15173–15189 (2017). https://doi.org/10.1007/s11042-017-4729-3
