Abstract
Image features are a long-standing research topic in computer vision, with direct impact on detection, recognition, image retrieval, and pose estimation. In this paper, we propose a novel image feature, the Directional Geometric Histogram (DGH), which adopts the directional geometric approximation of the geometric Bandelet transform to make descriptions of monocular images more distinctive and selective. In particular, it reworks the histogram of geometric regularity to characterize local image context containing human objects. Beyond the image geometry defined over edges, our approach preserves the inner and outer patterns of contours with strict geometry. We compare the proposed method with classic global features and conduct comprehensive experiments on human detection, pose estimation, and scene recognition across several datasets. The evaluation results show that the DGH feature can be reduced to less than half of its original dimensionality, and that the reduced feature is also sparse, while keeping competitive discriminative power and distinctiveness in these visual tasks. With its modest computational requirements and established theoretical basis, the method is also promising for applications such as video surveillance and pattern identification.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61075041, 61105016).
Cite this article
Han, H., Gou, J. Directional geometric histogram feature extraction and applications. Multimed Tools Appl 76, 15173–15189 (2017). https://doi.org/10.1007/s11042-017-4729-3
DOI: https://doi.org/10.1007/s11042-017-4729-3