ABSTRACT
Common techniques represent images by quantizing local descriptors and summarizing their distribution in a histogram. In this paper we propose a parametric description instead and compare its capabilities to those of histogram-based approaches. We model SIFT descriptors, extracted by dense sampling on a spatial pyramid, with a multivariate Gaussian distribution. Each distribution is converted to a high-dimensional descriptor by concatenating the mean vector with the projection of the covariance matrix onto the Euclidean space tangent to the Riemannian manifold. Experiments on Caltech-101 and ImageCLEF2011 are performed with a Stochastic Gradient Descent solver, which makes it possible to handle large-scale datasets and high-dimensional feature spaces.
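The conversion from a Gaussian to a fixed-length vector can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gaussian_descriptor`, the regularizer `eps`, the choice of the identity matrix as the tangent point (so that the projection reduces to the matrix logarithm), and the sqrt(2) weighting of off-diagonal entries are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def gaussian_descriptor(descriptors, eps=1e-6):
    """Map a set of local descriptors (N x d) to one vector.

    Hypothetical helper: fits a multivariate Gaussian, then concatenates
    the mean with the matrix logarithm of the covariance, i.e. the
    projection onto the tangent space at the identity (an assumption).
    """
    x = np.asarray(descriptors, dtype=np.float64)
    d = x.shape[1]

    mean = x.mean(axis=0)
    # Regularize so the covariance is strictly positive definite.
    cov = np.cov(x, rowvar=False) + eps * np.eye(d)

    # Matrix logarithm of a symmetric positive definite matrix
    # via its eigendecomposition: log(C) = V diag(log w) V^T.
    w, v = np.linalg.eigh(cov)
    w = np.maximum(w, eps)
    log_cov = (v * np.log(w)) @ v.T

    # Keep the upper triangle only; off-diagonal entries are scaled by
    # sqrt(2) so the vector norm matches the Frobenius norm of log_cov.
    rows, cols = np.triu_indices(d)
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.concatenate([mean, scale * log_cov[rows, cols]])
```

Under these assumptions a d-dimensional descriptor set yields a vector of length d + d(d + 1)/2, which for 128-dimensional SIFT gives 128 + 8256 = 8384 values per region. With a spatial pyramid, one such vector would be computed per region and the results concatenated before being passed to a linear classifier.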
Index Terms
- Modeling local descriptors with multivariate gaussians for object and scene recognition