Abstract
State-of-the-art image classification models, which generally include feature coding and pooling stages, have been widely adopted to generate discriminative and robust image representations. However, the coding schemes available in these models preserve only salient features, which results in information loss when the final image representations are generated. To address this issue, we propose a novel spatial locality-preserving feature coding strategy that selects representative codebook atoms based on their density distribution, so as to retain the structure of the features more completely and make the representations more descriptive. In the codebook learning stage, we propose an effective approximated K-means with cluster closures to initialize the codebook and independently adjust the center of each cluster in the dense regions. In the coding stage, we first define the concept of “density” to describe the spatial relationship among the code atoms and the features; the responses of local features are then adaptively encoded. Finally, in the pooling stage, a locality-preserving pooling strategy aggregates the encoded response vectors into a statistical vector that represents the whole image or all the regions of interest. We carry out image classification experiments on three commonly used benchmark datasets: 15-Scene, Caltech-101, and Caltech-256. The experimental results demonstrate that, compared with state-of-the-art Bag-of-Words (BoW) based methods, our approach achieves the best classification accuracy on these benchmark datasets.
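To make the three stages concrete, the following is a minimal sketch of a generic codebook, coding, and pooling pipeline of the kind the abstract builds on. It is not the proposed method: plain Lloyd K-means stands in for the approximated K-means with cluster closures, a localized soft-assignment over the nearest atoms stands in for the density-based coding, and max pooling stands in for the locality-preserving pooling. The function names and the parameters `n_neighbors` and `beta` are hypothetical, and random vectors replace real local descriptors.

```python
import numpy as np


def learn_codebook(features, k, iters=20, seed=0):
    """Plain Lloyd K-means (assumption: a stand-in for the paper's
    approximated K-means with cluster closures)."""
    rng = np.random.default_rng(seed)
    # Initialize the atoms with k distinct randomly chosen features.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every atom.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's atom unchanged
                centers[j] = members.mean(axis=0)
    return centers


def encode_local(features, codebook, n_neighbors=3, beta=1.0):
    """Localized soft-assignment (assumption: a stand-in for the paper's
    density-based coding). Each feature responds only to its nearest atoms."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(len(features))[:, None]
    near_d2 = d2[rows, nearest]
    # Subtract the smallest distance before exponentiating for stability.
    w = np.exp(-beta * (near_d2 - near_d2.min(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    codes = np.zeros_like(d2)
    codes[rows, nearest] = w
    return codes


def max_pool(codes):
    """Max pooling over all local features (assumption: a stand-in for the
    paper's locality-preserving pooling)."""
    return codes.max(axis=0)


# Synthetic "local descriptors": 200 features of dimension 8.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))

codebook = learn_codebook(descriptors, k=16)
codes = encode_local(descriptors, codebook)
image_vector = max_pool(codes)
print(image_vector.shape)
```

Restricting each response to the nearest atoms is one common way to keep coding local in feature space; the density-based atom selection described in the abstract would replace the `nearest` selection step in this sketch.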
Acknowledgments
This work is supported by the Natural Science Foundation of China (Grant Nos. 61673204, 61273257, 61321491), the Program for Distinguished Talents of Jiangsu Province, China (Grant No. 2013-XXRJ-018), and the Fundamental Research Funds for the Central Universities (Grant No. 020214380026).
Cite this article
Zhu, QH., Wang, ZZ., Mao, XJ. et al. Spatial locality-preserving feature coding for image classification. Appl Intell 47, 148–157 (2017). https://doi.org/10.1007/s10489-016-0887-7
DOI: https://doi.org/10.1007/s10489-016-0887-7