Spatial locality-preserving feature coding for image classification

Applied Intelligence

Abstract

State-of-the-art image classification models, which generally consist of feature coding and pooling stages, have been widely adopted to generate discriminative and robust image representations. However, the coding schemes available in these models preserve only salient features, which results in information loss when the final image representations are generated. To address this issue, we propose a novel spatial locality-preserving feature coding strategy that selects representative codebook atoms based on their density distribution, retaining the structure of the features more completely and making the representations more descriptive. In the codebook learning stage, we propose an effective approximated K-means with cluster closures to initialize the codebook and independently adjust the center of each cluster in the dense regions. In the coding stage, we first define the concept of “density” to describe the spatial relationship among the codebook atoms and the features; the responses of local features are then adaptively encoded. Finally, in the pooling stage, a locality-preserving pooling strategy aggregates the encoded response vectors into a statistical vector that represents the whole image or all the regions of interest. We carry out image classification experiments on three commonly used benchmark datasets: 15-Scene, Caltech-101, and Caltech-256. The experimental results demonstrate that, compared with state-of-the-art Bag-of-Words (BoW) based methods, our approach achieves the best classification accuracy on these benchmark datasets.
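For orientation, the three stages named in the abstract (codebook learning, coding, pooling) can be illustrated with a generic BoW-style sketch. This is not the method proposed in the article: plain Lloyd K-means stands in for the approximated K-means with cluster closures, localized soft-assignment over the nearest atoms stands in for the density-based coding, and max pooling stands in for the locality-preserving pooling. All function names and parameters (`learn_codebook`, `encode`, `max_pool`, `n_neighbors`, `beta`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def learn_codebook(features, n_atoms, n_iter=10, seed=0):
    """Plain Lloyd K-means; a stand-in, NOT the article's approximated variant."""
    rng = np.random.default_rng(seed)
    # Initialize atoms from randomly chosen distinct features.
    atoms = features[rng.choice(len(features), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every feature to every atom.
        d2 = ((features[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for k in range(n_atoms):
            members = features[labels == k]
            if len(members):  # leave an atom unchanged if its cluster is empty
                atoms[k] = members.mean(0)
    return atoms

def encode(features, atoms, n_neighbors=3, beta=1.0):
    """Localized soft-assignment: each feature responds only to its nearest atoms."""
    d2 = ((features[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(len(features))[:, None]
    local = d2[rows, nearest]
    # Subtract the row minimum before exponentiating for numerical stability;
    # the normalized weights are unaffected by this shift.
    w = np.exp(-beta * (local - local.min(axis=1, keepdims=True)))
    codes = np.zeros_like(d2)
    codes[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return codes

def max_pool(codes):
    """Aggregate per-feature responses into one vector per image (or region)."""
    return codes.max(axis=0)

# Toy usage with random vectors in place of real local descriptors.
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(200, 8))
codebook = learn_codebook(descriptors, n_atoms=16)
image_vector = max_pool(encode(descriptors, codebook))
```

In a full pipeline, the pooled vectors of all training images would then be passed to a classifier; the descriptor type, codebook size, and classifier used in the article are not stated in this abstract.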

Acknowledgments

This work is supported by the Natural Science Foundation of China (Grant Nos. 61673204, 61273257, 61321491), the Program for Distinguished Talents of Jiangsu Province, China (Grant No. 2013-XXRJ-018), and the Fundamental Research Funds for the Central Universities (Grant No. 020214380026).

Author information

Corresponding author

Correspondence to Yu-Bin Yang.

About this article

Cite this article

Zhu, QH., Wang, ZZ., Mao, XJ. et al. Spatial locality-preserving feature coding for image classification. Appl Intell 47, 148–157 (2017). https://doi.org/10.1007/s10489-016-0887-7

  • DOI: https://doi.org/10.1007/s10489-016-0887-7
