Image Representation Learning by Deep Appearance and Spatial Coding

Liu, Bingyuan; Liu, Jing; Li, Zechao; Lu, Hanqing

doi:10.1007/978-3-319-16865-4_43

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

The bag-of-features model is one of the most successful models for representing an image in classification tasks. However, the loss of discriminative information in local appearance coding and the lack of spatial information hinder its performance. To address these problems, we propose a deep appearance and spatial coding model that builds a better image representation for classification. The proposed model is a hierarchical architecture consisting of three operations: appearance coding, max-pooling, and spatial coding. First, given an image as input, we extract a set of local descriptors and apply appearance coding to encode them into robust high-dimensional vectors. Max-pooling is then performed within the over-partitioned spatial grids to incorporate spatial information. After that, spatial coding is carried out to progressively integrate the region vectors into a global image signature. Finally, the resulting image representation is used to train a one-versus-others SVM classifier. To learn the proposed model, we pre-train the network layer by layer and then perform supervised fine-tuning with image labels. Experiments on three image benchmark datasets (i.e., 15-Scenes, PASCAL VOC 2007, and Caltech-256) demonstrate the effectiveness of the proposed model.
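
The three operations named in the abstract (appearance coding, max-pooling, spatial coding) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify the coding functions, so a sigmoid layer is assumed for both coding steps, the weights are random rather than pre-trained and fine-tuned, a single 2 x 2 grid stands in for the spatial partition, and all function and variable names are hypothetical. Training and the SVM classifier are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """One coding layer (assumed form): sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def grid_cell(pos, n):
    """Index of the n x n grid cell containing a normalized position (u, v)."""
    u, v = pos
    col = min(int(u * n), n - 1)
    row = min(int(v * n), n - 1)
    return row * n + col


def max_pool(codes, positions, n):
    """Element-wise max of the codes falling in each grid cell.

    Codes are assumed non-negative (sigmoid outputs), so empty cells
    simply stay at zero.
    """
    k = len(codes[0])
    pooled = [[0.0] * k for _ in range(n * n)]
    for code, pos in zip(codes, positions):
        cell = pooled[grid_cell(pos, n)]
        for j, c in enumerate(code):
            if c > cell[j]:
                cell[j] = c
    return pooled


def image_signature(descriptors, positions, W_app, b_app, W_sp, b_sp, n=2):
    # 1) Appearance coding: encode every local descriptor.
    codes = [encode(d, W_app, b_app) for d in descriptors]
    # 2) Max-pooling inside each spatial grid cell.
    regions = max_pool(codes, positions, n)
    # 3) Spatial coding: integrate the region vectors into one global vector.
    flat = [v for region in regions for v in region]
    return encode(flat, W_sp, b_sp)


def random_layer(n_out, n_in, rng):
    """Random weights and zero biases, a stand-in for learned parameters."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


# Toy usage: 50 random 8-D descriptors at random image positions.
rng = random.Random(0)
D, K, G, S = 8, 16, 2, 10  # descriptor dim, code dim, grid size, signature dim
descriptors = [[rng.random() for _ in range(D)] for _ in range(50)]
positions = [(rng.random(), rng.random()) for _ in range(50)]
W_app, b_app = random_layer(K, D, rng)
W_sp, b_sp = random_layer(S, K * G * G, rng)
signature = image_signature(descriptors, positions, W_app, b_app, W_sp, b_sp, n=G)
```

In this sketch `signature` is the fixed-length global vector that would be passed to a classifier; in the paper's setting the layer parameters would come from layer-wise pre-training followed by supervised fine-tuning rather than random initialization.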

Acknowledgement

This work was supported by the 863 Program (2014AA015104) and the National Natural Science Foundation of China (61332016, 61272329, 61472422, and 61273034).

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Bingyuan Liu, Jing Liu & Hanqing Lu
School of Computer Science, Nanjing University of Science and Technology, Nanjing, China
Zechao Li

Corresponding author

Correspondence to Bingyuan Liu.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Liu, B., Liu, J., Li, Z., Lu, H. (2015). Image Representation Learning by Deep Appearance and Spatial Coding. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_43

DOI: https://doi.org/10.1007/978-3-319-16865-4_43
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer Science, Computer Science (R0)
