Large Scale Visual Classification via Learned Dictionaries and Sparse Representation

  • Conference paper
Artificial Intelligence and Computational Intelligence (AICI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6319)

Abstract

We address the problem of large-scale visual classification. Our approach is based on sparse and redundant representations over trained dictionaries. The proposed algorithm first trains one dictionary per visual category from the images of that category, using the K-SVD algorithm. Given a set of training images from a category, K-SVD seeks the dictionary that yields the best representation of each image in the set under strict sparsity constraints. The traditional method for classifying test images at large scale is k-nearest-neighbor; in our method, the category of a test image is instead determined from its reconstruction residuals under the different category dictionaries. To obtain the most effective dictionaries, we use a large-scale image database collected from the Internet [2] and design experiments on nearly 1.6 million tiny images at the middle semantic level defined by WordNet. We compare image classification performance under different image resolutions and k-nearest-neighbor parameters. The experimental results show that the proposed algorithm outperforms k-nearest-neighbor in two respects: 1) discriminative capability on the large-scale visual classification task, and 2) average running time for classifying one image.
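The pipeline described in the abstract (one dictionary per category, then classification by reconstruction residual) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give dictionary sizes, sparsity levels, or the sparse coder, so the use of orthogonal matching pursuit, the smallest-residual decision rule, the synthetic data, and the names `omp`, `ksvd`, `classify`, and `run_demo` are all assumptions made for this example.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: code x with at most k atoms of D."""
    residual, support = x.copy(), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:                             # nothing new to add
            break
        support.append(j)
        # Re-fit all selected atoms jointly, then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def ksvd(X, n_atoms, k, n_iter=5, seed=0):
    """Simplified K-SVD. X has shape (dim, n_samples); returns (dim, n_atoms)."""
    rng = np.random.default_rng(seed)
    # Initialise atoms with randomly chosen, normalised training samples.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step: one code per training sample.
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # Dictionary update step: refit each atom by a rank-1 SVD.
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]              # samples that use atom j
            if users.size == 0:
                continue
            A[j, users] = 0.0
            E = X[:, users] - D @ A[:, users]        # error without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            A[j, users] = s[0] * Vt[0]
    return D

def classify(x, dictionaries, k):
    """Assumed rule: pick the category whose dictionary reconstructs x best."""
    residuals = [np.linalg.norm(x - D @ omp(D, x, k)) for D in dictionaries]
    return int(np.argmin(residuals))

def run_demo(seed=0, dim=20, rank=3, n_train=60, n_test=20):
    """Two synthetic 'categories', each near its own low-dimensional subspace."""
    rng = np.random.default_rng(seed)
    bases = [rng.normal(size=(dim, rank)) for _ in range(2)]

    def sample(c, n):
        clean = bases[c] @ rng.normal(size=(rank, n))
        return clean + 0.01 * rng.normal(size=(dim, n))  # small noise

    dictionaries = [ksvd(sample(c, n_train), n_atoms=6, k=rank) for c in range(2)]
    hits = 0
    for c in range(2):
        for x in sample(c, n_test).T:
            hits += classify(x, dictionaries, k=rank) == c
    return hits / (2 * n_test)

if __name__ == "__main__":
    print("accuracy on synthetic data:", run_demo())
```

Note that classifying one image in this sketch costs one sparse coding pass per category dictionary, independent of the number of training images, whereas a brute-force k-nearest-neighbor search compares the test image against every stored training image. This is one plausible reading of the running-time advantage the abstract reports, though the paper's actual measurements are not reproduced here.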

References

  1. Aharon, M., Elad, M., Bruckstein, A.M.: The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54, 4311–4322 (2006)

  2. Torralba, A., Fergus, R., Freeman, W.T.: Tiny Images. Technical Report, Computer Science and Artificial Intelligence Lab., MIT (2007)

  3. Fellbaum, C.: WordNet: An Electronic Lexical Database. Bradford Books (1998)

  4. Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing 15, 3736–3745 (2006)

  5. Donoho, D.L., Elad, M., Temlyakov, V.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory 52(1), 6–18 (2006)

  6. Bryt, O., Elad, M.: Compression of facial images using the K-SVD algorithm. J. Vis. Commun. Image Represent 19(4), 270–283 (2008)

  7. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proc. CVPR (June 2005)

  8. Grauman, K., Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features. In: Proc. ICCV (2005)

  9. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Proc. CVPR (2006)

  10. Li, L., Wang, G., Fei-Fei, L.: Optimol: automatic online picture collection via incremental model learning. In: Proc. CVPR (June 2007)

  11. Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)

  12. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proc. BMVC, vol. 1, pp. 384–393 (2002)

  13. Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. IJCV 60(1), 63–86 (2004)

  14. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on PAMI 27(10), 1615–1630 (2005)

  15. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proc. CVPR (June 2006)

  16. Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: Proc. ICCV, pp. 370–377 (2005)

  17. Yeh, T., Lee, J., Darrell, T.: Adaptive vocabulary forests for dynamic indexing and category learning. In: Proc. ICCV (October 2007)

  18. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)

  19. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 43, 177–196 (2001)

  20. Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. SIGIR (1999)

  21. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: Proc. CVPR (2009)

  22. Russell, B., Torralba, A., Murphy, K., Freeman, W.: Labelme: A database and web-based tool for image annotation. IJCV 77(1-3), 157–173 (2008)

  23. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In: Conf. Rec. 27th Asilomar Conf. Signals, Syst. Comput., vol. 1 (1993)

  24. Tropp, J.A.: Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50, 2231–2242 (2004)

  25. Donoho, D.: For Most Large Underdetermined Systems of Linear Equations the Minimal l1-Norm Solution Is Also the Sparsest Solution. Comm. Pure and Applied Math. 59(6), 797–829 (2006)

  26. Candes, E., Romberg, J., Tao, T.: Stable Signal Recovery from Incomplete and Inaccurate Measurements. Comm. Pure and Applied Math. 59(8), 1207–1223 (2006)

  27. Chen, S., Donoho, D., Saunders, M.: Atomic Decomposition by Basis Pursuit. SIAM Rev. 43(1), 129–159 (2001)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, Z., Lu, H., Deng, N., Cai, N. (2010). Large Scale Visual Classification via Learned Dictionaries and Sparse Representation. In: Wang, F.L., Deng, H., Gao, Y., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2010. Lecture Notes in Computer Science, vol 6319. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16530-6_38

  • DOI: https://doi.org/10.1007/978-3-642-16530-6_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16529-0

  • Online ISBN: 978-3-642-16530-6

  • eBook Packages: Computer Science, Computer Science (R0)
