Abstract
We address the problem of large-scale visual classification. Our approach is based on sparse and redundant representations over trained dictionaries. The proposed algorithm first trains one dictionary per visual category from that category's images, using the K-SVD algorithm. Given a set of training images from a category, K-SVD seeks the dictionary that gives the best representation of each image in the set under strict sparsity constraints. The traditional method for classifying test images at large scale is k-nearest-neighbor; in our method, a test image is instead assigned to the category whose dictionary yields the smallest reconstruction residual. To obtain effective dictionaries, we use the large-scale image database collected from the Internet [2] and design experiments on nearly 1.6 million tiny images at the middle semantic level defined by WordNet. We compare image classification performance under different image resolutions and k-nearest-neighbor parameters. The experimental results show that the proposed algorithm outperforms k-nearest-neighbor in two respects: 1) discriminative capability on the large-scale visual classification task, and 2) average running time for classifying one image.
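The pipeline summarized in the abstract (one K-SVD dictionary per category, then classification by reconstruction residual) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic 16-dimensional vectors stand in for image features, and the function names (`omp`, `ksvd`, `classify`), the category labels, the dictionary size of 6 atoms, and the sparsity level of 3 are all assumptions chosen for the toy example. Sparse coding uses a plain Orthogonal Matching Pursuit, which is one common choice alongside K-SVD.

```python
import numpy as np

def omp(D, x, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit: sparse code of x over the columns (atoms) of D."""
    residual, support, coef = x.copy(), [], np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(x):
            break  # x is already represented well enough
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        # least-squares refit on all selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def ksvd(X, n_atoms, n_nonzero, n_iter=5, seed=0):
    """Train a dictionary on X (dim x n_samples) by alternating OMP and SVD atom updates."""
    rng = np.random.default_rng(seed)
    # assumption: initialise atoms from randomly chosen training samples
    D = X[:, rng.choice(X.shape[1], size=n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        codes = np.column_stack([omp(D, x, n_nonzero) for x in X.T])
        for k in range(n_atoms):
            users = np.flatnonzero(codes[k])  # samples whose code uses atom k
            if users.size == 0:
                continue
            codes[k, users] = 0.0
            # representation error of those samples without atom k
            E = X[:, users] - D @ codes[:, users]
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]               # best rank-1 fit gives the new atom
            codes[k, users] = s[0] * Vt[0]  # and its updated coefficients
    return D

def classify(x, dictionaries, n_nonzero):
    """Assign x to the category whose dictionary gives the smallest residual."""
    residuals = {label: np.linalg.norm(x - D @ omp(D, x, n_nonzero))
                 for label, D in dictionaries.items()}
    return min(residuals, key=residuals.get)

# Toy data (assumption): each category lies in its own random 3-D subspace of R^16.
rng = np.random.default_rng(0)
dim, n_train, n_test = 16, 40, 10
dictionaries, tests = {}, []
for label in ("cat_a", "cat_b"):  # hypothetical category names
    basis = rng.standard_normal((dim, 3))
    samples = basis @ rng.standard_normal((3, n_train + n_test))
    dictionaries[label] = ksvd(samples[:, :n_train], n_atoms=6, n_nonzero=3)
    tests += [(label, x) for x in samples[:, n_train:].T]

truth = [label for label, _ in tests]
predictions = [classify(x, dictionaries, n_nonzero=3) for _, x in tests]
accuracy = float(np.mean([p == t for p, t in zip(predictions, truth)]))
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Classifying one sample costs one sparse-coding pass per category dictionary, independent of the number of training images, whereas a brute-force k-nearest-neighbor search compares the sample against every stored training image. That difference is one plausible reading of the running-time advantage the abstract reports.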
References
Aharon, M., Elad, M., Bruckstein, A.M.: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54(11), 4311–4322 (2006)
Torralba, A., Fergus, R., Freeman, W.T.: Tiny Images. Technical Report, Computer Science and Artificial Intelligence Lab., MIT (2007)
Fellbaum, C.: WordNet: An Electronic Lexical Database. Bradford Books (1998)
Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing 15, 3736–3745 (2006)
Donoho, D.L., Elad, M., Temlyakov, V.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory 52(1), 6–18 (2006)
Bryt, O., Elad, M.: Compression of facial images using the K-SVD algorithm. J. Vis. Commun. Image Represent. 19(4), 270–283 (2008)
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proc. CVPR (June 2005)
Grauman, K., Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features. In: Proc. ICCV (2005)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Proc. CVPR (2006)
Li, L., Wang, G., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning. In: Proc. CVPR (June 2007)
Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proc. BMVC, vol. 1, pp. 384–393 (2002)
Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. IJCV 60(1), 63–86 (2004)
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on PAMI 27(10), 1615–1630 (2005)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proc. CVPR (June 2006)
Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: Proc. ICCV, pp. 370–377 (2005)
Yeh, T., Lee, J., Darrell, T.: Adaptive vocabulary forests for dynamic indexing and category learning. In: Proc. ICCV (October 2007)
Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42, 177–196 (2001)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. SIGIR (1999)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: Proc. CVPR (2009)
Russell, B., Torralba, A., Murphy, K., Freeman, W.: LabelMe: A database and web-based tool for image annotation. IJCV 77(1-3), 157–173 (2008)
Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In: Conf. Rec. 27th Asilomar Conf. Signals, Syst. Comput., vol. 1 (1993)
Tropp, J.A.: Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50, 2231–2242 (2004)
Donoho, D.: For Most Large Underdetermined Systems of Linear Equations the Minimal l1-Norm Solution Is Also the Sparsest Solution. Comm. Pure and Applied Math. 59(6), 797–829 (2006)
Candes, E., Romberg, J., Tao, T.: Stable Signal Recovery from Incomplete and Inaccurate Measurements. Comm. Pure and Applied Math. 59(8), 1207–1223 (2006)
Chen, S., Donoho, D., Saunders, M.: Atomic Decomposition by Basis Pursuit. SIAM Rev. 43(1), 129–159 (2001)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fu, Z., Lu, H., Deng, N., Cai, N. (2010). Large Scale Visual Classification via Learned Dictionaries and Sparse Representation. In: Wang, F.L., Deng, H., Gao, Y., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2010. Lecture Notes in Computer Science, vol 6319. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16530-6_38
DOI: https://doi.org/10.1007/978-3-642-16530-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16529-0
Online ISBN: 978-3-642-16530-6
eBook Packages: Computer Science, Computer Science (R0)