Abstract
This study explores learning visual codebooks with spectral clustering, an approach we call spectral visual codebook learning (SVCL). Although spectral clustering has been widely applied to unsupervised segmentation, clustering, and manifold learning, its use for learning codebooks on standard image benchmark datasets has not been thoroughly studied. We show how codebooks learned by SVCL can be used for scene classification, texture recognition, and image categorization. We describe several implementations for constructing the similarity graph and for handling the large number of local image patches. We show that our approach captures the nonlinear manifolds of semantic image patches. Another advantage is that both label and spatial information can be incorporated without increasing model complexity. We validate SVCL on datasets such as KTH-TIPS, Scene-15, Graz-02, and Caltech-101.
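To make the general idea concrete, the following is a minimal sketch of learning a codebook by clustering local descriptors in a spectral embedding and then encoding an image as a histogram over the codewords. It is not the authors' implementation: it assumes a dense Gaussian similarity graph, a standard normalized spectral clustering recipe in the style of Ng, Jordan and Weiss, and random toy vectors in place of real patch descriptors. The names `spectral_codebook`, `encode`, `sigma`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_codebook(descriptors, k, sigma=1.0, n_iter=20):
    """Cluster descriptors in a spectral embedding; return (codebook, labels)."""
    X = np.asarray(descriptors, dtype=float)
    # Dense Gaussian similarity graph (assumption: only feasible for small n).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(A)
    U = vecs[:, -k:]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Plain k-means in the embedding, with deterministic farthest-point init.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    # Codewords are cluster means in the original descriptor space
    # (assumes no cluster ended up empty, which holds for this toy data).
    codebook = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return codebook, labels


def encode(descriptors, codebook):
    """Bag-of-words histogram: assign each descriptor to its nearest codeword."""
    X = np.asarray(descriptors, dtype=float)
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argmin(d2, axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


# Toy data standing in for local patch descriptors: two separated groups.
rng = np.random.default_rng(0)
patches = np.vstack([rng.normal(0.0, 0.1, (20, 8)),
                     rng.normal(2.0, 0.1, (20, 8))])
codebook, labels = spectral_codebook(patches, k=2)
hist = encode(patches, codebook)
```

The dense similarity matrix grows quadratically with the number of descriptors, so a sketch like this only runs on small samples. Handling the very large patch sets the abstract mentions would need an approximation, which is not shown here.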
Acknowledgements
This research is supported in part by the Outstanding Young Academic Talents Start-up Funds of Wuhan University (No. 216-410100004), the Fundamental Research Funds for the Central Universities of China (No. 2042015kf0042), the National Natural Science Foundation of China (No. 61502351), and the Natural Science Foundation of Hubei, China (No. 2015CFB340).
Ethics declarations
Conflict of interest
Author Yi Hong declares that he has no conflict of interest. Author Weiping Zhu declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by A. Di Nola.
About this article
Cite this article
Hong, Y., Zhu, W. Learning visual codebooks for image classification using spectral clustering. Soft Comput 22, 6077–6086 (2018). https://doi.org/10.1007/s00500-017-2937-4
DOI: https://doi.org/10.1007/s00500-017-2937-4