Hierarchical BoW with segmental sparse coding for large scale image classification and retrieval

Zhou, Jianshe; Narentuya; Tang, Sheng; Liu, Jie

doi:10.1007/s11042-018-5955-z

Published: 05 May 2018

Multimedia Tools and Applications, Volume 77, pages 22319–22338 (2018)

Jianshe Zhou¹,
Narentuya¹,
Sheng Tang² &
Jie Liu³

Abstract

The bag-of-words (BoW) model is widely regarded as one of the most successful approaches to content-based image tasks such as large-scale image retrieval, classification, and object categorization. Large sets of visual words, obtained by BoW quantization with a large vocabulary (codebook), have received much attention in recent years. However, both constructing a large vocabulary and quantizing against it impose a heavy burden in time and memory. To tackle this issue, we propose an efficient hierarchical BoW (HBoW) that obtains a large set of visual words by quantizing with a compact vocabulary instead of a large one. Our vocabulary is very compact because it consists of only two small dictionaries, which are learned through segmental sparse decomposition of local features. To generate a large BoW, we first split each local feature into two halves and use the two small dictionaries to compute their sparse codes. We then map the pair of indices of the maximum elements of the two sparse codes to a large set of visual words, based on the fact that data with similar properties will share the same base weighted with the largest sparse coefficient. To make similar patches more likely to select the same dictionary base, and hence yield similar BoW vectors, we propose a novel collaborative dictionary learning method that imposes a similarity regularization term together with row-sparsity regularization across data instances during group sparse coding. Additionally, based on index combinations of the top-2 sparse coefficients of local descriptors, we propose a soft BoW assignment method so that HBoW can tolerate different word selections for similar patches. With an inverted file structure built on HBoW, K-nearest neighbors (KNN) can be retrieved efficiently. After incorporating our fast KNN search into the SVM-KNN classification method, HBoW can be used for efficient image classification and logo recognition. Experiments on several well-known datasets show that our approach is effective for large-scale image classification and retrieval.
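
The index mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sparse codes are hypothetical inputs (the sparse coding and dictionary learning steps are omitted), the function names are invented for illustration, and the row-major pairing `i * K_b + j` is only one assumed way to turn an index pair into a word id. With two dictionaries of K_a and K_b atoms, this pairing yields K_a * K_b distinct visual words.

```python
from itertools import product


def top_indices(code, k):
    """Indices of the k largest coefficients of `code`, largest first."""
    order = sorted(range(len(code)), key=lambda i: code[i], reverse=True)
    return order[:k]


def hard_word(code_a, code_b):
    """Map the argmax indices of the two half-descriptor codes to one word id."""
    k_b = len(code_b)
    i = top_indices(code_a, 1)[0]
    j = top_indices(code_b, 1)[0]
    # Assumed row-major pairing: K_a * K_b possible word ids in total.
    return i * k_b + j


def soft_words(code_a, code_b, top=2):
    """Word ids for every combination of the top-`top` indices of each code."""
    k_b = len(code_b)
    pairs = product(top_indices(code_a, top), top_indices(code_b, top))
    return [i * k_b + j for i, j in pairs]


# Hypothetical sparse codes for the two halves of one local descriptor,
# each over a toy dictionary of 4 atoms (so 4 * 4 = 16 visual words).
code_first_half = [0.0, 0.9, 0.0, 0.3]
code_second_half = [0.1, 0.0, 0.7, 0.0]

print(hard_word(code_first_half, code_second_half))   # 6
print(soft_words(code_first_half, code_second_half))  # [6, 4, 14, 12]
```

In this sketch the soft assignment always contains the hard word, plus up to three alternatives, which is how two similar patches that disagree on one argmax index can still share a word.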

Author information

Authors and Affiliations

Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing, 100048, People’s Republic of China
Jianshe Zhou & Narentuya
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, People’s Republic of China
Sheng Tang
College of Information and Engineering, Capital Normal University, Beijing, 100048, People’s Republic of China
Jie Liu

Corresponding author

Correspondence to Jie Liu.

Additional information

Preliminary versions of this work were published in part at the IEEE International Conference on Image Processing (ICIP 2015), the 17th IEEE International Workshop on Multimedia Signal Processing (MMSP 2015), and the 23rd International Conference on MultiMedia Modeling (MMM 2017).

This work was supported by the National Natural Science Foundation of China (61371194, 61672361), the Beijing Natural Science Foundation (4152012), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016009).

About this article

Cite this article

Zhou, J., Narentuya, Tang, S. et al. Hierarchical BoW with segmental sparse coding for large scale image classification and retrieval. Multimed Tools Appl 77, 22319–22338 (2018). https://doi.org/10.1007/s11042-018-5955-z

Received: 28 September 2017
Revised: 04 March 2018
Accepted: 28 March 2018
Published: 05 May 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5955-z
