Constructing a discriminative visual vocabulary with macro and micro sense of visual words

Multimedia Tools and Applications

Abstract

The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind the visual vocabulary representation is that an image can be represented by visual words, each of which stands for a collection of similar local image features. In this work, we develop a new scheme for constructing a visual vocabulary based on an analysis of visual word contents. By considering the content homogeneity of visual words, we design a visual vocabulary that contains macro-sense and micro-sense visual words. The two types of visual words are then combined to describe an image effectively. We also apply the visual vocabulary to build image retrieval and categorization systems. The performance evaluation of the two systems indicates that the proposed visual vocabulary achieves promising results.
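
The general idea of describing an image by visual words can be illustrated with a minimal sketch. It shows only the generic bag-of-visual-words step (assign each local descriptor to its nearest visual word, then count), not the macro-sense/micro-sense scheme proposed in the article. The function names, the hand-picked three-word codebook, and the 2-D toy descriptors are all assumptions made for illustration; in practice the descriptors would be local image features and the codebook would typically come from clustering.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Return a normalized histogram of visual-word occurrences for one image."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0 for i in range(len(codebook))]


# Hypothetical toy data: a 3-word codebook and 4 local descriptors (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]

histogram = bag_of_words(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Two images can then be compared through their histograms, which is the usual starting point for retrieval and categorization with a visual vocabulary.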

Acknowledgments

The authors would like to express their sincere thanks to the anonymous reviewers for their invaluable comments and suggestions. This work was supported by the National Science Council of R.O.C. under Grant NSC 102-2221-E-214-040.

Author information

Correspondence to Chung-Ming Kuo.

Cite this article

Kuo, CM., Hsieh, CH., Yang, NC. et al. Constructing a discriminative visual vocabulary with macro and micro sense of visual words. Multimed Tools Appl 75, 16983–17017 (2016). https://doi.org/10.1007/s11042-015-2970-1

  • DOI: https://doi.org/10.1007/s11042-015-2970-1
