
Improving keyword based web image search with visual feature distribution and term expansion

  • Regular Paper
Knowledge and Information Systems

Abstract

This paper discusses techniques for improving the performance of keyword-based web image queries. First, a web page is segmented into several text blocks based on semantic cohesion. The text blocks that contain web images are taken as the associated texts of the corresponding images, and the TF*IDF model is initially used to index those web images. Second, for each keyword, a relevant web image set and an irrelevant web image set are selected according to their TF*IDF values, and the visual feature distributions of the positive and negative images are modeled using Gaussian Mixture Models. An image’s relevance to the keyword with respect to visual features is thus defined as the ratio of the positive distribution density to the negative distribution density. We combine the text-based relevance model with the visual feature relevance model to improve performance. Third, a query expansion model is used to improve performance further. Expansion terms are selected according to their co-occurrences with the query terms in the top-relevant set of the original query. Our experiments show that our approach yields a significant improvement over the traditional keyword-based query model.
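As a rough illustration of two steps named in the abstract, the sketch below computes a visual relevance score as a ratio of two mixture densities and picks expansion terms by co-occurrence with the query terms in a top-ranked set. It is a minimal sketch under stated assumptions, not the authors' implementation: features are reduced to one dimension, the mixture parameters are assumed to be already fitted (for example by EM), and the function names, the `eps` guard, and the document-level co-occurrence count are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, var) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def visual_relevance(x, positive, negative, eps=1e-12):
    """Ratio of positive to negative mixture density at feature value x.

    eps is an assumed guard against division by zero, not part of the paper.
    """
    return mixture_density(x, positive) / (mixture_density(x, negative) + eps)


def expansion_terms(query_terms, top_docs, k=2):
    """Pick up to k terms that co-occur most often with any query term.

    top_docs is a list of term lists, standing in for the associated texts
    of the top-relevant images of the original query. Co-occurrence is
    counted once per document here, which is an assumption of this sketch.
    """
    query = set(query_terms)
    counts = Counter()
    for doc in top_docs:
        terms = set(doc)
        if terms & query:
            counts.update(terms - query)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical fitted mixtures for one keyword (single component each).
    positive = [(1.0, 0.0, 1.0)]
    negative = [(1.0, 3.0, 1.0)]
    print(visual_relevance(0.0, positive, negative))  # above 1: looks relevant
    print(visual_relevance(3.0, positive, negative))  # below 1: looks irrelevant

    top_docs = [["tiger", "cat", "zoo"], ["tiger", "cat"], ["car", "road"]]
    print(expansion_terms(["tiger"], top_docs, k=1))
```

A ratio above 1 means the feature value is more typical of the positive set than of the negative set. How this score is combined with the TF*IDF score is not specified in the abstract, so no combination rule is sketched here.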



Author information

Corresponding author

Correspondence to Zhiguo Gong.


About this article

Cite this article

Gong, Z., Liu, Q. Improving keyword based web image search with visual feature distribution and term expansion. Knowl Inf Syst 21, 113–132 (2009). https://doi.org/10.1007/s10115-008-0183-x


  • DOI: https://doi.org/10.1007/s10115-008-0183-x
