
Automatic image annotation and retrieval using weighted feature selection

Multimedia Tools and Applications

Abstract

The development of technology generates vast amounts of non-textual information, such as images, so an efficient image annotation and retrieval system is highly desirable. Clustering algorithms make it possible to represent the visual features of images with a finite set of symbols. Building on this, many statistical models have been published that analyze the correspondence between visual features and words and discover hidden semantics. These models improve the annotation and retrieval of large image databases. However, image data usually have a large number of dimensions. Traditional clustering algorithms assign equal weights to all of these dimensions and can be confounded by dimensions that are irrelevant to a particular cluster. In this paper, we propose a weighted feature selection algorithm as a solution to this problem. For a given cluster, we determine the relevant features based on histogram analysis and assign greater weight to relevant features than to less relevant ones. We have implemented several models that link visual tokens with keywords, based on the clustering results of the K-means algorithm both with weighted feature selection and without feature selection. We evaluated their performance on a benchmark dataset using precision, recall and correspondence accuracy. The results show that weighted feature selection outperforms the traditional, unweighted approach for automatic image annotation and retrieval.
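The idea of per-cluster feature weighting can be illustrated with a small sketch. The abstract does not say how histogram analysis turns into a relevance score, so this sketch makes a hypothetical assumption. For each dimension, a cluster's values are binned over the global range, and the relevance score is the share of members that fall into the most populated bin. A peaked histogram therefore counts as "relevant" and a flat one as "less relevant". The scores are rescaled into weights, and each cluster then uses its own weights in a weighted squared Euclidean distance inside K-means. The helper names (`histogram_weights`, `weighted_kmeans`, `farthest_first`) and the deterministic farthest-first seeding are illustrative choices, not the paper's.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def histogram_weights(members: List[Vector], lo: Vector, hi: Vector, bins: int = 10) -> List[float]:
    """Hypothetical per-dimension relevance weights for one cluster.

    Each dimension is binned over the global [lo, hi] range. The score is the share of
    members in the most populated bin, so concentrated values score high. Weights are
    rescaled to sum to the number of dimensions, so equal scores give plain K-means.
    """
    dims = len(lo)
    if not members:
        return [1.0] * dims
    scores = []
    for d in range(dims):
        width = (hi[d] - lo[d]) or 1.0  # guard against a constant dimension
        counts = [0] * bins
        for x in members:
            b = int((x[d] - lo[d]) / width * bins)
            counts[min(max(b, 0), bins - 1)] += 1  # the maximum value lands in the last bin
        scores.append(max(counts) / len(members))
    total = sum(scores)  # > 0, since every score is at least 1 / bins
    return [dims * s / total for s in scores]


def weighted_dist(x: Vector, c: Vector, w: Vector) -> float:
    """Squared Euclidean distance with per-dimension weights."""
    return sum(wd * (xd - cd) ** 2 for xd, cd, wd in zip(x, c, w))


def farthest_first(data: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption of this sketch): start at data[0], then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(data[0])]
    unit = [1.0] * len(data[0])
    while len(centroids) < k:
        far = max(data, key=lambda x: min(weighted_dist(x, c, unit) for c in centroids))
        centroids.append(list(far))
    return centroids


def weighted_kmeans(data: List[Vector], k: int, bins: int = 10, iters: int = 50):
    """K-means in which every cluster measures distance with its own feature weights."""
    dims = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dims)]
    hi = [max(x[d] for x in data) for d in range(dims)]
    centroids = farthest_first(data, k)
    weights = [[1.0] * dims for _ in range(k)]  # the first pass is unweighted K-means
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment: each point joins the cluster with the smallest weighted distance.
        new_labels = [
            min(range(k), key=lambda j: weighted_dist(x, centroids[j], weights[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update: recompute the centroid and feature weights of every non-empty cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
                weights[j] = histogram_weights(members, lo, hi, bins)
    return labels, centroids, weights


if __name__ == "__main__":
    toy = [[0, 0], [0, 1], [1, 0], [1, 1], [100, 100], [100, 101], [101, 100], [101, 101]]
    labels, centroids, weights = weighted_kmeans(toy, k=2)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With all weights fixed at 1.0, the loop reduces to ordinary K-means. This corresponds to the "without feature selection" baseline that the abstract compares against. The cluster index assigned to each image region plays the role of a visual token.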
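The abstract does not name the models that link visual tokens with keywords. As a hypothetical illustration, one of the simplest options is a co-occurrence model. In that model, each visual token (a cluster index, often called a "blob") that appears in a training image is counted as co-occurring with every keyword of that image, and P(word | blob) is estimated from those counts. A new image is annotated by summing P(word | blob) over its blobs and keeping the top-ranked words. The function names and data layout below are assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train_cooccurrence(training: Iterable[Tuple[List[int], List[str]]]) -> Dict[int, Dict[str, float]]:
    """Estimate P(word | blob) from (blob_ids, keywords) pairs of annotated images."""
    pair_counts: Dict[int, Counter] = defaultdict(Counter)
    for blobs, words in training:
        for b in blobs:
            for w in words:
                pair_counts[b][w] += 1  # every blob co-occurs with every caption word
    model: Dict[int, Dict[str, float]] = {}
    for b, counter in pair_counts.items():
        total = sum(counter.values())
        model[b] = {w: c / total for w, c in counter.items()}
    return model


def annotate(model: Dict[int, Dict[str, float]], blobs: List[int], n: int = 3) -> List[str]:
    """Score each word by summing P(word | blob) over the image's blobs; return the top n."""
    scores: Counter = Counter()
    for b in blobs:
        for w, p in model.get(b, {}).items():  # unseen blobs contribute nothing
            scores[w] += p
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [w for w, _ in ranked[:n]]
```

For example, after training on `([0, 1], ["tiger", "grass"])`, `([0, 2], ["tiger", "water"])` and `([1], ["grass"])`, an image containing only blob 0 is annotated with "tiger" first. More refined translation or relevance models replace the counting step but keep the same blob-to-word interface.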
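In this line of work, annotation quality is commonly measured per keyword. Precision for a word w is the number of images correctly annotated with w divided by the number of images the system annotated with w. Recall divides the same count by the number of images whose ground-truth caption contains w. The sketch below implements that common definition; the paper's exact averaging may differ, and correspondence accuracy is not covered. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def word_precision_recall(predicted: List[Set[str]], truth: List[Set[str]], word: str) -> Tuple[float, float]:
    """Per-word precision and recall of automatic annotation.

    predicted[i] and truth[i] are the predicted and ground-truth keyword sets of image i.
    """
    annotated = [i for i, p in enumerate(predicted) if word in p]  # system used the word
    relevant = [i for i, t in enumerate(truth) if word in t]  # ground truth contains the word
    correct = sum(1 for i in annotated if word in truth[i])
    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these values over the vocabulary gives one summary figure. Running it once on clusters produced with feature weights and once on clusters produced without them yields the kind of comparison the abstract reports.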




Author information


Correspondence to Latifur Khan.


About this article

Cite this article

Wang, L., Khan, L. Automatic image annotation and retrieval using weighted feature selection. Multimed Tools Appl 29, 55–71 (2006). https://doi.org/10.1007/s11042-006-7813-7


  • DOI: https://doi.org/10.1007/s11042-006-7813-7
