Abstract
The development of technology generates huge amounts of non-textual information, such as images, and an efficient image annotation and retrieval system is highly desirable. Clustering algorithms make it possible to represent the visual features of images with a finite set of symbols. Building on this, many statistical models have been published that analyze the correspondence between visual features and words and discover hidden semantics. These models improve the annotation and retrieval of large image databases. However, image data usually have a large number of dimensions. Traditional clustering algorithms assign equal weights to all of these dimensions and become confounded when handling them. In this paper, we propose a weighted feature selection algorithm as a solution to this problem. For a given cluster, we determine the relevant features by histogram analysis and assign them greater weight than the less relevant features. We implemented several models that link visual tokens with keywords, based on the clustering results of the K-means algorithm with and without weighted feature selection, and evaluated their performance in terms of precision, recall, and correspondence accuracy on a benchmark dataset. The results show that K-means with weighted feature selection outperforms traditional K-means clustering for automatic image annotation and retrieval.
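The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a feature is "relevant" to a cluster when the cluster's values for that feature concentrate in a few histogram bins, and it uses the share of points in the fullest bin as the relevance score. The function names `histogram_weights` and `weighted_kmeans`, the bin count, and the normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def histogram_weights(points, bins=10, value_range=(0.0, 1.0)):
    """Per-feature weights for one cluster (assumed scoring rule).

    A feature whose values pile up in a single histogram bin gets a
    high score; a feature spread evenly over the bins gets a low one.
    """
    n, d = points.shape
    scores = np.empty(d)
    for j in range(d):
        counts, _ = np.histogram(points[:, j], bins=bins, range=value_range)
        scores[j] = counts.max() / n  # share of points in the fullest bin
    return scores / scores.sum()  # normalize so the weights sum to 1


def weighted_kmeans(X, k, iters=20, bins=10, seed=0):
    """K-means with a separate feature-weight vector per cluster."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, hi = float(X.min()), float(X.max())
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)  # start from equal weights
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Weighted squared Euclidean distance from every point to every center.
        diff = X[:, None, :] - centers[None, :, :]
        dist = (weights[None, :, :] * diff ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # keep the previous center and weights
            centers[c] = members.mean(axis=0)
            weights[c] = histogram_weights(members, bins, (lo, hi))
    return labels, centers, weights
```

Calling `weighted_kmeans(X, k)` on an array of shape `(n_samples, n_features)` returns the cluster labels, the cluster centers, and one weight vector per cluster. Setting every weight to `1 / n_features` recovers ordinary K-means, which is the baseline the abstract compares against.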
Cite this article
Wang, L., Khan, L. Automatic image annotation and retrieval using weighted feature selection. Multimed Tools Appl 29, 55–71 (2006). https://doi.org/10.1007/s11042-006-7813-7
DOI: https://doi.org/10.1007/s11042-006-7813-7