Abstract
Measuring image similarity is an important task for various multimedia applications. Similarity can be defined at two levels: at the syntactic (lower, context-free) level and at the semantic (higher, contextual) level. As long as one deals with the syntactic level, defining and measuring similarity is relatively straightforward, but as soon as one starts dealing with semantic similarity, the task becomes very difficult. We examine the use of simple, readily available syntactic image features combined with other multimodal features to derive a similarity measure that captures the weak semantics of an image. Weak semantics can be seen as an intermediate step between low-level image understanding and full semantic image understanding. We investigate the use of single modalities alone and examine how combining modalities affects the similarity measures. We also test the measure on a multimedia retrieval task using TV series data, although our primary motivation is to understand how the different modalities relate to each other.
Notes
In this paper we use the concepts of image similarity and distance between images fairly interchangeably. Note, however, that the two are inversely related: the higher the similarity, the smaller the distance, and vice versa.
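As a small illustration of this inverse relation, the sketch below maps a non-negative distance d to a similarity s = 1 / (1 + d). This particular mapping is an assumption chosen for simplicity, not necessarily the one used in the paper; any strictly decreasing map yields the same ranking, which the hypothetical helper `rank_by_similarity` demonstrates.

```python
def similarity_from_distance(d: float) -> float:
    """Map a non-negative distance to a similarity in (0, 1].

    Assumption: 1 / (1 + d) is one common strictly decreasing choice;
    it is used here only for illustration.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


def rank_by_similarity(distances: dict[str, float]) -> list[str]:
    """Return item keys ordered from most to least similar.

    Because the distance-to-similarity map is strictly decreasing, this is
    the same order as sorting by ascending distance.
    """
    return sorted(
        distances,
        key=lambda k: similarity_from_distance(distances[k]),
        reverse=True,
    )
```

For example, `rank_by_similarity({"a": 2.0, "b": 0.5, "c": 1.0})` returns `["b", "c", "a"]`, the same order obtained by sorting on ascending distance.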
ISO/IEC 13818-2:2000—information technology—generic coding of moving pictures and associated audio information: video.
Called slices in the standard.
The standard specifies different kinds of blocks, but since we consider only I-pictures, all blocks are so-called intra blocks.
DC coefficients are the zero-frequency coefficients of the DCT; AC coefficients are all the remaining coefficients.
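As a minimal illustration of the DC/AC split, the sketch below computes a textbook orthonormal 2-D DCT-II of an 8×8 block in plain Python (a naive O(N^4) loop, chosen for clarity rather than speed). It assumes 8×8 blocks as in MPEG-2; real encoders use fast transforms followed by quantization, neither of which is modeled here. In this normalization the DC coefficient equals 8 times the block mean, so the DC terms alone form a coarse thumbnail of an I-picture, downsampled by 8 in each direction.

```python
import math

N = 8  # block size assumed here, matching MPEG-2 8x8 blocks


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (naive, for clarity)."""
    def c(k):
        # Normalization factors that make the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def split_dc_ac(coeffs):
    """Return the DC coefficient (u = v = 0) and the 63 AC coefficients."""
    dc = coeffs[0][0]
    ac = [coeffs[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]
    return dc, ac
```

For a flat block in which every pixel is 10, the DC coefficient is 80 (8 times the mean) and all AC coefficients are zero up to floating-point error, since a constant block has no non-zero frequency content.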
We limit ourselves to the use of visual words as the basic syntactic feature for images.
Feature counts are always positive or zero.
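The two notes above can be illustrated with a minimal bag-of-visual-words sketch: each local descriptor is assigned to its nearest codebook entry (a "visual word"), and an image is represented by how often each word occurs, so every count is non-negative. The toy 2-D codebook and descriptors, and the helper names, are hypothetical; in practice a codebook is typically learned by clustering (e.g., k-means) local descriptors such as SIFT, and this sketch is not presented as the paper's exact feature pipeline.

```python
from collections import Counter
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the closest codebook entry; ties go to the lowest index."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_visual_words(descriptors: Sequence[Vector],
                        codebook: Sequence[Vector]) -> list[int]:
    """Histogram of visual-word counts, one non-negative entry per word."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(codebook))]
```

For example, with codebook `[(0, 0), (10, 10), (0, 10)]` and descriptors `[(1, 1), (9, 9), (0.5, 0), (1, 9)]`, the histogram is `[2, 1, 1]`.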
Query features can be either visual or textual. In the experiments of this paper, however, we use only textual query features.
For our purposes we need multimedia data that contain both video and text. The TREC video track data are not suitable for us, since, e.g., the TRECVID ASR corpora contain too little text for meaningful textual modeling. Even with the additional text that we currently use, the amount of text is on the low side.
Acknowledgements
This work was supported in part by the IST Programme of the European Community under the PASCAL Network of Excellence and under the CLASS project, and by the Academy of Finland under projects VISCI and HPE, and by the Finnish Funding Agency for Technology and Innovation under the project MIFSAS.
Cite this article
Perkiö, J., Tuominen, A., Vähäkangas, T. et al. Image similarity: from syntax to weak semantics. Multimed Tools Appl 57, 5–27 (2012). https://doi.org/10.1007/s11042-010-0562-7