Image similarity: from syntax to weak semantics

Multimedia Tools and Applications

Abstract

Measuring image similarity is an important task for various multimedia applications. Similarity can be defined at two levels: the syntactic (lower, context-free) level and the semantic (higher, contextual) level. As long as one deals with the syntactic level, defining and measuring similarity is a relatively straightforward task, but as soon as one starts dealing with semantic similarity, the task becomes very difficult. We examine the use of simple, readily available syntactic image features combined with other multimodal features to derive a similarity measure that captures the weak semantics of an image. Weak semantics can be seen as an intermediate step between low-level image understanding and full semantic image understanding. We investigate the use of single modalities alone and examine how combining modalities affects the similarity measures. We also test the measure on a multimedia retrieval task using TV series data, although our main motivation is understanding how the different modalities relate to each other.
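
To make the general idea concrete, the sketch below shows one simple way such a multimodal similarity could be computed: per-modality similarities (here cosine similarity over feature counts) fused by a weighted average. The modality names, the weights, and the fusion rule are illustrative assumptions, not the measure defined in the paper.

    # Illustrative sketch only: late fusion of per-modality similarities into a
    # single multimodal score. The modalities, weights and cosine similarity are
    # assumptions for illustration, not the measure defined in the paper.
    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two non-negative feature-count vectors."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom > 0 else 0.0

    def fused_similarity(feats_a, feats_b, weights):
        """Weighted average of per-modality similarities.

        feats_a, feats_b: dicts mapping modality name -> feature vector.
        weights: dict mapping modality name -> non-negative weight.
        """
        total = sum(weights.values())
        return sum(w * cosine(feats_a[m], feats_b[m])
                   for m, w in weights.items()) / total

    # Hypothetical example: visual-word and text-term histograms of two images.
    a = {"visual_words": np.array([3.0, 0.0, 1.0, 2.0]), "text": np.array([1.0, 1.0, 0.0])}
    b = {"visual_words": np.array([2.0, 1.0, 1.0, 0.0]), "text": np.array([0.0, 1.0, 1.0])}
    print(fused_similarity(a, b, {"visual_words": 0.6, "text": 0.4}))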


Notes

  1. In this paper we use both the concept of image similarity and the concept of distance between images quite freely; the two are, however, inversely related: the higher the similarity, the smaller the distance, and vice versa.

  2. http://images.google.com

  3. http://images.search.yahoo.com/images

  4. ISO/IEC 13818-2:2000—information technology—generic coding of moving pictures and associated audio information: video.

  5. Called slices in the standard.

  6. The standard specifies different kinds of blocks, but since we consider only I-pictures, all blocks are so-called intra blocks.

  7. DC coefficients are the zero-frequency coefficients of the DCT, and AC coefficients are the remaining coefficients; a small illustrative sketch follows these notes.

  8. We limit ourselves to using visual words as the basic syntactic feature for images (a bag-of-visual-words sketch follows these notes).

  9. Feature counts are always positive or zero.

  10. The query features can be either visual or textual features. In the experiments of this paper, however, we use only textual query features.

  11. For our purposes we would need multimedia data that contain both video and text. The TREC video track data is not usable for us since, e.g., in the TRECVID ASR corpora the amount of text is too low for meaningful textual modeling. Even with the additional text that we currently use, the amount of text is on the low side.
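
As a small illustration of note 7, the sketch below separates the DC (zero-frequency) coefficient of an 8×8 block from its AC coefficients using a type-II DCT. The random block values and the use of SciPy are assumptions for illustration; an MPEG-2 decoder reads these coefficients directly from the bitstream rather than recomputing the transform.

    # Illustrative only (note 7): split an 8x8 block's DCT coefficients into the
    # DC (zero-frequency) coefficient and the remaining AC coefficients. In
    # MPEG-2 the coefficients are read from the compressed stream, not recomputed.
    import numpy as np
    from scipy.fft import dctn

    block = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
    coeffs = dctn(block, norm="ortho")   # 2-D type-II DCT of the block

    dc = coeffs[0, 0]                    # zero-frequency (DC) coefficient
    ac = coeffs.copy()
    ac[0, 0] = 0.0                       # everything else is an AC coefficient
    print("DC:", dc, "nonzero AC coefficients:", np.count_nonzero(ac))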
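
Similarly for note 8, the following sketch shows the usual bag-of-visual-words construction: local descriptors are quantized against a k-means codebook and counted into a histogram. The random descriptors, the 128-dimensional SIFT-like shape, the vocabulary size and the use of scikit-learn are illustrative assumptions, not the exact pipeline used in the paper.

    # Illustrative bag-of-visual-words sketch (note 8): quantize local descriptors
    # against a k-means codebook and count how often each visual word occurs.
    # Random descriptors stand in for real local features such as SIFT.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    training_desc = rng.normal(size=(1000, 128))   # descriptors pooled from many images
    codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(training_desc)

    image_desc = rng.normal(size=(200, 128))       # descriptors of a single image
    words = codebook.predict(image_desc)           # nearest visual word per descriptor
    histogram = np.bincount(words, minlength=50)   # visual-word counts (always >= 0)
    print(histogram)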


Acknowledgements

This work was supported in part by the IST Programme of the European Community under the PASCAL Network of Excellence and under the CLASS project, and by the Academy of Finland under projects VISCI and HPE, and by the Finnish Funding Agency for Technology and Innovation under the project MIFSAS.

Author information


Corresponding author

Correspondence to Jukka Perkiö.


About this article

Cite this article

Perkiö, J., Tuominen, A., Vähäkangas, T. et al. Image similarity: from syntax to weak semantics. Multimed Tools Appl 57, 5–27 (2012). https://doi.org/10.1007/s11042-010-0562-7
