Semantic-Friendly Indexing and Quering of Images Based on the Extraction of the Objective Semantic Cues

Mojsilović, Aleksandra; Gomes, José; Rogowitz, Bernice

doi:10.1023/B:VISI.0000004833.39906.33

Published: January 2004

Volume 56, pages 79–107, (2004)

International Journal of Computer Vision

Aleksandra Mojsilović¹,
José Gomes¹ &
Bernice Rogowitz¹

Abstract

Abstract image semantics resists all forms of modeling, much as any kind of intelligence does. However, to develop more satisfying image navigation systems, we need tools that construct a semantic bridge between the user and the database. In this paper we present an image indexing scheme and a query language that allow the user to introduce a cognitive dimension to the search. At an abstract level, this approach consists of: (1) learning the “natural language” that humans use to communicate their semantic experience of images, (2) understanding the relationships between this language and objective, measurable image attributes, and (3) developing corresponding feature extraction schemes.

More precisely, we conducted a number of subjective experiments in which we asked human subjects to group images and then explain verbally why they did so. The results indicated that part of the abstraction involved in image interpretation is often driven by semantic categories, which can be broken down into more tangible semantic entities, i.e. objective semantic indicators. By analyzing our experimental data, we identified some candidate semantic categories (e.g. portraits, people, crowds, cityscapes, landscapes) and their underlying semantic indicators (e.g. skin, sky, water, object). These experiments also helped us derive important low-level image descriptors that account for our perception of these indicators.

We then used these findings to develop an image feature extraction and indexing scheme. In particular, our feature set has been carefully designed to match the way humans communicate image meaning. This led us to develop a “semantic-friendly” query language for browsing and searching diverse collections of images.
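To make the idea of indexing by semantic indicators more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each image has already been reduced to the fraction of its area attributed to each indicator (skin, sky, water, object), and that a query is a set of allowed ranges over those fractions. The class `IndexedImage`, the functions `matches` and `search`, the sample values, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IndexedImage:
    """One index entry (hypothetical structure, not from the paper)."""
    name: str
    # Assumed feature: fraction of image area assigned to each indicator.
    indicators: Dict[str, float] = field(default_factory=dict)


# A query maps an indicator name to an inclusive (low, high) range.
Query = Dict[str, Tuple[float, float]]


def matches(image: IndexedImage, query: Query) -> bool:
    """Return True if every constrained indicator falls inside its range."""
    for indicator, (low, high) in query.items():
        # An indicator absent from the index entry is treated as 0.0.
        value = image.indicators.get(indicator, 0.0)
        if not (low <= value <= high):
            return False
    return True


def search(index: List[IndexedImage], query: Query) -> List[str]:
    """Return the names of all indexed images that satisfy the query."""
    return [image.name for image in index if matches(image, query)]


# Made-up index entries for illustration only.
INDEX = [
    IndexedImage("beach.jpg", {"sky": 0.45, "water": 0.35, "skin": 0.02}),
    IndexedImage("portrait.jpg", {"skin": 0.40, "sky": 0.05}),
    IndexedImage("street.jpg", {"sky": 0.20, "object": 0.60}),
]

# Assumed thresholds: "landscape" as much sky and little skin,
# "portrait" as a large skin region.
LANDSCAPE_QUERY: Query = {"sky": (0.3, 1.0), "skin": (0.0, 0.1)}
PORTRAIT_QUERY: Query = {"skin": (0.25, 1.0)}

if __name__ == "__main__":
    print(search(INDEX, LANDSCAPE_QUERY))
    print(search(INDEX, PORTRAIT_QUERY))
```

In this sketch a semantic category is only a named combination of indicator ranges, which mirrors the abstract's description of categories being broken down into objective indicators. How the indicators are actually extracted from pixels is outside its scope.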

We have implemented our approach in an Internet search engine and tested it on a large number of images. The results we obtained are very promising.

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532, USA
Aleksandra Mojsilović, José Gomes & Bernice Rogowitz

About this article

Cite this article

Mojsilović, A., Gomes, J. & Rogowitz, B. Semantic-Friendly Indexing and Quering of Images Based on the Extraction of the Objective Semantic Cues. International Journal of Computer Vision 56, 79–107 (2004). https://doi.org/10.1023/B:VISI.0000004833.39906.33

Issue Date: January 2004
DOI: https://doi.org/10.1023/B:VISI.0000004833.39906.33