Color CENTRIST: Embedding color information in scene categorization

https://doi.org/10.1016/j.jvcir.2014.01.013

Highlights

  • A color index scheme is designed to embed color information into the framework of CENTRIST.

  • Performance of the proposed descriptor is evaluated based on various datasets.

  • Extensive performance comparison with original CENTRIST descriptors, local descriptors, and local patterns.

  • An application on object detection is proposed to demonstrate applying the color CENTRIST in different domains.

Abstract

A new color descriptor is proposed to embed color information into the framework of the CENsus TRansform hISTogram (CENTRIST), so that this state-of-the-art visual descriptor can be further improved for categorizing image scenes. In the proposed color CENTRIST descriptor, global structure characteristics are described both by gradients derived from intensity values and by color variations between image pixels. A spatial pyramid scheme is also adopted to convey information at different scales. Comprehensive studies on various datasets were conducted to verify the effectiveness of the color CENTRIST from different aspects, including how to quantize the color space, the selection of color space, and categorization performance on various datasets. We demonstrate that the color CENTRIST descriptor is not only easy to implement, but also reliably achieves superior performance over CENTRIST. An application is also proposed to demonstrate the possibility of applying the color CENTRIST in other domains.

Introduction

Scene categorization, or scene recognition, has become a fundamental process for efficient image browsing, retrieval, and organization. For example, if an image's scene category can be recognized, we can reduce the search space of object recognition, or more accurately detect the semantic concepts present in the image. The results of scene categorization may also help a robot localize itself in a building. Detecting the semantic category of an image is undoubtedly important, and devising good visual descriptors plays the core role in this task.

In the literature, many visual descriptors have been proposed for image scene recognition. They can be roughly divided into two groups: (1) part-based representations, which consider multiple scales or spatial distributions, and (2) holistic representations that directly model global configurations. The former describes texture information in image patches, and has proven extremely effective for detecting objects under various conditions. By considering the distribution of local descriptors over image patches, sometimes in a multiscale manner, global information is captured. One of the most popular part-based descriptors is the Scale-Invariant Feature Transform (SIFT) [7], and one of the most prominent approaches to modeling the global distribution is the spatial pyramid [3]. Although SIFT descriptors combined with the bag-of-visual-words model [8] have shown discriminative power in scene categorization, directly modeling global texture information often describes the spatial structure of a scene more reliably: the same scene may be photographed from various viewpoints, and objects with significantly different appearances may appear in the same type of scene. In contrast to local texture information, holistic representations such as GIST [2] capture global structure and achieve high accuracy in natural scene categorization. More recently, the CENsus TRansform hISTogram (CENTRIST) [1] was proposed and provides accurate and stable performance on various scene image datasets.
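The spatial pyramid idea mentioned above can be sketched as follows. This uses the standard non-overlapping 2^l × 2^l split at each level; CENTRIST's own pyramid additionally uses shifted, overlapping blocks, so treat this as an illustrative sketch of the general scheme rather than the exact block layout.

```python
import numpy as np

def spatial_pyramid(descriptor, image, levels=2):
    """Concatenate per-cell histograms over a spatial pyramid.

    At level l the image is split into 2**l x 2**l cells, and
    `descriptor` maps an image block to a 1-D feature vector
    (e.g. a CENTRIST histogram). All per-cell vectors are
    concatenated into one long feature.
    """
    feats = []
    h, w = image.shape[:2]
    for l in range(levels + 1):
        n = 2 ** l
        # Integer cell boundaries that cover the image exactly.
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                block = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(descriptor(block))
    return np.concatenate(feats)
```

With `levels=2` this yields 1 + 4 + 16 = 21 cells, so the final feature is 21 times the length of a single per-cell histogram.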

We found that most existing works target grayscale images, and existing visual descriptors mainly rely on oriented gradients computed from intensity values. However, we argue that color information also plays an important role, although perhaps not as important as intensity, and should not be neglected in scene categorization. Fig. 1 shows an example of how color information helps distinguish scene categories. Without color information, the two images have similar structure and are hard to tell apart. With color information, we see that the open country image has one blue region in the top half, while the coast image has two distinct blue regions, in the top half and the bottom half, respectively. Clearly, considering color information benefits scene categorization.

In this work, we devise a visual descriptor called color CENTRIST to embed color information into the framework of CENTRIST, and demonstrate its effectiveness through comprehensive evaluations on various color image datasets. The main contributions of this work, parts of which appeared in our preliminary work [21], are briefly described as follows.

  • We devise a color index scheme to embed HSV color information into the framework of CENTRIST. Information from the three color channels is encapsulated into an 8-bit representation, so that the framework of CENTRIST can be directly employed and various performance comparisons can be conducted impartially. We verify that different color channels should be allocated different numbers of bits to characterize image content more accurately.

  • Performance of the proposed descriptor is evaluated based on various datasets, including the 8-class scene dataset, the 8-event dataset, the 67-indoor scene dataset, the KTH-IDOL and the KTH-INDECS datasets. Working on various datasets shows robustness and effectiveness of the proposed descriptor.
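The color index scheme in the first contribution can be illustrated with a small sketch. The (4, 2, 2) H/S/V bit allocation below is purely illustrative — the paper determines the best per-channel allocation empirically — and the input ranges (H in [0, 360), S and V in [0, 1]) are conventional assumptions.

```python
def color_index(h, s, v, bits=(4, 2, 2)):
    """Pack quantized H, S, V values into a single 8-bit index.

    Each channel is uniformly quantized to 2**b levels and the
    quantized codes are concatenated bitwise, so the resulting
    index fits the 8-bit slot that CENTRIST already uses.
    h in [0, 360); s, v in [0, 1]. The (4, 2, 2) allocation is
    an illustrative assumption, not the paper's final choice.
    """
    bh, bs, bv = bits
    assert bh + bs + bv == 8, "index must fit in 8 bits"
    qh = min(int(h / 360.0 * (1 << bh)), (1 << bh) - 1)
    qs = min(int(s * (1 << bs)), (1 << bs) - 1)
    qv = min(int(v * (1 << bv)), (1 << bv) - 1)
    return (qh << (bs + bv)) | (qs << bv) | qv
```

Because every pixel maps to one byte, the census-transform machinery of CENTRIST can be applied to the index image unchanged, which is what enables the impartial comparisons mentioned above.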

The unique contributions of this work over our previous work [21] are described as follows.

  • We verify the best multilevel representation of the proposed descriptor by carefully evaluating performances obtained by different levels of descriptors. Moreover, statistical analysis is conducted to show that the performance superiority is statistically significant.

  • We verify that extracting the proposed descriptor from the HSV color space gives stable performance.

  • We verify that combining the proposed descriptor with CENTRIST further yields better performance.

  • We compare performance obtained by the proposed descriptor with that obtained by SIFT, and its color variants, based on the bag of words framework.

  • We compare performance obtained by the proposed descriptor with that obtained by several promising color LBPs.

  • An application on object detection is proposed to demonstrate the possibility of applying the color CENTRIST in different domains.

The rest of this paper is organized as follows. Section 2 provides a literature survey. The color CENTRIST descriptor is proposed after briefly reviewing conventional CENTRIST in Section 3. Preliminary analysis of different descriptor settings is described in Section 4. We provide comprehensive evaluation on various datasets in Section 5, and a novel application based on color CENTRIST in Section 6. Section 7 concludes this paper with discussions of the proposed descriptor and future research.

Section snippets

Related works

In recent years, significant advances have been made in scene recognition by the computer vision and pattern recognition community. Some studies focus on feature/descriptor design to more reliably describe scene characteristics, while others focus on distance metrics or recognition schemes to achieve more accurate classification. Because the related literature is rich, we give only a brief survey from the perspective of feature/descriptor design in the following.

CENTRIST

To handle scene categorization, Wu and Rehg described desired properties of appropriate visual descriptors [1], including that holistic representation may be robust, structural properties should be captured, rough geometry of scenes may be helpful, and the descriptor should be general for different scene categories. By considering these, Wu and Rehg proposed a holistic representation modeling distribution of local structures, called CENsus TRansform hISTogram (CENTRIST). Rough geometrical
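The census transform underlying CENTRIST compares each pixel against its eight neighbors to form an 8-bit code, and CENTRIST is the 256-bin histogram of these codes. A minimal sketch follows; note the bit convention (whether a larger or smaller neighbor sets the bit) varies across implementations.

```python
import numpy as np

def census_transform(gray):
    """Census Transform: an 8-bit code per pixel from comparisons
    with its 8 neighbors (borders are skipped for simplicity).
    Here a neighbor <= center sets the bit; the opposite
    convention is equally common."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    ct = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        ct |= (neighbor <= center).astype(np.uint8) << (7 - bit)
    return ct

def centrist(gray):
    """CENTRIST: the 256-bin histogram of census transform values."""
    ct = census_transform(gray)
    hist, _ = np.histogram(ct, bins=256, range=(0, 256))
    return hist
```

The histogram discards the location of each code, which is why the spatial pyramid is used on top of it to recover rough geometry.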

Analysis of descriptor settings

This section presents how different experimental settings influence scene categorization accuracy. Experiments in this section (and in Section 5) were conducted in five runs, and the recognition accuracies of the five runs were averaged to show the overall performance. In each run, part of the dataset in each scene category was randomly selected for training, and the remainder was used for testing. We call this the five-random-run scheme in the following.
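The five-random-run scheme can be sketched as follows; `evaluate` is a hypothetical placeholder for the actual train-and-test pipeline and simply returns an accuracy in [0, 1].

```python
import random

def five_random_runs(samples_by_class, n_train, evaluate, runs=5, seed=0):
    """Per run, randomly pick n_train samples of each class for
    training and test on the rest; return the mean accuracy.

    `evaluate(train, test)` stands in for the real pipeline
    (feature extraction + classifier) and is assumed to return
    an accuracy in [0, 1]. `seed` makes the splits reproducible.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        train, test = [], []
        for label, samples in samples_by_class.items():
            shuffled = samples[:]
            rng.shuffle(shuffled)
            train += [(x, label) for x in shuffled[:n_train]]
            test += [(x, label) for x in shuffled[n_train:]]
        accuracies.append(evaluate(train, test))
    return sum(accuracies) / runs
```

Averaging over several random splits reduces the variance introduced by any single lucky or unlucky train/test partition.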

In the following experiments, we remove the two

Performance evaluation

With the experimental settings determined above, the color CENTRIST descriptor is tested on four datasets: 8-class scene category [2], 8-class sports event [9], 67-class indoor scene recognition [10], and KTH-IDOL/KTH-INDECS [14], [15]. These datasets include a variety of images with diverse visual characteristics.

Applications

We have shown that cCENTRIST can be extracted faster than most other descriptors. This property makes it suitable for applications that require a large number of descriptor extractions. In this section, we apply cCENTRIST to object detection in a high-resolution image, an important step in many research topics. Object detection is conducted on the panorama of a grocery store produced in [23]. The resolution of the panorama is 14569 × 711 pixels. For convenience, we only use the left

Conclusion

We have shown that embedding color information into the CENTRIST framework consistently provides better scene categorization performance, through comprehensive evaluation studies from various perspectives, including color quantization, color index, color space selection, multilevel representation, and dimension reduction. By appropriately quantizing the HSV color space and with the designed color index scheme, color information is elaborately represented and incorporated

Acknowledgment

This work was partially supported by the National Science Council of Taiwan under grant NSC101-2221-E-194-055-MY2.

References (42)

  • W. Liu et al., Multiview Hessian discriminative sparse coding for image annotation, Comput. Vis. Image Und. (2014)
  • J. Yu et al., Pairwise constraints based multiview features fusion for scene classification, Pattern Recogn. (2013)
  • J. Wu et al., CENTRIST: a visual descriptor for scene categorization, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • A. Oliva et al., Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vision (2001)
  • S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...
  • R. Zabih et al., Non-parametric local transforms for computing visual correspondence, Proc. Eur. Conf. Comput. Vision (1994)
  • L. Fei-Fei, L. Perona, A Bayesian hierarchical model for learning natural scene categories, in: Proceedings of IEEE...
  • J.C. Van Gemert et al., Kernel codebooks for scene categorization, Proc. Eur. Conf. Comput. Vision (2008)
  • D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision (2004)
  • J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in: Proceedings of IEEE...
  • L.-J. Li, L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, in: Proceedings of IEEE...
  • A. Quattoni, A. Torralba, Recognizing indoor scenes, in: Proceedings of IEEE Computer Society Conference on Computer...
  • A. Bosch et al., Scene classification using a hybrid generative/discriminative approach, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
  • A. Pronobis, B. Caputo, P. Jensfelt, H.I. Christensen, A discriminative approach to robust visual place recognition,...
  • K.E.A. Van de Sande et al., Evaluating color descriptors for object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • J. Luo, A. Pronobis, B. Caputo, P. Jensfelt, The KTH-IDOL2 database. Technical Report CVAP304, Kungliga Tekniska...
  • A. Pronobis, B. Caputo, The KTH-INDECS database. Technical Report CVAP297, Kungliga Tekniska Hoegskolan, CVAP,...
  • J. Devore, Probability and Statistics for Engineering and the Sciences (2011)
  • D. Song et al., Biologically inspired feature manifold for scene classification, IEEE Trans. Image Process. (2010)
  • J. Vogel et al., Semantic modeling of natural scenes for content-based image retrieval, Int. J. Comput. Vision (2007)
  • W.-T. Chu, C.-H. Chen, Color CENTRIST: a color descriptor for scene categorization, in: Proceedings of ACM...