Elsevier

Information Sciences

Volume 424, January 2018, Pages 235-249

Perceptual uniform descriptor and ranking on manifold for image retrieval

https://doi.org/10.1016/j.ins.2017.10.010

Abstract

The incompatibility between image descriptors and ranking methods has often been neglected in image retrieval. In this paper, manifold learning and Gestalt psychology theory are used to address this incompatibility. A new holistic descriptor called the Perceptual Uniform Descriptor (PUD), based on Gestalt psychology, is proposed; it combines color and gradient direction to imitate human visual uniformity. Because PUD improves the visual uniformity of traditional descriptors, the PUD features of images in the same class are distributed on one manifold in most cases. Thus, we use manifold ranking and PUD to realize image retrieval. Experiments were carried out on four benchmark data sets, and the proposed method is shown to greatly improve the accuracy of image retrieval. On the Ukbench and Corel-1K datasets, PUD, which has only 280 dimensions, reaches an N-S score of 3.58 (HSV: 3.4) and an mAP of 81.77% (ODBTC: 77.9%), respectively. These results are higher than those of other holistic image descriptors, local descriptors, and state-of-the-art retrieval methods.

Introduction

Feature extraction and ranking are two important topics in Content Based Image Retrieval (CBIR). It is well known that image representation plays an important part in CBIR systems [5], [35], and thus the performance of these systems depends mainly on the discrimination and effectiveness of features. The process of feature extraction can be divided into three steps: 1) image preprocessing; 2) detection of discriminative image regions; 3) a feature statistics strategy applied in these regions. Most existing works concentrate on one or more of these steps to improve their descriptors.

First, in order to describe certain properties of natural images which may contain various types of image noise, image preprocessing is an indispensable step. Many image denoising and image sharpening algorithms have been presented to reduce the effect of noise on image content and strengthen discriminative information in some regions.

Second, discriminative image regions are detected. Based on the regions used, descriptors can be classified as global-based or local-based. Color Histogram (CH) [23], Local Binary Patterns (LBP) [25], [26] and Histogram of Oriented Gradients (HOG) [4], which describe color, texture and edge features respectively, are computed over global image regions. Motivated by visual perception mechanisms for image retrieval, Liu et al. proposed the micro-structure descriptor (MSD) [21], which defines micro-structures through the similarity of edge orientation and the underlying colors, and introduces structure element correlation statistics to characterize the spatial correlation among them (the color difference histogram (CDH) [22] is another version). In contrast, local-based descriptors focus on describing local regions that contain certain information. Lowe [15] introduced a local descriptor called the scale-invariant feature transform (SIFT), which detects and describes local neighborhoods around key points in scale space. The HMAX model [32], based on hierarchical visual processing in the primary visual cortex (V1), utilizes Gabor filters at different scales and orientations in its S1 units [29]. More details about performance comparisons among other local descriptors are presented in [24].

Finally, corresponding feature statistics methods in these regions are provided. As one of the most common methods, the histogram-based strategy has been applied in many descriptors, such as CH, LBP and HOG. Moreover, color moment [36], color correlogram [13] and color coherence vector [28] were proposed to emphasize the spatial relationship of feature elements.
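As an illustration of the histogram-based strategy, the following minimal sketch builds a normalized color histogram in HSV space. The function name `hsv_histogram` and the 8 x 3 x 3 quantization (72 bins) are assumptions chosen for illustration, not the settings of any descriptor discussed in this paper.

```python
import colorsys

def hsv_histogram(pixels, h_bins=8, s_bins=3, v_bins=3):
    """Return a normalized HSV histogram for an iterable of (r, g, b) pixels in 0..255.

    The bin counts are illustrative assumptions, not values taken from the paper.
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Map each channel in [0, 1] to a bin index; min() keeps 1.0 in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count:
        # Normalize so that images of different sizes are comparable.
        hist = [x / count for x in hist]
    return hist

# A tiny "image" of four pixels: two red, one blue, one green.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
hist = hsv_histogram(pixels)
```

Such a histogram discards all spatial information, which is the limitation that color correlograms and color coherence vectors are meant to address.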

Besides image feature extraction methods, image ranking methods have also developed rapidly in CBIR. Much research has been devoted to improving ranking results using measures such as the L1-norm [1], Euclidean distance [18] and Hamming distance [7]. Previous research has shown that ranking by L1-norm is simple and can obtain better results than ranking by Euclidean distance [27], [37]. In addition, graph-based ranking methods, such as PageRank [31] and manifold ranking [10], are also widely used for image retrieval. Manifold ranking is based on manifold learning and relates to perception.
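The difference between L1-norm and Euclidean ranking can be seen in a small sketch. The helper names (`l1_distance`, `l2_distance`, `rank`) and the toy feature vectors are hypothetical; the example only shows that the two measures can order the same database differently, not which one retrieves better.

```python
import math

def l1_distance(a, b):
    """L1-norm (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean (L2-norm) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, database, distance):
    """Return database indices sorted from most to least similar to the query."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))

# Toy features: item 0 differs from the query in one dimension only,
# item 1 spreads a smaller difference over two dimensions.
database = [[3.0, 0.0], [2.0, 2.0]]
query = [0.0, 0.0]

l1_order = rank(query, database, l1_distance)  # L1 distances: 3.0 and 4.0
l2_order = rank(query, database, l2_distance)  # L2 distances: 3.0 and about 2.83
```

Here the L1-norm places item 0 first, whereas the Euclidean distance places item 1 first, because squaring penalizes a single large difference more heavily than several small ones.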

In most image retrieval schemes, image feature extraction and ranking are two independent processes. This likely accounts for the incompatibility between descriptor and ranking method (for example, an image representation that is compatible with L1-norm ranking may not obtain the expected results when manifold ranking methods are used; see Section 7).

In computer vision, we want the computer to imitate human perception when learning from images and other visual data [16]. In the process of human cognition, visual uniformity is beneficial for learning from images, and it has been used for the extraction of image features [14]. Visual uniformity is consistent with human perception of the image. Thus, we argue that image features extracted according to visual uniformity are more likely to be distributed on a manifold. In 2000, three papers related to manifold learning were published in “Science” [11], [33], [38], in which Lee [33] points out that “human perception is in the way of manifold” (this phenomenon is illustrated in Section 2). In this paper, we construct the image feature and ranking model based on the manifold, with the aim of realizing uniformity in CBIR.

In this paper, following the principles of visual organization and the theory in “The manifold way of perception”, we use human visual perception to construct the image visual feature, and retrieve images via manifold ranking. The main contributions of this paper are as follows:

(1) Perceptual Uniform Descriptor (PUD) is proposed by using the visual principle of Gestalt psychology, so that it can better distribute on a manifold.

(2) The incompatibility problem between image descriptors and ranking methods is analyzed. The concept of a manifold is introduced as a bridge between descriptors and ranking methods in CBIR.

The rest of the paper is organized as follows: Section 2 states the motivation of our proposed image retrieval scheme. Principles of Gestalt psychology are introduced in Section 3. Sections 4 and 5 present our image descriptor. Sections 6 and 7 refer to manifold ranking for image retrieval. In Section 8, experimental results and analysis are reported. Section 9 concludes the paper.

Section snippets

Motivation

The human visual system can pinpoint and analyze objects in complex images in a very short time. Many studies of the visual mechanisms of the human brain and of cognitive psychology aim to simulate vision systems that match human performance in object recognition. Based on the analysis that image variability can be considered as a manifold embedded in the image space, Seung and Lee [34] introduced the idea that human visual perception can be expressed by

Principles of Gestalt psychology

Gestalt psychology [17], which is based on the understanding of human visual perception, allows visually similar objects to be grouped into a unity. This idea implies that “the whole is greater than the sum of the parts”. The principles of Gestalt psychology are highly relevant to the perception of the world, and can be applied to help design visual communication models. This paper focuses on three main principles in Gestalt psychology, namely proximity, similarity and good

Perceptually uniform regions

In this paper, perceptually uniform regions are defined as local image regions where pixels have similar properties to their neighbors. According to the laws of proximity and similarity in the Gestalt laws of perception, pixels that are close to each other or have similar properties are more likely to be grouped into a unity. The comparison and detection can be carried out in patterns of fixed size. As Julesz’s textons theory claimed, the image can be seen as the formation of regular

Perceptually uniform feature representation

An image can be regarded as a collection of pixels. Spatial structure and contrast are important and orthogonal features, where spatial structure is the correlation among pixels and contrast represents the difference between pixels. Perceptually uniform color difference has shown superior performance in CBIR [22]. The Euclidean distance between two pixels in color space measures the degree of visual perceptual difference. Even though neighboring pixels may have identical quantized color and

Manifold ranking (MR)

Manifold Ranking (MR) is a transductive ranking method which outperforms inductive ones in most cases in CBIR. The notation and the ranking process of MR are described in detail as follows.

Given a set of features $H=\{H_1,H_2,\ldots,H_n\}\in\mathbb{R}^{M\times n}$, assume that the $q$-th image is the query. Let $d: H\times H\rightarrow\mathbb{R}$ be a metric on $H$, where $d(H_i,H_j)$ is the distance between $H_i$ and $H_j$. We denote the ranking results by $f=[f_1,f_2,\ldots,f_n]^T\in\mathbb{R}^n$, where the ranking score $f_i$ corresponds to image $H_i$. The
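For orientation, the following minimal sketch implements the commonly used iterative form of manifold ranking, f(t+1) = αSf(t) + (1-α)y, with S the symmetrically normalized affinity matrix and y the indicator vector of the query. The fully connected Gaussian graph, the L1 distance, and the values of `sigma`, `alpha` and `iters` are assumptions made for illustration; the graph construction and parameters used in this paper may differ.

```python
import math

def l1(a, b):
    """L1-norm distance, playing the role of the metric d(H_i, H_j)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def manifold_ranking(features, q, dist=l1, sigma=1.0, alpha=0.9, iters=200):
    """Return ranking scores f for all items given the index q of the query.

    Assumed setup (not taken from the paper): fully connected graph,
    Gaussian affinity, fixed number of iterations.
    """
    n = len(features)
    # Gaussian affinity matrix W with no self-loops.
    W = [[0.0 if i == j
          else math.exp(-dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # y marks the query; scores spread from it along the graph.
    y = [1.0 if i == q else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Four points along a line plus one distant outlier; item 0 is the query.
features = [[0.0], [1.0], [2.0], [3.0], [10.0]]
scores = manifold_ranking(features, q=0)
ranking = sorted(range(len(features)), key=lambda i: -scores[i])
```

In this toy example the scores decrease along the chain of neighboring points, and the distant outlier receives a score close to zero because it is only weakly connected to the rest of the graph.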

The compatibility between PUD and manifold

On the COIL-100 dataset (see details in Section 8), we employ three dimensionality reduction methods, locally linear embedding (LLE) [30], local tangent space alignment (LTSA) [43] and maximal similarity embedding (MSE) [8], to visualize LBP, MSD, CDH, HSV histogram and PUD features in 2-dimensional space, with neighborhood parameter k=6, as shown in Fig. 8. It can be seen from Fig. 8(a)–(e) that a toy cat is captured while being rotated from 0° to 360°. LBP, MSD and HSV all fail to recover manifold structure

Experimental results

Extensive experiments are conducted to test and illustrate the effectiveness of our proposed scheme. In the experiments, we mainly compare our image descriptor with local binary patterns (LBP) [26], micro-structure descriptor (MSD) [21], color difference histogram (CDH) [22] and HSV color histogram [45]. In the ranking step, L1-norm (L1), L2-norm (L2), manifold ranking based on L1-norm (MR1) and manifold ranking based on L2-norm (MR2) are compared. Some previous works [42] related to our scheme are also

Conclusion

In this paper, an effective holistic image feature extraction method based on Gestalt psychology, namely the Perceptual Uniform Descriptor, is proposed. Using manifold learning methods and visualization, we showed that our descriptor is better suited to manifold ranking than the other descriptors considered in this paper. Furthermore, the experimental results show that the combination of PUD and manifold ranking is effective for image retrieval in most cases. However, in a few cases, the L1-norm ranking

Acknowledgment

This work was supported by the National Natural Science Foundation of P.R. China (61370200, 61210009, 61602082, 61672130) and the Open Program of State Key Laboratory of Software Architecture (Item number SKLSAOP1701).

References (46)

  • A. Babenko et al., Neural codes for image retrieval, Computer Vision - ECCV 2014 (2014)
  • J. Chen et al., Content-based image retrieval using rotation-invariant histograms of oriented gradients, Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (2015)
  • N. Dalal et al., Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005 (2005)
  • R. Datta et al., Content-based image retrieval: approaches and trends of the new age, Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval (2005)
  • M. Douze et al., Evaluation of gist descriptors for web-scale image search, Proceedings of the ACM International Conference on Image and Video Retrieval (2009)
  • J. He et al., Manifold-ranking based image retrieval, Proceedings of the 12th annual ACM international conference on Multimedia (2004)
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • H. Jégou et al., Improving bag-of-features for large scale image search, IJCV (2010)
  • J. Huang et al., Image indexing using color correlograms, 1997 IEEE Computer Society Conference on. IEEE (1997)
  • J. Huang et al., Integrating visual saliency and consistency for re-ranking image search results, Multimedia, IEEE Trans. (2011)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • K. Koffka, Principles of Gestalt Psychology (2013)
  • M. Kokare et al., Comparison of similarity metrics for texture image retrieval, TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region (2003)