Perceptual uniform descriptor and ranking on manifold for image retrieval
Introduction
Feature extraction and ranking are two important topics in Content-Based Image Retrieval (CBIR). It is well known that image representation plays an important part in CBIR systems [5], [35], so the performance of these systems depends mainly on how discriminative and effective the features are. The process of feature extraction can be divided into three steps: 1) image preprocessing; 2) detection of discriminative image regions; and 3) statistical description of features in these regions. Most existing works focus on one or more of these steps to improve their descriptors.
First, image preprocessing is an indispensable step for describing certain properties of natural images, which may contain various types of noise. Many image denoising and sharpening algorithms have been proposed to reduce the effect of noise on image content and to strengthen discriminative information in certain regions.
Second, discriminative image regions are detected. Depending on the regions they describe, descriptors can be classified as global-based or local-based. Color Histogram (CH) [23], Local Binary Patterns (LBP) [25], [26] and Histogram of Oriented Gradients (HOG) [4], which describe color, texture and edge features respectively, are computed over the global image region. Motivated by the visual perception mechanisms for image retrieval, Liu et al. proposed the micro-structure descriptor (MSD) [21], which defines micro-structures through the similarity of edge orientations and the underlying colors, and introduces structure element correlation statistics to characterize the spatial correlation among them; the color difference histogram (CDH) [22] is another version of this idea. In contrast, local-based descriptors focus on describing local regions that contain specific information. Lowe [15] introduced a local descriptor called the scale-invariant feature transform (SIFT), which detects and describes local neighborhoods around keypoints in scale space. The HMAX model [32], based on hierarchical visual processing in the primary visual cortex (V1), uses Gabor filters at different scales and orientations in its S1 units [29]. More detailed performance comparisons among other local descriptors are presented in [24].
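To make the global texture descriptor concrete, the sketch below computes the basic 3×3 LBP code (each of the 8 neighbors is thresholded against the center pixel and the resulting bits are packed into one byte) and a normalized 256-bin histogram over a grayscale image. It is a minimal illustration of the original operator and not necessarily the exact LBP variant used in [25], [26]; the neighbor ordering and the helper names `lbp_code` and `lbp_histogram` are assumptions made for this example.

```python
from typing import List

# Offsets of the 8 neighbours, visited clockwise starting at the top-left pixel.
# The start point and direction are an assumption; any fixed order gives a valid LBP.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of the interior pixel (r, c) of a grayscale image."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # Threshold each neighbour against the centre: bit is 1 if neighbour >= centre.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0.0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    patch = [[10, 20, 30],
             [40, 25, 10],
             [5, 30, 50]]
    print(lbp_code(patch, 1, 1))
```

In the small patch above, the neighbors 30, 50, 30 and 40 are at least as bright as the center value 25, so bits 2, 4, 5 and 7 are set and the code is 180.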
Finally, statistical methods are applied to the features in these regions. The histogram-based strategy, one of the most common, has been applied in many descriptors such as CH, LBP and HOG. Moreover, color moments [36], the color correlogram [13] and the color coherence vector [28] were proposed to emphasize the spatial relationship of feature elements.
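The histogram-based statistical strategy can be illustrated with a simple color histogram. The sketch below assumes RGB pixels with 8-bit channels and a hypothetical uniform quantization of each channel into `levels` bins (64 bins in total by default); it is not claimed to match the quantization used by CH [23] or by the descriptors compared in this paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def quantize(p: Pixel, levels: int = 4) -> int:
    """Map an RGB pixel to one of levels**3 bins by uniform per-channel quantization."""
    step = 256 // levels
    # Clamp so that channel value 255 stays in the last bin when levels does not divide 256.
    r, g, b = (min(ch // step, levels - 1) for ch in p)
    return (r * levels + g) * levels + b


def color_histogram(pixels: List[Pixel], levels: int = 4) -> List[float]:
    """Normalised color histogram: the fraction of pixels falling into each bin."""
    hist = [0.0] * (levels ** 3)
    for p in pixels:
        hist[quantize(p, levels)] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist
```

With the default 4 levels, the two reddish pixels (255, 0, 0) and (250, 10, 5) fall into the same bin (index 48), so a histogram of those two pixels plus (0, 0, 255) puts 2/3 of its mass in bin 48 and 1/3 in bin 3. A histogram of this kind discards where the colors occur, which is the limitation that correlogram- and coherence-based statistics try to address.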
Besides image feature extraction methods, image ranking methods have also developed rapidly in CBIR. Much research has been devoted to improving ranking results using distance measures such as the L1-norm [1], Euclidean distance [18] and Hamming distance [7]. Previous research has shown that ranking by the L1-norm is simple and can obtain better results than ranking by Euclidean distance [27], [37]. In addition, graph-based ranking methods, such as PageRank [31] and manifold ranking [10], are also widely used for image retrieval. Manifold ranking is built on manifold learning and is related to human perception.
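How much the choice of metric matters can be seen with a small sketch that sorts database features by their L1-norm or L2-norm (Euclidean) distance to a query. The function names and the toy vectors in the usage note are hypothetical; real features would be descriptors such as those produced by CH, LBP or HOG.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """L1-norm (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """L2-norm (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query: Vector, database: List[Vector],
         dist: Callable[[Vector, Vector], float] = l1) -> List[int]:
    """Return database indices sorted from most to least similar to the query."""
    return sorted(range(len(database)), key=lambda i: dist(query, database[i]))
```

For a query at the origin and database vectors (3, 0) and (2, 2), L1 ranking returns [0, 1] (distances 3 and 4), whereas L2 ranking returns [1, 0] (distances 3 and about 2.83), so the metric alone can change the retrieval order.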
In most image retrieval schemes, image feature extraction and ranking are two independent processes. This likely accounts for the incompatibility between descriptors and ranking methods: for example, an image representation that works well with L1-norm ranking may not obtain the expected results with manifold ranking (see Section 7).
In computer vision, we hope that computers can imitate human perception when learning from images and other visual data [16]. In human cognition, visual uniformity facilitates learning from images and has been used for image feature extraction [14]. Since visual uniformity is consistent with human perception of the image, we argue that image features extracted according to visual uniformity are more likely to lie on a manifold. In 2000, three studies related to manifold learning were published in Science [11], [33], [38], in which Lee [33] points out that “human perception is in the way of manifold” (this phenomenon is illustrated in Section 2). In this paper, we construct the image feature and the ranking model based on the manifold, aiming to achieve uniformity in CBIR.
In this paper, following the principles of visual organization and the theory of “The manifold ways of perception”, we use human visual perception to construct the image visual feature and retrieve images via manifold ranking. The main contributions of this paper are as follows:
(1) A Perceptual Uniform Descriptor (PUD) is proposed using the visual principles of Gestalt psychology, so that the resulting features are better distributed on a manifold.
(2) The incompatibility between image descriptors and ranking methods is analyzed, and the concept of a manifold is introduced as a bridge between descriptors and ranking methods in CBIR.
The rest of the paper is organized as follows: Section 2 states the motivation of our proposed image retrieval scheme. Principles of Gestalt psychology are introduced in Section 3. Sections 4 and 5 present our image descriptor. Sections 6 and 7 address manifold ranking for image retrieval. In Section 8, experimental results and analysis are reported. Section 9 concludes the paper.
Motivation
The human visual system can locate and analyze objects in complex images in a very short time. Many studies of the brain's visual mechanisms and of cognitive psychology aim to build vision systems whose object recognition performance matches that of humans. Based on the observation that image variability can be considered as a manifold embedded in the image space, Seung and Lee [34] introduced the idea that human visual perception can be expressed by
Principles of Gestalt psychology
Gestalt psychology [17], which is grounded in the study of human visual perception, holds that visually similar objects are grouped into a unified whole. This idea is summarized as “the whole is greater than the sum of the parts”. The principles of Gestalt psychology are highly relevant to how we perceive the world and can be applied to the design of visual communication models. This paper focuses on three main principles of Gestalt psychology, namely proximity, similarity and good
Perceptually uniform regions
In this paper, perceptually uniform regions are defined as local image regions in which pixels share similar properties with their neighbors. According to the laws of proximity and similarity among the Gestalt laws of perception, pixels that are close to each other or have similar properties are more likely to be grouped into a unified whole. The comparison and detection can be performed within patterns of fixed size. As Julesz’s texton theory claims, the image can be seen as the formation of regular
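The idea of detecting uniform regions within fixed-size patterns can be sketched as follows. The paper's exact detection rule is not given in this excerpt, so this is only an illustrative assumption: each interior pixel of a map of quantized labels is examined together with its 8 neighbors (proximity), and the 3×3 window is reported as uniform when at least `min_similar` neighbors share the center's label (similarity). Both the 3×3 window size and the `min_similar` threshold are hypothetical parameters.

```python
from typing import List, Tuple


def uniform_windows(labels: List[List[int]], min_similar: int = 8) -> List[Tuple[int, int]]:
    """Return centres (r, c) of 3x3 windows whose neighbours mostly share the centre label.

    labels: 2-D map of quantized values (e.g. quantized colors).
    min_similar: hypothetical threshold on how many of the 8 neighbours must
    carry the centre's label for the window to count as perceptually uniform.
    """
    rows, cols = len(labels), len(labels[0])
    centres = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = labels[r][c]
            # Proximity: only the 8 adjacent pixels are considered.
            # Similarity: count the neighbours that carry the same quantized label.
            similar = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and labels[r + dr][c + dc] == centre
            )
            if similar >= min_similar:
                centres.append((r, c))
    return centres
```

On the 4×4 map [[1, 1, 1, 2], [1, 1, 1, 2], [1, 1, 1, 2], [3, 3, 3, 3]] with the default threshold, only the window centered at (1, 1) is uniform; lowering `min_similar` to 5 also admits the windows centered at (1, 2) and (2, 1).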
Perceptually uniform feature representation
An image can be regarded as a collection of pixels. Spatial structure and contrast are important and orthogonal features: spatial structure is the correlation among pixels, while contrast represents the difference between pixels. Perceptually uniform color differences have shown superior performance in CBIR [22]. In a perceptually uniform color space, the Euclidean distance between two pixels measures the degree of visually perceived difference. Even though neighboring pixels may have identical quantized color and
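The perceptual color difference described above is the Euclidean distance between two pixels in a perceptually uniform color space. The sketch assumes the pixels have already been converted to CIE L*a*b*, where this distance is the classic CIE76 ΔE; whether PUD uses exactly this space and formula is not stated in this excerpt, and the function name is hypothetical.

```python
import math
from typing import Tuple

Lab = Tuple[float, float, float]  # (L*, a*, b*), assumed already converted from RGB


def color_difference(p: Lab, q: Lab) -> float:
    """Euclidean distance between two pixels in L*a*b* space (CIE76 Delta E)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
```

For example, `color_difference((50, 0, 0), (53, 4, 0))` returns 5.0. Because the space is approximately perceptually uniform, equal distances correspond roughly to equal perceived differences, which is not the case for raw RGB distances.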
Manifold ranking (MR)
Manifold ranking (MR) is a transductive ranking method that outperforms inductive ones in most CBIR cases. The notation and ranking process of MR are described in detail as follows.
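Given a set of features $\mathcal{H} = \{H_1, H_2, \ldots, H_n\}$, assume that the $q$-th image is the query. Let $d: \mathcal{H} \times \mathcal{H} \rightarrow \mathbb{R}$ be a map (metric) defined for each pair $H_i$ and $H_j$, where $d(H_i, H_j)$ is the distance between $H_i$ and $H_j$. We denote by $f = [f_1, f_2, \ldots, f_n]^T$ the ranking results, where the ranking score $f_i$ corresponds to image $H_i$. The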
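Since the remainder of the MR definition is cut off in this excerpt, the following is a sketch of the widely used iterative manifold ranking of Zhou et al., not the paper's exact procedure. It assumes a fully connected graph with Gaussian affinities $W_{ij} = \exp(-d(H_i, H_j)^2 / 2\sigma^2)$ and $W_{ii} = 0$, the symmetric normalization $S = D^{-1/2} W D^{-1/2}$, and the update $f \leftarrow \alpha S f + (1 - \alpha) y$, where $y$ is 1 for the query and 0 elsewhere; its fixed point is $f^* = (1-\alpha)(I - \alpha S)^{-1} y$. The values of `sigma`, `alpha` and `iterations` are illustrative assumptions, and passing an L1 distance instead of `euclidean` gives a variant in the spirit of MR1.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance; pass an L1 distance instead for an MR1-style variant."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manifold_ranking(features: List[Vector], q: int,
                     dist: Callable[[Vector, Vector], float] = euclidean,
                     sigma: float = 1.0, alpha: float = 0.9,
                     iterations: int = 100) -> List[float]:
    """Return the ranking score f_i of every feature H_i for the query H_q."""
    n = len(features)
    # Affinity W_ij = exp(-d(H_i, H_j)^2 / (2 sigma^2)) for i != j, and W_ii = 0.
    W = [[0.0 if i == j else
          math.exp(-dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}, with D_ii = sum_j W_ij.
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # y marks the query: y_q = 1, all other entries 0.
    y = [1.0 if i == q else 0.0 for i in range(n)]
    f = y[:]
    # Propagate f <- alpha * S f + (1 - alpha) * y; converges for 0 < alpha < 1.
    for _ in range(iterations):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


if __name__ == "__main__":
    # Toy 1-D features: a chain 0-1-2-3 plus an outlier at 10.
    feats = [[0.0], [1.0], [2.0], [3.0], [10.0]]
    scores = manifold_ranking(feats, q=0)
    print(sorted(range(len(feats)), key=lambda i: -scores[i]))
```

In the toy chain, scores decrease step by step along the chain away from the query, and the outlier at 10.0 receives by far the lowest score because it is almost disconnected from the rest of the graph. Unlike plain distance ranking, each score depends on the whole graph rather than on the query-to-item distance alone.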
The compatibility between PUD and manifold
On the COIL-100 dataset (see details in Section 8), we employ the locally linear embedding (LLE) [30], local tangent space alignment (LTSA) [43] and maximal similarity embedding (MSE) [8] dimensionality reduction methods to visualize LBP, MSD, CDH, HSV histogram and PUD features in a 2-dimensional space, with the neighborhood parameter as shown in Fig. 8. The images visualized in Fig. 8(a)–(e) show a toy cat captured as it rotates from 0° to 360°. LBP, MSD and HSV all fail to recover manifold structure
Experimental results
Extensive experiments are conducted to test and illustrate the effectiveness of the proposed scheme. In the experiments, we mainly compare our image descriptor with local binary patterns (LBP) [26], the micro-structure descriptor (MSD) [21], the color difference histogram (CDH) [22] and the HSV color histogram [45]. In the ranking step, L1-norm (L1), L2-norm (L2), and manifold ranking based on the L1-norm (MR1) and on the L2-norm (MR2) are used. Some previous works related to our scheme [42] are also
Conclusion
In this paper, an effective holistic image feature extraction method based on Gestalt psychology, namely the Perceptual Uniform Descriptor (PUD), is proposed. Using manifold learning methods and visualization, we showed that our descriptor is more suitable for manifold ranking than the other descriptors considered in this paper. Furthermore, the experimental results show that the combination of PUD and manifold ranking is effective for image retrieval in most cases. However, in a few cases, the L1-norm ranking
Acknowledgment
This work was supported by the National Natural Science Foundation of P.R. China (61370200, 61210009, 61602082, 61672130) and the Open Program of State Key Laboratory of Software Architecture (Item number SKLSAOP1701).
References (46)
A note on the gradient of a multi-image. Comput. Vis. Graphics Image Process. (1986).
Maximal similarity embedding. Neurocomputing (2013).
Image indexing using the color and bit pattern feature fusion. J. Vis. Commun. Image Represent. (2013).
Perceptual color descriptor based on spatial distribution: a top-down approach. Image Vis. Comput. (2010).
A smart content-based image retrieval system based on color and texture feature. Image Vis. Comput. (2009).
Content-based image retrieval using computational visual attention model. Pattern Recognit. (2015).
Image retrieval based on micro-structure descriptor. Pattern Recognit. (2011).
Content-based image retrieval using color difference histogram. Pattern Recognit. (2013).
Fusion framework for effective color image retrieval. J. Vis. Commun. Image Represent. (2014).
Distance measures for color image retrieval. Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on. IEEE (1998).
Neural codes for image retrieval. Computer Vision - ECCV 2014.
Content-based image retrieval using rotation-invariant histograms of oriented gradients. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval.
Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005.
Content-based image retrieval: approaches and trends of the new age. Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval.
Evaluation of gist descriptors for web-scale image search. Proceedings of the ACM International Conference on Image and Video Retrieval.
Manifold-ranking based image retrieval. Proceedings of the 12th annual ACM international conference on Multimedia.
Reducing the dimensionality of data with neural networks. Science.
Improving bag-of-features for large scale image search. IJCV.
Image indexing using color correlograms. 1997 IEEE Computer Society Conference on. IEEE.
Integrating visual saliency and consistency for re-ranking image search results. Multimedia, IEEE Trans.
Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.
Principles of Gestalt Psychology.
Comparison of similarity metrics for texture image retrieval. TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region.