Rotation and scale invariant hybrid image descriptor and retrieval

https://doi.org/10.1016/j.compeleceng.2015.04.011

Abstract

Accurate image retrieval is required to index and retrieve large numbers of images from huge databases. In this paper, an efficient approach is presented to encode the color and textural features of images from the local neighborhood of each pixel. The color features are extracted by quantizing the RGB color space into a single channel with a reduced number of shades. The texture information is encoded with structuring patterns generated from locally structured elements chosen as a basis. Color and textural features are fused together to construct the inherently rotation and scale-invariant hybrid image descriptor (RSHD). This fusion is carried out by extracting textural cues over each shade independently. RSHD has been tested on the Corel dataset, and experimental results suggest that RSHD outperforms state-of-the-art descriptors. The performance of RSHD is promising under rotation and scaling. It can also be used effectively under more complex image transformations.

Introduction

The demand for efficient image retrieval is rapidly increasing. In the early days, text-based approaches were used for image retrieval, but the scope of such methods narrowed with the emergence of content-based image retrieval (CBIR), because retrieving images by their content is more visually accurate. In the published literature, content-based approaches describe images more objectively and effectively than text-based approaches [1]. The main aim of CBIR is to facilitate efficient searching, browsing and matching over large datasets, either offline or online, and it has been an active research area in computer vision and image processing for more than a decade. Some recently reported typical applications of image retrieval are computerized facial diagnosis and retrieving human actions from realistic video databases [2]. The efficiency of any CBIR system primarily depends upon the discriminating power of its image feature descriptions. A CBIR system must be able to retrieve images whose details are randomly oriented. Recently, semantic-based approaches have become popular for image retrieval because they cope with the problems of CBIR [3], [4]. To describe semantic concepts and obtain results close to human perception, some researchers use relevance feedback algorithms [5] in semantic image retrieval. These methods consider the priorities provided by users and bridge the semantic gap, but their drawback is that they are not fully automated and require user intervention to provide the feedback. Several methods to encode the information present in images have been proposed in the published literature for image retrieval [6], [7], [8], [9], [10], [11]. These methods use pixel-level details (i.e. low-level feature descriptions) of the image, including color, texture, shape, gradient, orientation, etc., in the form of patterns to represent the whole image, and images are matched by their pixel-level details.
These features have been used efficiently in each type of CBIR system, such as those based on global features, region-based features, local features, and structuring features. The main shortcoming of these methods lies in their limited use of different, more robust features and their lack of multiple types of information: by using just one type of feature, a method may not be able to characterize the image accurately.

Global feature-based CBIR [12], [13], [14] does not divide the image into multiple regions. Chen et al. [12] used only color information to represent the image features, by means of the image color distributions. The color distributions are preserved up to the third moment in their method. The results are better than those of other color-based features, but they did not consider any features other than color, whereas the proposed method extracts structuring patterns over quantized shade distributions, which increases the distinctiveness of the proposed descriptor. The color difference histogram (CDH) is designed for color image analysis [13]. Wang et al. [14] used color, texture and shape features effectively for global image representation. They used only a dominant set of color information, with steerable filter decomposition and pseudo-Zernike moments, in their descriptor construction. These methods do not take into account the local neighboring structures required to encode the relationship among neighboring pixels.
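As an illustration of what describing a color distribution up to the third moment can look like, the following is a minimal sketch of generic per-channel color moments (mean, standard deviation, and the signed cube root of the third central moment). It is not the exact formulation of [12]; the function name `color_moments` and the input layout (a list of `(r, g, b)` tuples) are assumptions made for this example.

```python
def color_moments(pixels):
    """Return [mean, std, skew] for each of the R, G, B channels (9 values).

    pixels: non-empty list of (r, g, b) tuples. Hypothetical helper for
    illustration only; not the exact feature set used in [12].
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("pixels must not be empty")
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        # Second and third central moments of the channel
        second = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = second ** 0.5
        # Signed cube root keeps the third moment in intensity units
        sign = 1.0 if third >= 0 else -1.0
        skew = sign * abs(third) ** (1.0 / 3.0)
        features.extend([mean, std, skew])
    return features
```

Such a global vector is compact, but it carries no information about how neighboring pixels relate to one another, which is the limitation noted above.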

Region-based approaches [15], [16] have also been used by several researchers to integrate spatial location with the feature description. Hsiao et al. [15] partitioned images into five regions with fixed absolute locations. As in semantic retrieval, their approach also needs user intervention in the middle of the retrieval process. The proposed method, on the other hand, considers only the local neighborhood of a given pixel, which gives it local discriminative power. To represent the image’s spatial and color arrangements, Lin et al. [16] introduced three kinds of feature descriptors. To extract these features, they used K-means clustering to partition the whole image into different groups (i.e. clusters) using its intensity values. These region-based approaches have shown promising results at the expense of high-dimensional descriptions and high computational cost.

Over the years, the local binary pattern (LBP) [17] and various LBP-based approaches [18], [19], [20], [21], [22], [23] have been reported and have become popular because of their high accuracy and simplicity. LBP is constructed by comparing each pixel in the image with its neighboring pixels; the pattern is generated according to the sign of each comparison [17]. To reduce the size of LBP, only center-symmetric pairs of neighbors (pixels opposite each other across the center) are compared in the center-symmetric local binary pattern (CS-LBP) [18]. Based on multi-resolution analysis, Zhu and Wang [19] used multiple local patterns to encode the textural feature. Their method is also promising under rotation and scaling, but the lack of color information restricts its use in image retrieval. The complete local binary pattern has been used for fruit disease recognition [20]. To achieve invariance to monotonic intensity changes, an order-based descriptor, the local intensity order pattern (LIOP), was proposed [21]. Promising results using LIOP have been reported under monotonic intensity change, but this type of method fails if the objects in the image change, because order information alone does not account for the actual image content. Dubey et al. [22] extended the concept of LIOP over interleaved neighboring sets and designed the interleaved intensity order based local descriptor (IOLD) for image matching. They also proposed an illumination compensation mechanism to cope with varying illumination for brightness-invariant image retrieval [23]. LBP and LBP-based descriptors can be used efficiently to match images that have undergone some geometric and photometric transformations. Most of these methods do not consider the fine structures of the image, which can be seen as its basic building blocks, and their performance can be boosted effectively by incorporating fine structures into their framework.
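The basic LBP construction described above can be sketched as follows for a 3×3 neighborhood on a grayscale image stored as a list of rows. This is a minimal sketch of the textbook operator, not the implementation used in any of the cited works; the neighbor ordering (clockwise from the top-left) and the helper names `lbp_code` and `lbp_histogram` are assumptions made for this example.

```python
# Offsets of the 8 neighbors, clockwise from the top-left (a convention choice)
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        # Set the bit when the neighbor is at least as bright as the center
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return the 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            hist[lbp_code(image, row, col)] += 1
    return hist
```

Because each bit is tied to a fixed neighbor position, rotating the image permutes the bits and changes the code, which is one reason rotation-invariant variants and structure-based encodings are of interest.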

Images are also represented by the different types of structures present in them [24], [25], [26]. Liu et al. [24] represented the properties of the co-occurrence matrix using a histogram, combining the advantages of the histogram with those of the co-occurrence matrix, and proposed the multi-texton histogram (MTH) as a feature descriptor for image retrieval. Liu et al. [25] introduced the micro-structure descriptor (MSD). MSD integrates color, texture, shape and spatial layout properties of the image for efficient content-based image retrieval. An efficient structure element histogram (SEH), which integrates texture with color features, is presented by Xingyuan and Zongyu [26]. These structure-based methods have shown promising results in image retrieval, but their performance degrades under rotation and scaling. In image retrieval and image classification problems, it is not possible to encode the exact information contained in an image using only one type of feature, such as color or texture. Therefore, it is highly desirable to merge these features in such a way that the dimensionality does not increase too much.

Color and texture information are used by Wang et al. [27] to design a CBIR system. They used Zernike chromaticity distribution moments to capture the color features from the opponent chromaticity space, which is rotation and flip-invariant. They also used the Contourlet transform to encode the texture feature, which is rotation and scale-invariant. In [27], the color and texture features are first encoded separately and then combined to form the final feature vector, whereas we process the texture feature in conjunction with the color feature, encoding both simultaneously in a hybrid manner. Youssef [28] also adopted the Curvelet transform and integrated it with enhanced dominant color extraction and texture analysis for CBIR. The HSV color space is quantized to encode the color information. The Curvelet transform captures accurate texture information as well as directional features by tuning to different orientations, but it also increases the dimensionality of the descriptor. Another major difference is that the approach in [28] is region-based, which increases the dimension of the descriptor, whereas in our case the dimension is limited and depends on the number of structuring elements.

To overcome the drawbacks of the above-mentioned descriptors, a rotation and scale-invariant hybrid descriptor (RSHD) is proposed in this paper. The proposed approach considers the whole image as a single region and constructs the descriptor over it. RSHD efficiently fuses the color and textural cues present in the image.

The rest of the paper is organized as follows: Section 2 is dedicated to the RGB color space quantization; Section 3 presents two descriptors intended for benchmarking and the RSHD descriptor; Section 4 covers the distance measures used in this work to compute the similarity score between two images, along with the evaluation criteria; Section 5 uses image retrieval databases to present the promising experimental results of the RSHD method, including its scale invariance, its efficiency and how it addresses the problems identified in the construction of discriminative image feature descriptors; and, finally, Section 6 concludes this article.

Quantization of RGB color space into shades

RGB color images are frequently used for the extraction of color features. RGB color images contain three channels representing the Red (R), Green (G) and Blue (B) colors respectively. The original color is defined by the intensities of these three colors. The benefit of the RGB color space is its similarity to the actual colors of natural scenes. It is highly desirable to handle the colors in an efficient manner because color is a key factor in the image retrieval for
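One simple way to reduce RGB to a single channel of shades is uniform quantization of each channel followed by combining the three quantized values into one index. The sketch below assumes 4 levels per channel, which yields the 64 shades mentioned in the conclusion; the uniform scheme itself, the index ordering, and the names `quantize_rgb` and `shade_histogram` are assumptions for illustration and may differ from the quantization defined in the full paper.

```python
def quantize_rgb(r, g, b, levels=4):
    """Map an 8-bit (r, g, b) color to a shade index in [0, levels**3).

    Uniform quantization is an assumption of this sketch; the paper's
    exact scheme may differ.
    """
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in 0..255")
    # Scale each channel from 0..255 down to 0..levels-1
    rq = r * levels // 256
    gq = g * levels // 256
    bq = b * levels // 256
    # Combine the three quantized channels into a single index
    return (rq * levels + gq) * levels + bq


def shade_histogram(pixels, levels=4):
    """Count how many pixels fall into each shade."""
    hist = [0] * (levels ** 3)
    for r, g, b in pixels:
        hist[quantize_rgb(r, g, b, levels)] += 1
    return hist
```

With the default of 4 levels per channel, every pixel maps to one of 64 shade indices, so later processing can operate on a single channel instead of three.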

Overview of some benchmark descriptors and the RSHD

This section introduces the reader to the two descriptors used for the sake of comparison with the new descriptor: the structure element histogram (SEH) and the color difference histogram (CDH). Next, it describes the proposed rotation and scale-invariant hybrid descriptor (RSHD), which has five rotation-invariant structure elements used to encode the texture information. We have used SEH and CDH as the benchmark descriptors for three reasons: (1) these methods also utilized the concept of

Performance measures

The RSHD descriptor can be used in any problem where image description is needed. We evaluate the proposed descriptor in content-based image retrieval, where the main goal is to find the images in a database that are most similar to a query image. The similar images are selected on the basis of the similarity score between the descriptor of the query image and those of the database images. The main problem in the evaluation is the computation of the similarity score between two descriptors. We refer F = {f1, f2, …, fdim
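The retrieval procedure described above can be sketched as ranking database descriptors by their distance to the query descriptor and then scoring the ranked list. The L1 distance and the precision measure used here are assumptions chosen for illustration, since the distance measures and evaluation criteria of the paper are not given in this excerpt; the names `l1_distance`, `retrieve` and `precision` are likewise hypothetical.

```python
def l1_distance(f, g):
    """Sum of absolute differences between two equal-length descriptors."""
    if len(f) != len(g):
        raise ValueError("descriptors must have the same dimension")
    return sum(abs(a - b) for a, b in zip(f, g))


def retrieve(query, database, k):
    """Return the names of the k database images closest to the query.

    database: dict mapping image name -> descriptor (list of numbers).
    """
    ranked = sorted(database,
                    key=lambda name: l1_distance(query, database[name]))
    return ranked[:k]


def precision(retrieved, relevant):
    """Fraction of retrieved images that belong to the relevant set."""
    if not retrieved:
        return 0.0
    hits = sum(1 for name in retrieved if name in relevant)
    return hits / len(retrieved)
```

For example, with a database `{"a": [1, 0, 0], "b": [0, 1, 0], "c": [0.9, 0.1, 0]}` and query `[1, 0, 0]`, `retrieve(query, database, 2)` ranks "a" first and "c" second, since their distances to the query are the two smallest.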

Experimental observation, result and discussion

This section presents the results obtained by applying various descriptors to content-based image retrieval. We test the robustness and discriminative power of the proposed rotation and scale-invariant hybrid descriptor (RSHD) under rotation and scale transformations.

We also compare the proposed descriptor with state-of-the-art descriptors for image matching in a CBIR system. In this section, we first discuss the Corel image matching dataset which is used in this paper for evaluation, and then we

Conclusion

This paper presented an efficient hybrid color and texture feature description for content-based image retrieval. The proposed descriptor applies the concept of structure elements to the local neighborhood of each pixel to achieve inherent rotation invariance. The RGB color space is quantized into 64 shades to represent the color feature of the image, and local neighboring structure patterns are used to encode the textural information of the image. We extracted the structure pattern over each

References (30)

  • G.H. Liu et al. Image retrieval based on multi-texton histogram. Pattern Recogn (2010)
  • G.H. Liu et al. Image retrieval based on micro-structure descriptor. Pattern Recogn (2011)
  • X.Y. Wang et al. A new content-based image retrieval technique using color and texture information. Comput Electr Eng (2013)
  • S.M. Youssef. ICTEDCT-CBIR: Integrating curvelet transform with enhanced dominant colors extraction and texture analysis for efficient content-based image retrieval. Comput Electr Eng (2012)
  • G. Quellec et al. Fast wavelet-based image characterization for highly adaptive image retrieval. IEEE Trans Image Process (2012)
Shiv Ram Dubey is a PhD research scholar at the Indian Institute of Information Technology Allahabad, India. His research interests are image processing, image feature description, and computer vision. He was a Project Officer in the Computer Vision Lab, Indian Institute of Technology, Madras, India. He completed his M. Tech in Computer Science & Engineering at GLA University, Mathura, India in 2012. Previously, he obtained his B. Tech in Computer Science & Engineering from Gurukul Kangari Vishwavidyalaya, Haridwar, India in 2010. Currently, he is the secretary of the IEEE student branch, IIIT Allahabad, India.

Satish Kumar Singh is currently working as an assistant professor at the Indian Institute of Information Technology, Allahabad, India. He completed his Ph.D., M. Tech. and B. Tech in 2010, 2005 and 2003 respectively. He has more than 10 years of experience in academic and research institutions. He has several publications in international journals and conference proceedings of repute. He is a member of various professional societies, including IEEE and IETE. He is an Executive Committee Member of IEEE Uttar Pradesh Section-2014. He serves as an editorial board member and reviewer for many international journals. His current research interests are in the areas of digital image processing, pattern recognition, multimedia data indexing and retrieval, watermarking and biometrics.

Rajat Kumar Singh received the bachelor’s degree in electronics and instrumentation engineering from the Bundelkhand Institute of Engineering and Technology, Jhansi, UP, India, in 1999, the master’s degree in communication engineering from the Birla Institute of Technology and Science, Pilani, Raj., India, in 2001, and the Ph.D. degree from the Indian Institute of Technology Kanpur, UP, India, in 2007, with a focus on the architecture of optical packet switching incorporating various buffering techniques. He is currently working as an Assistant Professor in the Division of Electronics Engineering, Indian Institute of Information Technology, Allahabad, UP, India. His current research interests are in the areas of optical networking and switching, wireless sensor networks and image processing.

Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. Ferat Sahin.