Dimensionality reduction for histogram features: A distance-adaptive approach

doi:10.1016/j.neucom.2015.03.123

Neurocomputing

Volume 173, Part 2, 15 January 2016, Pages 181-195

Abstract

Histogram representations of visual features, such as the high-dimensional Bag-of-Features (BOF) and Spatial Pyramid Matching (SPM) representations, have been widely studied and adopted in image classification and retrieval because of their simplicity and performance. Problems involving high-dimensional feature vectors usually incur high computational cost and require large storage space. Moreover, they may suffer from low accuracy because of noise in the data. In this paper we propose a novel distance-adaptive dimensionality reduction framework, namely generalized Multidimensional Scaling, with linear coding time, to create compact and discriminative BOF or SPM representations. Compared with traditional MDS, our approach has two advantages: it is adaptive to many distance measures, and it is able to map arbitrary query points into the new space. Exhaustive experimental results show that a very low-dimensional BOF or SPM representation is sufficient for the retrieval task without loss of accuracy. In comparison, state-of-the-art methods cannot achieve high accuracy at such low dimensions. Aside from the image retrieval task, we also show that our approach is much more effective than the original histogram representations when applied to image classification.

Introduction

Image-related applications [1], [2], [3], such as object retrieval and scene recognition, have been widely studied during the last decade. The problem can be generally described as finding the images in a database that are most similar to a query image. The significant growth in both the number and size of digital image and video collections on the web has created a pressing demand for more powerful image retrieval tools. In recent years, many solutions have been proposed to improve the quality of search results. In particular, a sustained line of research has been initiated by histogram features such as the Bag-of-Features (BOF) representation [4], which has been shown to be effective on datasets with up to millions of images [5]. The general process is as follows. First, an image is represented by a collection of local descriptors such as the Scale-Invariant Feature Transform (SIFT) [6]. These descriptors are then aggregated into a single histogram representation, which collects the statistics of so-called “visual words”. BOF has been further extended to Spatial Pyramid Matching (SPM) [7], which is a spatial extension of the orderless BOF image representation.
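The aggregation step described above can be illustrated with a minimal sketch. It assumes a codebook of visual words is already available and assigns each local descriptor to its nearest word by Euclidean distance; the function name `bof_histogram`, the codebook size of 8 and the random data are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest visual word and count them."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # L1-normalize so the bins sum to one

# Hypothetical data: 8 visual words and 50 descriptors of SIFT-like length 128.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))
descriptors = rng.normal(size=(50, 128))
hist = bof_histogram(descriptors, codebook)
```

In practice the codebook is usually learned by clustering descriptors from a training set, and its size determines the dimensionality of the resulting histogram.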

However, the aforementioned process may not be applicable in recent applications where the number of images to be queried may exceed tens of millions. In that case, to approximate the distribution of visual words in an image well, the BOF representation is chosen to be high-dimensional, often with thousands of dimensions or more [8], [9]. Consequently, the corresponding BOF histogram is very sparse. A similar phenomenon also exists in SPM [7], where the optimal number of dimensions is reported to lie in the range 2100–4200. Philbin et al. reported a memory usage of 4.3 G for approximately 1.1 M images [10], which indicates that each BOF vector needs a large amount of storage. Moreover, the computational efficiency of BOF on large-scale datasets succumbs to the well-known “curse of dimensionality”. To summarize, two factors limit the number of images that can be indexed in practice: the efficiency of the search process and the memory required to represent an image.
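To put the figure reported by Philbin et al. in per-image terms, the following back-of-the-envelope calculation may help. It assumes that "4.3 G" denotes 4.3 × 10^9 bytes and "1.1 M" denotes 1.1 × 10^6 images; both readings are assumptions about the units.

```python
# Assumption: "4.3 G" is read as 4.3 * 10**9 bytes, "1.1 M" as 1.1 * 10**6 images.
total_bytes = 4.3e9
num_images = 1.1e6

bytes_per_image = total_bytes / num_images
print(f"{bytes_per_image:.0f} bytes per image")  # prints "3909 bytes per image"
```

Under these assumptions each image costs roughly 4 KB, so memory use grows linearly with the size of the collection.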

To solve the problems above, dimensionality reduction techniques have been proposed to generate compact and discriminative histogram representations. Among them, linear algorithms such as Principal Component Analysis (PCA) [11] and Linear Discriminant Analysis (LDA) [12] conveniently produce a kernel that projects the original points into the new low-dimensional space. Histograms can be viewed as points on a statistical manifold, and they are typically concentrated on a lower-dimensional manifold of the measurement space. The study of these low-dimensional manifolds has led to a specific research topic in machine learning, namely manifold learning, from which a series of nonlinear dimensionality reduction algorithms has emerged, including Isomap [13], Local Linear Embedding (LLE) [14], and Laplacian Eigenmaps (LE) [15]. However, none of these approaches is specifically designed for histogram features, which are a representative means of image description and have proven successful in both image classification and retrieval. To the best of our knowledge, histogram features have not been thoroughly studied from the perspective of dimensionality reduction.

Multidimensional Scaling (MDS) has been one of the most popular approaches to dimensionality reduction [16]. It is used to represent perceived similarities between pairs of stimuli by minimizing the ratio of differences between inter-point distances in the original high-dimensional space and in the projected low-dimensional space. However, state-of-the-art implementations of MDS are not suitable for today's large-scale datasets because of two limitations. (1) Traditional MDS is not specifically designed for histogram representations. Dimensionality reduction techniques, which aim to map high-dimensional data into a much lower-dimensional space, have been thoroughly studied [17], [18]. However, most existing techniques (including traditional MDS) are designed for the Euclidean space and are therefore not suitable for scenarios such as BOF (resp. SPM), where data samples are represented as histograms. (2) MDS cannot map an arbitrary query into the lower-dimensional space and is thus not applicable to image query tasks. Given an arbitrary query point, the MDS similarity matrix needs to be updated, and a direct mapping into the low-dimensional space is impossible.
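For readers unfamiliar with MDS, the sketch below shows the classical (Torgerson) variant, which embeds points by eigendecomposition of a double-centered squared-distance matrix. This is one standard formulation and is not necessarily the one used in the paper; the function name `classical_mds` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in k dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(vals[idx], 0.0, None)   # guard against tiny negative values
    return vecs[:, idx] * np.sqrt(lam)

# Synthetic check: points that are truly 3-dimensional and Euclidean.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
Y = classical_mds(D, 3)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)  # distances after embedding
```

Note that the embedding is computed from the full pairwise matrix `D`, so a new query point has no coordinates until the matrix is extended and the decomposition is repeated. This is the second limitation discussed above.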

In this paper, we propose a distance-adaptive approach, namely the generalized MDS (gMDS) method. Our method is not only applicable to different distance measures (e.g. the intersection kernel between histograms from BOF or SPM) but also requires limited computation time and space, which makes it suitable for large-scale image retrieval. Our method is inspired by MDS techniques, but it is in fact a linear approach in the sense that it learns from the MDS mapping an approximate transformation matrix, so that an arbitrary query can be mapped into the new lower-dimensional space. Moreover, coding an arbitrary query point takes linear time, which in most cases makes our method more useful and scalable in large-scale retrieval tasks than other approaches or distance measures. Notably, rather than reducing the dimensionality of the histogram directly, we attempt to find a low-dimensional Euclidean embedding such that the distances or dissimilarities between BOFs or SPMs are preserved.
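The idea of learning a transformation matrix from an existing embedding can be sketched as follows. The source does not state how the matrix is estimated, so ordinary least squares is used here purely as an assumption; `fit_linear_map`, `encode` and the synthetic embedding `Y` (a stand-in for the MDS coordinates of the training histograms) are hypothetical names and data.

```python
import numpy as np

def fit_linear_map(X, Y):
    """Least-squares W minimizing ||X W - Y||_F (an assumed estimator)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def encode(q, W):
    """Map a d-dimensional query to k dimensions with one matrix product."""
    return q @ W

rng = np.random.default_rng(0)
X = rng.random(size=(200, 50))
X /= X.sum(axis=1, keepdims=True)   # 200 L1-normalized training histograms
W_true = rng.normal(size=(50, 5))
Y = X @ W_true                      # stand-in for an MDS embedding of X

W = fit_linear_map(X, Y)
q = rng.random(50)
q /= q.sum()                        # an unseen query histogram
z = encode(q, W)                    # its 5-dimensional code
```

Once `W` is fixed, coding a query costs a single vector-matrix product, which is linear in the original dimensionality for a fixed target dimension.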

In general, our contributions in this paper can be summarized as follows:

• We propose a general dimensionality reduction framework for scenarios where data points are represented as histograms or their variants. In this framework, we first generate a similarity matrix between each pair of histograms using some distance measure, and then find a low-dimensional Euclidean embedding of the original histograms such that the similarity between histograms is preserved in the new low-dimensional space. Inspired by the kernel-based similarity measurement between SPMs, we use the kernel to build the similarity matrix. Furthermore, we uncover more latent semantic information in images by embedding the original high-dimensional histograms into a more suitable low-dimensional Euclidean space.
• We significantly improve the MDS framework so that it is applicable in general scenarios. For an arbitrary query point, MDS needs to update the similarity matrix, which is intolerable for big data applications. In our new framework, we learn a transition matrix between the original high-dimensional space and the new low-dimensional space. Through the learned transition matrix, arbitrary query data can be mapped into the new low-dimensional space simply and quickly. To the best of our knowledge, we are the first to develop a mapping scheme for MDS and apply it to the image retrieval task. In addition, our framework is adaptive to many other distance measures, which makes it applicable to more general tasks.
• Exhaustive experiments with our dimensionality reduction algorithm yield several notable findings on histogram representations. The experimental study shows that using the intersection kernel to build the similarity matrix achieves better performance than Euclidean distance in most cases. Therefore, a small number of BOF or SPM dimensions is sufficient for the learning and retrieval tasks without loss of accuracy. In comparison, state-of-the-art methods cannot achieve high accuracy at the same low dimension.
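The first step of the framework, building a pairwise similarity matrix with a kernel, can be sketched with the histogram intersection kernel mentioned above. The conversion from kernel values to squared distances uses the standard kernel-induced distance and is an assumption, since the source does not spell out this step; the function names and the three toy histograms are illustrative.

```python
import numpy as np

def intersection_kernel(H):
    """K[i, j] = sum_b min(H[i, b], H[j, b]) for row-wise histograms H."""
    return np.minimum(H[:, None, :], H[None, :, :]).sum(axis=2)

def kernel_to_sq_dist(K):
    """Kernel-induced squared distance: K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

# Three toy L1-normalized histograms with three bins each.
H = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
K = intersection_kernel(H)   # 1.0 on the diagonal, 0.5 elsewhere
D2 = kernel_to_sq_dist(K)    # 0.0 on the diagonal, 1.0 elsewhere
```

For the intersection kernel, this induced squared distance equals the L1 distance between the two histograms, because |a - b| = a + b - 2 min(a, b) holds bin by bin. A matrix such as `D2` could then be passed to an MDS-style embedding.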

The remainder of the paper is organized as follows. Section 2 briefly presents related work on dimensionality reduction techniques for BOF and SPM. In Section 3, we introduce some preliminary knowledge of dimensionality reduction. Our method is presented in Section 4, including the construction of the similarity matrix and the generalized MDS (gMDS) framework. In Section 5, we show experimental results on several real-world datasets for both image retrieval and classification problems. Within the experimental study, we compare our work with state-of-the-art methods in terms of accuracy, time and storage. Finally, in Section 6 we conclude this work and propose some future directions.

Section snippets

Related works

In this section, we briefly review some unsupervised dimensionality reduction techniques for the histogram features, including linear dimensionality reduction and nonlinear dimensionality reduction.

A straightforward way of reducing the dimensionality of BOF is to create small-sized codebooks. However, this will quickly bring down the discriminability of BOF representations and degrade recognition performance. Simply selecting a small number of most discriminative visual words or linearly

Bag-of-Features and Spatial Pyramid Matching

BOF's popularity benefits from powerful local features, such as the SIFT descriptor [6], [32] and some other variants [33], [34], [35], [36]. Using local features to represent images has been gaining popularity because of their superior discriminability. It is impractical to match local features directly between two images because of the high complexity in both storage and computation. Although the application of dimensionality reduction techniques is promising, the required computational time to calculate the

Generalized MDS

Although MDS has been shown to be superior to PCA as well as many other approaches, it cannot be directly applied with other distance measures or in retrieval tasks (as shown in Fig. 2), which limits its use in image classification and retrieval. We now present a generalized MDS which is not only able to map an arbitrary query image into the low-dimensional space but is also adaptive to many distance measures of BOF (resp. SPM), and is thus applicable to image classification and retrieval. The

Experimental results

In this section, we first describe the real-world datasets and the evaluation measures used in the experimental study. We then evaluate the dimensionality reduction quality of our method in two different tasks: scene retrieval and object recognition. After that, we perform extensive experiments on real-life datasets to demonstrate the practical effectiveness of our method.

Conclusion

In this paper, we have presented a simple but effective dimensionality reduction approach that is adaptive to many distance measures for histogram-like data. Through it, we are able to obtain a meaningful compact representation of BOF and SPM. Unlike standard dimensionality reduction approaches, our method extracts latent semantic information from the high-dimensional BOF representations by embedding them into a more suitable low-dimensional Euclidean space. Especially, on one hand,

Acknowledgments

Jiangtao Cui and Hui Li are supported by the National Natural Science Foundation of China (Nos. 61173089, 61202179, 61472298 and U1135002), the National High Technology Research and Development Program (863 Program) (No. 2015AA016007), SRF for ROCS, SEM, and the Fundamental Research Funds for the Central Universities.

Miaomiao Cui received her B.S. degree in Computer Science and Technology from Shanxi University in 2012. She is currently pursuing her M.S. degree in Computer Science and Technology at Xidian University. Her research interests include statistical manifold learning and dimensionality reduction.

References (54)

M. Zang et al., A novel topic feature for image scene classification, Neurocomputing (2015)

J. Yang et al., Scene and place recognition using a hierarchical latent topic model, Neurocomputing (2015)

H. Yang et al., Recent advances and trends in visual tracking: a review, Neurocomputing (2011)

C. Zhang et al., A general kernelization framework for learning algorithms based on kernel PCA, Neurocomputing (2010)

K. Thangavel et al., Dimensionality reduction based on rough set theory: a review, Appl. Soft Comput. (2009)

Y. Pang et al., Scale invariant image matching using triplewise constraint and weighted voting, Neurocomputing (2012)

Y. Pang et al., Fully affine invariant SURF for image matching, Neurocomputing (2012)

J. Qin et al., Scene categorization via contextual visual words, Pattern Recognit. (2010)

L. Fei-Fei et al., Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories, Comput. Vis. Image Underst. (2007)

C. Zhang et al., Object categorization in sub-semantic space, Neurocomputing (2014)

C. Zhang et al., Beyond visual features: a weak semantic image representation using exemplar classifiers for classification, Neurocomputing (2013)

L. Zhang et al., Low-rank decomposition and Laplacian group sparse coding for image classification, Neurocomputing (2014)

J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in: 9th IEEE...

D. Nister, H. Stewenius, Scalable recognition with a vocabulary tree, in: IEEE Conference on Computer Vision and...

D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)

S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...

H. Jégou, M. Douze, C. Schmid, Packing bag-of-features, in: 12th IEEE International Conference on Computer Vision,...

H. Jégou et al., Aggregating local image descriptors into compact codes, IEEE Trans. Pattern Anal. Mach. Intell. (2012)

J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Lost in quantization: improving particular object retrieval in...

A.M. Martínez et al., PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. (2001)

D.M. Blei et al., Latent Dirichlet allocation, J. Mach. Learn. Res. (2003)

J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)

S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)

M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: NIPS, vol. 14,...

W.S. Torgerson, Multidimensional scaling: I. Theory and method, Psychometrika (1952)

S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)

Y. Pang et al., Ranking graph embedding for learning to rerank, IEEE Trans. Neural Netw. Learn. Syst. (2013)

Jiangtao Cui graduated from the Department of Computer Software of Xidian University in 1998. He received the Master of Engineering degree in 2001 and the Ph.D. degree in 2005. He was a visiting researcher at the University of Queensland from 2007 to 2008 as a state-sponsored visiting scholar. He is now a professor and doctoral supervisor in the School of Computer Science and Technology at Xidian University, and the associate dean of the school. He is also a senior member of the China Computer Federation, an ACM member and an executive director of the Computer Education Society of Shaanxi Province. His research interests include image/video processing, pattern recognition and high-dimensional indexing.

Hui Li received the B.Eng. degree from Harbin Institute of Technology in 2005 and the Ph.D. degree from Nanyang Technological University, Singapore, in July 2012. He is an associate professor in the School of Cyber Engineering, Xidian University, China. His research interests include data mining and knowledge discovery.
