Binary code learning via optimal class representations
Introduction
With the explosive growth of images on the web, nearest neighbor search has attracted great attention in computer vision, machine learning, information retrieval and related areas [5], [16], [8], [37], [18], [17], [19], [48], [38]. When images are high dimensional, searching efficiently becomes both challenging and crucial. Two mainstream retrieval approaches are tree-based and hashing-based methods. Tree-based methods speed up search by partitioning the data space with various tree structures; decision trees [27] and kd-trees [23] are two such methods. However, their storage and time costs grow exponentially with the dimension, which makes search inefficient in high-dimensional spaces.
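As a concrete illustration of the tree-based approach (our sketch, not from the paper), the following Python snippet indexes a set of vectors with a kd-tree via SciPy and queries nearest neighbors; in high dimensions such structures degrade toward an exhaustive scan, which motivates hashing.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 32))  # database: 10,000 points in 32-d
q = rng.standard_normal(32)           # a query vector

tree = cKDTree(X)                     # recursive spatial partition of the data
dist, idx = tree.query(q, k=5)        # 5 nearest neighbors of q (Euclidean)
print(idx, dist)
```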
To search high-dimensional data efficiently, hashing has become a promising approach. Hashing methods map high-dimensional vectors onto low-dimensional binary codes, and the mapped binary codes are used for efficient search. Beyond search, binary codes have also been widely applied in various vision tasks [21], [20], [22]. Existing hashing techniques fall into two categories: data-dependent and data-independent. Locality Sensitive Hashing (LSH) [5] is one of the most popular data-independent methods: its random hyperplane-based hash functions use projections sampled from a Gaussian distribution. Beyond Euclidean distance, extensions of LSH support several other distance measures, such as p-norm distances [2], the Mahalanobis metric [12], and kernelized similarities [11], [28]. The LSH family, however, needs long binary codes to achieve high search performance, which leads to high storage consumption.
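A minimal sketch of random-hyperplane LSH in the spirit of [5]: each bit is the sign of a projection onto a direction drawn from a Gaussian, and candidates are ranked by Hamming distance. The sizes and the brute-force ranking step are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_bits = 128, 64
X = rng.standard_normal((10000, d))   # database vectors
q = rng.standard_normal(d)            # query vector

W = rng.standard_normal((d, n_bits))  # random Gaussian hyperplanes
codes = X @ W >= 0                    # n x n_bits binary codes
q_code = q @ W >= 0                   # query code

# Rank database items by Hamming distance to the query code.
hamming = np.count_nonzero(codes != q_code, axis=1)
top5 = np.argsort(hamming)[:5]
print(top5, hamming[top5])
```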
Instead of generating hash functions randomly, data-dependent hashing methods learn similarity-preserving binary codes from training data. Various data-dependent methods have been proposed in the literature; representative methods fall into two groups: supervised and unsupervised. Unsupervised methods use only unlabeled data to generate binary codes; PCA Hashing [47], ITQ [6], Isotropic Hashing [9], Spectral Hashing (SH) [41] and Asymmetric Inner Product Binary Coding (AIBC) [29] are widely used examples. These unsupervised methods, however, do not exploit supervision information. Many supervised methods have therefore been proposed, such as minimal loss hashing (MLH) [24], kernel-based supervised hashing (KSH) [15], supervised discrete hashing (SDH) [30], FastHash [13], and graph cuts coding (GCC) [4].
A few hashing methods generate hash functions in a kernel space as extensions of linear hashing, such as binary reconstructive embeddings (BRE) [10] and KLSH [11]. Recently, it has been shown that compact similarity-preserving hash codes can be obtained by exploiting the nonlinear manifold structure of the data. One of the most popular methods in this category is spectral hashing (SH) [41], which generates hash codes by solving a relaxed mathematical program similar to Laplacian eigenmaps [1]. As an extension of SH, anchor graph hashing (AGH) [16] uses an anchor graph affinity, which makes training and the out-of-sample extension tractable for large-scale datasets. Inductive manifold hashing (IMH) [31], [32] introduced a new framework for generating nonlinear hash functions. Other related methods include multidimensional spectral hashing (MDSH) [40] and DGH [14].
In general, supervised methods outperform unsupervised ones because they use the supervision information of the training data. In semi-supervised hashing (SSH), a matrix S is defined to incorporate the pairwise label information; in SDH, the label information is used to classify the binary codes. However, most existing supervised methods use only the label information and ignore the relationship between classes. We believe that the semantic relationship between classes carries more detailed and specific information than labels alone, and that exploiting it can improve retrieval performance.
In this work, we propose a new method that computes a binary code for each class as its optimal representation, under the assumption that an optimal representation should be representative of its class and reflect the relationship with other classes. To capture the semantic similarity between classes, we construct a matrix depicting this similarity, and the optimal class representations are computed from it. Our contributions are as follows:
1. We propose a new supervised hashing method in which each class is assigned an optimal binary code as its class representation, on the premise that optimal class representations preserve the semantic similarity between classes well.
2. We construct a semantic relatedness matrix to depict the semantic similarity, and then compute a set of binary codes that preserve this similarity in the Hamming space (a hypothetical construction is sketched below). The binary codes of the data are expected to be close to their corresponding optimal class representations. By solving a straightforward optimization problem, the binary codes and hash functions are learned efficiently.
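The paper's exact construction of the semantic relatedness matrix is specified in the method section; as a purely hypothetical stand-in, one could score class pairs by the cosine similarity of class-mean feature vectors:

```python
import numpy as np

def semantic_relatedness(features, labels, n_classes):
    """Hypothetical semantic relatedness matrix S (c x c): cosine similarity
    between class-mean features; a stand-in, not the paper's construction."""
    means = np.stack([features[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    return means @ means.T  # S[i, j] in [-1, 1]
```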
Learning the optimal class binary representations
Suppose that we have $n$ samples $X = \{x_i\}_{i=1}^{n}$. Our aim is to obtain a set of binary codes $B$ that preserves their semantic similarities well. We first seek the optimal class representations, which capture the semantic similarities between classes; the set of binary codes $B$ is then learned according to the corresponding optimal class representations.

For $c$ classes, we compute a matrix $P \in \{-1, 1\}^{c \times r}$, where $r$ is the code length, and every row $p_i^{T}$ of $P$ is the optimal representation of class $i$.
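The snippet is cut off before the optimization itself. As one hedged illustration (a standard spectral relaxation, not necessarily the paper's solver), binary class representations that roughly preserve a relatedness matrix $S$ can be taken as the signs of its top-$r$ eigenvectors, after which each sample simply inherits the code of its class:

```python
import numpy as np

def class_codes(S, r):
    """P in {-1, 1}^(c x r) from the top-r eigenvectors of S.
    A spectral-relaxation sketch, not the paper's exact algorithm."""
    vals, vecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:r]]  # top-r eigenvectors
    P = np.sign(top)
    P[P == 0] = 1                              # break ties deterministically
    return P

# With labels y (values in 0..c-1), the per-sample target codes are B = P[y].
```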
Nonlinear embedding
The hash functions map the data into a Hamming space to obtain binary codes. In this work, we learn hash functions with a kernel embedding; RBF kernel mapping is a simple yet effective choice and is widely adopted for hash functions, e.g., in BRE [10], KSH [15] and SDH [30]:
$$F(x) = \operatorname{sgn}\big(W^{T}\phi(x)\big),$$
where $\phi(x)$ is an $m$-dimensional vector obtained by the RBF kernel mapping
$$\phi(x) = \big[\exp(-\|x - a_1\|^{2}/\sigma), \ldots, \exp(-\|x - a_m\|^{2}/\sigma)\big]^{T},$$
where $a_1, \ldots, a_m$ are the $m$ anchor points randomly selected from the training data and $\sigma$ is the kernel width.
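Below is a sketch of this embedding in the form used by KSH/SDH-style methods; the anchor count, the kernel width heuristic, and the random placeholder for the projection $W$ (which the method would learn) are our assumptions.

```python
import numpy as np

def rbf_embed(X, anchors, sigma):
    """phi(x)_j = exp(-||x - a_j||^2 / sigma), one column per anchor."""
    sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))                       # training data
anchors = X[rng.choice(len(X), size=300, replace=False)]  # m = 300 anchors
# Common heuristic: set sigma from the scale of squared distances.
sigma = np.median(((X[:50, None] - anchors[None]) ** 2).sum(-1))

Phi = rbf_embed(X, anchors, sigma)   # n x m kernel features
W = rng.standard_normal((300, 32))   # placeholder; learned by the optimization
B = np.sign(Phi @ W)                 # n x 32 binary codes in {-1, 1}
```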
Experimental results
Our experiments are conducted on three widely used datasets: CIFAR-10, SUN397 [43] and ImageNet [3]. The proposed method is compared against several state-of-the-art supervised hashing methods, including SDH [30], CCA-ITQ [6] and KSH [15]. For these three methods, we used the implementations provided by the original authors. For KSH, we sampled 2000 images from the training data to build the pairwise label matrix, following [15].
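The snippet is truncated before the protocol details. As a hedged sketch of a standard retrieval evaluation (not necessarily the paper's exact protocol), mean average precision over Hamming ranking, with relevance defined by shared labels, can be computed as:

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over Hamming ranking; an item is relevant if it shares the query's label."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        ham = np.count_nonzero(db_codes != q, axis=1)  # Hamming distances
        order = np.argsort(ham, kind="stable")         # rank database items
        rel = (db_labels[order] == ql).astype(float)
        if rel.sum() == 0:
            continue
        precision = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((precision * rel).sum() / rel.sum())
    return float(np.mean(aps))
```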
Conclusions
In this paper, we introduced a new hashing method for image retrieval. Unlike existing supervised methods, we focused on the semantic similarity between classes, under the assumption that the semantic relationship between classes provides useful information for improving retrieval performance. The core of our method is to find a set of binary codes for classes as their optimal representations. To exploit the semantic similarity between classes, we built a semantic relatedness matrix and computed class representations that preserve this similarity in the Hamming space.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Projects 61502081, 61472063 and 61473154, and by the Project of Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology), Grant no. 30920140122007.
References
- et al., Structure sensitive hashing with adaptive product quantization, IEEE Trans. Cybern. (2015)
- et al., Multiple feature kernel hashing for large-scale visual search, Pattern Recognit. (2014)
- et al., Large-scale unsupervised hashing with shared structure learning, IEEE Trans. Cybern. (2015)
- et al., Locality constrained representation based classification with spatial pyramid patches, Neurocomputing (2013)
- M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Proceedings of...
- M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in:...
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, in:...
- T. Ge, K. He, J. Sun, Graph cuts for supervised binary coding, in: Proceedings of the European Conference on Computer...
- A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of International...
- Y. Gong, S. Lazebnik, A. Gordo, F. Perronnin, Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- B. Kulis, P. Jain, K. Grauman, Fast similarity search for learned metrics, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
- J. Lu, V.E. Liong, J. Zhou, Cost-sensitive local binary feature learning for facial age estimation, IEEE Trans. Image Process. (2015)
- J. Lu, V.E. Liong, X. Zhou, J. Zhou, Learning compact binary face descriptor for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
- M. Muja, D.G. Lowe, Scalable nearest neighbor algorithms for high dimensional data, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
Xiang Zhou is currently an undergraduate student at the University of Electronic Science and Technology of China. His major research interests include computer vision and machine learning.
Fumin Shen received his B.S. and Ph.D. degrees from Shandong University and Nanjing University of Science and Technology, China, in 2007 and 2014, respectively. He is currently a lecturer in the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His major research interests include computer vision and machine learning, in particular face recognition, image analysis, hashing methods, and robust statistics with applications in computer vision.
Yang Yang is currently with the University of Electronic Science and Technology of China. He was a Research Fellow at the National University of Singapore during 2012–2014. He received his Ph.D. degree in 2012 from The University of Queensland, Australia, where he was supervised by Prof. Heng Tao Shen and Prof. Xiaofang Zhou. He obtained his Master's degree in 2009 and Bachelor's degree in 2006 from Peking University and Jilin University, respectively.
Guangwei Gao received the B.S. degree in Information and Computation Science from Nanjing Normal University, Nanjing, China, in 2009, and the Ph.D. degree in Pattern Recognition and Intelligence Systems from Nanjing University of Science and Technology, Nanjing, China, in 2014. From March 2011 to September 2011 and from February 2013 to August 2013, he was an exchange student at the Department of Computing, The Hong Kong Polytechnic University. He is now an Assistant Professor at the Institute of Advanced Technology, Nanjing University of Posts and Telecommunications. His research interests include face recognition, face hallucination and biometrics.
Yuan Wang is a Research Fellow at the Department of Industrial and Systems Engineering, National University of Singapore. She received her Ph.D. degree in Operations Research, specializing in maritime transportation optimization and simulation, from the National University of Singapore. Her research interests include mathematical modeling, complex system simulation and optimization heuristics. She is currently working on data-driven, on-demand scheduling problems at the Centre for Next Generation Logistics, NUS.