
Neurocomputing

Volume 282, 22 March 2018, Pages 32-41

Learning binary codes with local and inner data structure

https://doi.org/10.1016/j.neucom.2017.12.005

Abstract

Recent years have witnessed the promising capacity of hashing techniques in tackling nearest neighbor search, owing to their high efficiency in storage and retrieval. Data-independent approaches (e.g., Locality Sensitive Hashing) normally construct hash functions using random projections, which neglect intrinsic data properties. To compensate for this drawback, learning-based approaches explore local data structure and/or supervised information to boost hashing performance. However, due to the construction of the Laplacian matrix, existing methods usually suffer from unaffordable training costs. In this paper, we propose a novel supervised hashing scheme, which has the merits of (1) exploring the inherent neighborhoods of samples; (2) significantly reducing the training cost on massive training data by employing an approximate anchor graph; and (3) preserving semantic similarity by leveraging pair-wise supervised knowledge. Besides, we integrate the discrete constraint to eliminate the accumulated errors in learning reliable hash codes and hash functions. We devise an alternating algorithm to efficiently solve the optimization problem. Extensive experiments on various image datasets demonstrate that our proposed method is superior to the state-of-the-art.

Introduction

During the last decades, visual search over large-scale datasets has drawn much attention in many content-based multimedia applications [1], [2], [3]. Among large-scale vision problems, binary coding has attracted growing attention in image retrieval [4], [5], [6], image classification [7], [8], etc. Especially in recent years, the demand for retrieving relevant content from massive image collections has grown stronger than ever in the big-data era. On most occasions, users input a keyword and expect to obtain semantically similar images precisely and efficiently. For servers, it is almost impossible to search the objects linearly, especially when they confront vast amounts of images (e.g., the photo sharing website Flickr possesses more than five billion images, and photos are still uploaded at a rate of over 3000 images per minute) [9]. Compared with the cost of storage, the search task consumes even more computational resources due to the massive volume of search requests. Therefore, in recent years much effort has been devoted to this problem. Among the proposed methods, hashing shows great superiority over the others. In simple terms, hashing methods map high-dimensional images into short binary codes while preserving the similarity of the original data. In this way, searching for similar images reduces to finding neighboring hash codes in Hamming space, which is simple and practical. Consequently, the technique brings significant efficiency gains in multimedia, computer vision, information retrieval, machine learning and pattern matching [10], [11], [12], [13], [14], [15], [16], [17], [18], [19].

For many applications, approximate nearest neighbor (ANN) search is sufficient [20], [21], [22], [23], [24]. Given a query instance, the algorithm aims to find similar instances instead of returning the exact nearest neighbor. Therefore, efficient data structures are required to store the data for fast search. Against this background, tree-based indexing approaches were proposed for ANN search, typically with sub-linear complexity of O(log N). Among them, the KD-tree [25], [26], [27], the R-tree [28] and the metric tree [29] are the most representative. However, as imaging technology develops, the descriptors of an image often reach hundreds of dimensions, and as the dimensionality increases, the performance of tree-based methods degrades dramatically while their storage requirements grow considerably. In view of the inefficiency of tree-based indexing methods, hashing approaches have been proposed to map an entire dataset into discrete codes while preserving the similarity of the data. The similarity between data points can then be measured by the Hamming distance, which computers can calculate in very little time.
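To make the last point concrete, the following minimal sketch (our own illustration, not code from the paper; the 64-bit code length and dataset size are arbitrary choices) shows a brute-force scan over binary codes: the Hamming distance is just an XOR followed by a bit count, which is why even a linear scan in Hamming space is fast.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 64
database = rng.integers(0, 2, size=(1_000_000, n_bits), dtype=np.uint8)
query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)

# XOR marks exactly the bits where two codes disagree, so the Hamming
# distance is the number of nonzero entries after XOR. Production systems
# pack bits into machine words and use a hardware popcount instead.
distances = np.count_nonzero(database ^ query, axis=1)
top10 = np.argsort(distances)[:10]   # indices of the ten nearest codes
```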

Existing hashing methods can be roughly divided into two groups [30]: data-independent and data-dependent. One of the most classic data-independent methods is Locality-Sensitive Hashing (LSH) [31], which has been widely used to handle massive data due to its simplicity. LSH uses hash functions that randomly project or permute nearby data points into the same binary codes. However, LSH needs long binary codes to achieve promising retrieval performance, which increases storage space and computation costs. Moreover, LSH ignores the underlying distribution and manifold structure of the data on account of its random projections.
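As a reference point, here is a minimal random-hyperplane LSH sketch (our own rendering of the idea, not a specific library); note that the projections are drawn without ever looking at the data, which is exactly the data-independence criticized above.

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Random-hyperplane LSH: one bit per random projection.

    Each bit is the sign of a projection onto a random direction, so
    points with a small angle between them fall on the same side of
    most hyperplanes and receive similar codes.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))   # random hyperplanes
    return (X @ W > 0).astype(np.uint8)             # codes in {0, 1}

X = np.random.default_rng(1).standard_normal((1000, 512))
B = lsh_codes(X, n_bits=64)                         # (1000, 64) binary matrix
```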

Realizing this deficiency, Weiss et al. proposed Spectral Hashing (SH) [32], which relaxes the original problem and utilizes a subset of thresholded eigenvectors of the graph Laplacian, improving retrieval accuracy to some extent. Yet it takes considerable time to build the neighborhood graph. Liu et al. made several improvements to SH and proposed Anchor Graph Hashing (AGH) [33], which uses anchor graphs to obtain low-rank adjacency matrices. The AGH formulation hashes novel inputs in constant time by extrapolating graph Laplacian eigenvectors to eigenfunctions. Note that SH and AGH are data-dependent, since they exploit the feature information of the data and preserve its metric structure. Methods of this kind are called unsupervised methods; further examples include principal component analysis based hashing (PCAH) [34], iterative quantization (ITQ) [35], isotropic hashing (Iso-Hash) [36] and affinity-preserving k-means hashing (KMH) [37].
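The anchor-graph idea can be sketched in a few lines. This is a simplified illustration under our own assumptions (AGH itself obtains anchors from k-means and derives the bandwidth and normalization more carefully):

```python
import numpy as np

def anchor_affinity(X, anchors, s=3, sigma=1.0):
    """Truncated point-to-anchor affinity Z of shape (n, m).

    Keeping only the s nearest anchors per point makes Z sparse, and the
    implicit n x n adjacency A = Z diag(Z^T 1)^{-1} Z^T is low rank and
    never materialized -- this is what avoids the O(n^2) Laplacian cost.
    """
    # squared Euclidean distances between all points and all anchors
    d2 = ((X ** 2).sum(1)[:, None] - 2 * X @ anchors.T
          + (anchors ** 2).sum(1)[None, :])
    nearest = np.argsort(d2, axis=1)[:, :s]          # s closest anchors
    rows = np.arange(X.shape[0])[:, None]
    Z = np.zeros_like(d2)
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)          # rows sum to one
```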

However, the hashing methods mentioned above cannot achieve high retrieval performance with a simple approximate affinity matrix [38]. Due to the semantic gap, where visual similarity often differs from semantic similarity, returning the nearest neighbors in metric space cannot guarantee search quality [34], [39]. To solve this problem, supervised hashing methods [5], [40], [41], [42], [43], [44], [45], [46], [47] exploit images that are manually labeled as similar or dissimilar. The following are the most popular among them. Kernel-Based Supervised Hashing (KSH) [48] maps data into a Hamming space where similar items have minimal Hamming distance while dissimilar items are maximally separated. Binary Reconstructive Embedding (BRE) [49] learns hash functions that minimize the reconstruction error between the metric space and the Hamming space. Canonical Correlation Analysis with Iterative Quantization (CCA-ITQ) [35] and Supervised Discrete Hashing (SDH) [50] were proposed to preserve semantic similarity. By leveraging pairwise label information, the performance of supervised methods has been remarkably improved. Moreover, some hashing approaches based on deep neural networks have recently been proposed to learn image representations and hash codes simultaneously.
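The pairwise-supervision principle behind methods such as KSH can be summarized by a code-inner-product objective; the sketch below is our own simplified rendering of that idea, not the exact KSH formulation.

```python
import numpy as np

def pairwise_code_loss(B, S):
    """KSH-style objective on a labeled subset.

    B: (n, r) codes in {-1, +1}; S: (n, n) pairwise labels, +1 for
    semantically similar pairs and -1 for dissimilar ones. Since the
    Hamming distance of r-bit codes is (r - b_i . b_j) / 2, fitting the
    scaled inner products B B^T / r to S simultaneously pulls similar
    pairs together and pushes dissimilar pairs apart in Hamming space.
    """
    r = B.shape[1]
    return np.linalg.norm(B @ B.T / r - S, ord="fro") ** 2

# toy check: identical codes give inner product r, i.e. similarity +1
B = np.sign(np.random.default_rng(0).standard_normal((5, 16)))
print(pairwise_code_loss(B, np.ones((5, 5))))
```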

Whether a method is supervised or unsupervised, an objective function with discrete constraints involves a mixed binary-integer problem [50], which is NP-hard. To tackle this, most hashing methods relax the discrete constraints: they first compute a continuous solution and then threshold it to obtain binary codes, overlooking the importance of discrete optimization. This technique leads to significant information loss during the learning process [51]. It has been shown that, if the discrete constraints are ignored, the quality of the codes degrades quickly as the code length increases. Some methods try to improve the quality by replacing the sign function with a smoother sigmoid function [52]; however, this does not solve the problem explicitly. To date, only a few methods directly generate hash codes in discrete space. In addition, deep neural networks have been applied in recent retrieval research [53], [54], [55]. Deep-learning-based hashing obtains high accuracy by learning the image representation and the hash codes in a tightly coupled way [56], [57]. Nonetheless, because the resulting computational expense and storage cost are huge, we do not compare against such methods extensively.
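The two relaxations mentioned above are easy to see in code (our illustration, with an arbitrary random projection standing in for a learned one):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))
W = rng.standard_normal((512, 64))   # stands in for a learned projection

H = X @ W                      # continuous relaxed "codes"
B = np.sign(H)                 # relax-then-round: the gap between H and B
                               # is quantization error, which accumulates
                               # bit by bit as the code length grows
B_smooth = np.tanh(4.0 * H)    # sigmoid-style smooth surrogate of sign();
                               # differentiable, but still not binary, so
                               # the discrete problem is only approximated

quant_error = np.linalg.norm(H - B)   # the loss that discrete methods avoid
```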

In this paper, we aim to design a supervised hashing method which can efficiently generate high-quality compact codes. We utilize an anchor graph, built from pairwise similarities, to exploit the inner structure of the original data. In the process of learning the hash functions, we also use the supervision information to preserve pairwise semantic similarity and improve retrieval accuracy. To avoid the accumulated errors caused by continuous relaxation, we choose to optimize the binary codes directly. With the discrete constraints added to the objective function, we propose a novel hashing framework, termed Local and Inner Data Structure Supervised Hashing (LISH), which is able to efficiently generate codes while satisfying semantic similarity. We enhance our earlier work [58] by conducting more experiments to make the results more detailed and accurate, and we also simplify some of the mathematical derivations. Our main contributions are summarized as follows:

  • Our method uses the graph Laplacian to capture local neighborhoods and enhance the quality of the hash codes, while the semantic gap is properly addressed by utilizing labeled information. In this way, both metric and semantic similarity are preserved, which contributes significantly to the performance.

  • Most existing hashing methods solve the problem by relaxing the discrete constraints. In contrast, we optimize the discrete problem directly, and each bit can be learned sequentially by the algorithm, so our method proceeds in an alternating and efficient manner.

  • We evaluate our method on three popular large-scale image datasets and obtain superior accuracy compared with the state-of-the-art.

The remainder of this paper is organized as follows. A brief review of related work is given in Section 2. We present the detailed formulation of the proposed LISH method in Section 3. Experimental results are given in Section 4. We conclude our work in Section 5.

Section snippets

Related work

Suppose we have $n$ samples $x_i \in \mathbb{R}^d$, $i = 1, \ldots, n$, stored in the matrix $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$, where $d$ is the dimensionality of the feature space. In this section, we briefly review a few representative methods, including Spectral Hashing (SH) and Kernel-Based Supervised Hashing (KSH).

Local and inner data structure supervised hashing

In this section, we introduce the algorithm of our method in detail. We propose an alternating optimization model and learn each bit sequentially. The hash functions are learned simultaneously during the optimization process.

We defined $X$ above, and we denote the label matrix by $Y = [y_1, y_2, \ldots, y_n]^T \in \{0, 1\}^{n \times c}$, where $c$ is the number of label classes. When $x_i$ belongs to class $j$, $y_{ij} = 1$; otherwise $y_{ij} = 0$. By finding the mapping relation between the original feature space and the Hamming …
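For concreteness, the label matrix defined above can be built with a small one-hot helper (our own illustration, not code from the paper):

```python
import numpy as np

def one_hot_labels(labels, c):
    """Build the label matrix Y in {0,1}^{n x c} defined above:
    y_ij = 1 iff sample x_i belongs to class j, else 0."""
    Y = np.zeros((len(labels), c), dtype=np.uint8)
    Y[np.arange(len(labels)), labels] = 1
    return Y

# e.g. four samples over c = 3 classes
Y = one_hot_labels(np.array([0, 2, 1, 2]), c=3)
```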

Experiments

We conduct extensive experiments to evaluate our proposed method on three publicly available large-scale image datasets: CIFAR-10, SUN397 and YouTube Faces. The CIFAR-10 dataset is a labeled subset of the 80-million tiny images collection [64], which contains 60K 32 × 32 color images of ten categories, each with 6000 samples. The entire dataset is partitioned into two parts: a training subset of 59,000 images and a test subset of 1000 images. Each image is represented by a 512-dimensional …

Conclusion

In this paper, we exploited the underlying manifold structure of the samples via the graph Laplacian. An approximate anchor graph was used to reduce the training cost. To capture and preserve the semantic label information in the Hamming space, we explicitly formulated a tractable optimization function integrated with an ℓ2 loss and decomposed it into several sub-problems which could be iteratively solved by our algorithm. We proposed a discrete supervised paradigm to directly generate hash codes without …


References (66)

  • Gong, Y., et al., Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • Liu, X., et al., Compact kernel hashing with multiple features, Proceedings of the ACM International Conference on Multimedia (2012)
  • Hu, M., et al., Hashing with angular reconstructive embeddings, IEEE Trans. Image Process. (2018)
  • Mu, Y., et al., Hash-SVM: scalable kernel machines for large-scale visual classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
  • Zhu, X., et al., Block-row sparse multiview multilabel learning for image classification, IEEE Trans. Cybern. (2016)
  • Wang, J., et al., Learning to hash for indexing big data – a survey, Proc. IEEE (2016)
  • Strecha, C., et al., LDAHash: improved matching with smaller descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • Li, X., et al., Learning hash functions using column generation, Proceedings of the International Conference on Machine Learning (2013)
  • He, X., et al., Neural collaborative filtering, Proceedings of the Twenty-Sixth International Conference on World Wide Web, WWW (2017)
  • Song, J., et al., Inter-media hashing for large-scale retrieval from heterogeneous data sources, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD (2013)
  • Zhu, X., et al., Robust joint graph sparse coding for unsupervised spectral feature selection, IEEE Trans. Neural Netw. Learn. Syst. (2017)
  • Liu, X., et al., Mixed image-keyword query adaptive hashing over multilabel images, ACM Trans. Multimed. Comput. Commun. Appl. (2014)
  • Zhu, X., et al., A sparse embedding and least variance encoding approach to hashing, IEEE Trans. Image Process. (2014)
  • Gionis, A., et al., Similarity search in high dimensions via hashing, Proceedings of the Twenty-Fifth International Conference on Very Large Data Bases, VLDB (1999)
  • Yang, Y., et al., Multitask spectral clustering by exploring intertask correlation, IEEE Trans. Cybern. (2015)
  • He, X., et al., Fast matrix factorization for online recommendation with implicit feedback, Proceedings of the Thirty-Ninth International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR (2016)
  • Yang, Y., et al., Discrete nonnegative spectral clustering, IEEE Trans. Knowl. Data Eng. (2017)
  • Smeulders, A.W.M., et al., Content-based image retrieval at the end of the early years, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • Bentley, J.L., Multidimensional binary search trees used for associative searching, Commun. ACM (1975)
  • Friedman, J.H., et al., An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Softw. (1977)
  • Silpa-Anan, C., et al., Optimised KD-trees for fast image descriptor matching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
  • Gaede, V., et al., Multidimensional access methods, ACM Comput. Surv. (1998)
  • Wang, S., et al., A multi-label least-squares hashing for scalable image search, Proceedings of the SIAM International Conference on Data Mining (2015)

Shiyuan He is currently an undergraduate student in Yingcai Honors College, University of Electronic Science and Technology of China. His major research interests include image retrieval, hashing and machine learning.

Guo Ye is an undergraduate student in Yingcai Honors College, University of Electronic Science and Technology of China. His interests include image retrieval, hashing and machine learning.

Mengqiu Hu received the bachelor's degree in 2015 from the University of Electronic Science and Technology of China, where he is currently a master's student. His major research interests include computer vision and machine learning.

Yang Yang received the bachelor's degree from Jilin University in 2006, the master's degree from Peking University in 2009, and the Ph.D. degree from The University of Queensland, Australia, in 2012, under the supervision of Prof. H. T. Shen and Prof. X. Zhou. He was a Research Fellow with the National University of Singapore from 2012 to 2014. He is currently with the University of Electronic Science and Technology of China.

Fumin Shen received the B.S. degree from Shandong University in 2007 and the Ph.D. degree from the Nanjing University of Science and Technology, China, in 2014. He is currently an Associate Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His major research interests include computer vision and machine learning, including face recognition, image analysis, hashing methods, and robust statistics with applications in computer vision.

Heng Tao Shen received the B.Sc. degree (Hons.) and the Ph.D. degree from the Department of Computer Science, National University of Singapore, in 2000 and 2004, respectively. He joined The University of Queensland as a Lecturer, a Senior Lecturer, and a Reader, where he became a Professor in 2011. He is currently a Professor of Computer Science and an ARC Future Fellow with the School of Information Technology and Electrical Engineering, The University of Queensland. He is also a Visiting Professor with Nagoya University and the National University of Singapore. His research interests mainly include multimedia/mobile/Web search, and big data management on spatial, temporal, multimedia, and social media databases. He has published extensively and served on program committees in the most prestigious international publication venues of interest. He received the Chris Wallace Award for Outstanding Research Contribution in 2010, conferred by the Computing Research and Education Association, Australasia. He is also an Associate Editor of the IEEE Transactions on Knowledge and Data Engineering. He will serve as the PC Co-Chair of ACM Multimedia 2015.

1 These authors contributed equally to this work.
