Pattern Recognition

Volume 95, November 2019, Pages 114-127
Deep forest hashing for image retrieval

https://doi.org/10.1016/j.patcog.2019.06.005

Highlights

  • The proposed method considers three types of similarity metrics to preserve the semantic similarity and manifold similarity among the data points in the Hamming space.

  • Sliding windows of different sizes are used to extract multi-grained features from the raw data, and the feature extraction phase depends on the hash function learning stage, which helps in learning better hash functions.

  • Compared with deep neural network-based hashing methods, the proposed method has fewer hyperparameters, faster training speed and easier theoretical analysis.

  • The proposed method learns shorter binary code representations to achieve effective and efficient image retrieval.

Abstract

Hashing methods have been intensively studied and widely used in image retrieval. They aim to learn a group of hash functions that map the original data into compact binary codes while preserving some notion of similarity in the Hamming space. The generated binary codes are effective for image retrieval and highly efficient for large-scale data storage. Decision trees are fast and interpretable models, but current decision-tree-based hashing methods have insufficient learning ability because they rely on shallow trees. Most current deep hashing methods are based on deep neural networks; however, such methods have deficiencies, including too many hyperparameters, poor interpretability, and a need for expensive and powerful computational facilities during training. A hashing model that is not based on deep neural networks is therefore needed to achieve efficient image retrieval with few hyperparameters, easy theoretical analysis and an efficient training process. The multi-grained cascade forest (gcForest) is a novel deep model that builds a deep forest ensemble classifier to process data layer by layer with multi-grained scanning and a cascade forest. To date, gcForest has not been used to generate compact binary codes; we therefore propose a deep forest-based hashing method that learns shorter binary codes to achieve effective and efficient image retrieval. The experimental results show that the proposed method outperforms comparable hashing methods while using shorter binary codes.

Introduction

The rapid development of information technology has substantially increased the amount of data accumulated in various fields, and the age of big data has arrived. Quickly and accurately finding the data that users need within such massive collections has become an urgent problem, and efficient retrieval over large-scale data is now a popular research topic in both academia and industry. Hashing learning has attracted considerable attention in recent years due to its excellent performance in processing high-dimensional data. Our paper focuses on a deep hashing model that uses deep forests as hash functions, with considerably fewer hyperparameters, competitive performance and convincing theoretical analysis.

Hashing learning aims to transform every data item into a low-dimensional representation, i.e., a short code consisting of a sequence of bits, referred to as a hash code [1], [2]. Hashing methods can be generally divided into two main categories: data-independent hashing and data-dependent hashing [3], [4], [5]. The main difference is that the hash functions used by data-independent methods are either manually designed or randomly generated, while those in data-dependent methods are learned from the data. The most representative data-independent hashing methods for image retrieval tasks are locality-sensitive hashing (LSH) [6] and its variants: super-bit LSH [7], non-metric LSH [8], kernelized LSH [9] and LSH with faster computation of the hash functions [10]. LSH uses a set of randomly generated hyperplanes sampled from a Gaussian distribution, projects the original high-dimensional data onto these hyperplanes, and then thresholds the projection results to produce the outputs of the hash functions. The emergence of LSH greatly improved the efficiency of image retrieval and provided a new perspective on large-scale image retrieval. However, the hash functions in data-independent methods such as LSH are randomly generated or manually specified irrespective of the distribution of the original data, so retrieval accuracy improves only slowly as the number of bits increases, and it is difficult to obtain stable retrieval results in practical applications. In contrast, the hash function for each code in data-dependent methods is learned from the data and thus has practical significance.
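To make the random-hyperplane scheme concrete, the following minimal Python sketch (names and parameters are ours for illustration, not from the cited papers) samples Gaussian hyperplanes and thresholds the projections at zero:

    import numpy as np

    def lsh_hash(X, n_bits, seed=0):
        """Random-hyperplane LSH sketch: project onto Gaussian hyperplanes
        and take the sign. The hyperplanes are sampled independently of
        the data, which is what makes the scheme data-independent."""
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_bits))  # one hyperplane per bit
        return (X @ W >= 0).astype(np.uint8)           # threshold the projections

    # Toy usage: nearby points tend to agree on more bits than distant points.
    X = np.random.rand(4, 128)
    codes = lsh_hash(X, n_bits=32)
    hamming_01 = (codes[0] != codes[1]).sum()

Because the hyperplanes are drawn without looking at the data, different seeds give unrelated codes, which reflects why data-independent methods need many bits for stable retrieval.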

Data-dependent methods appear to be the trend in current research and applications. Spectral hashing (SH) [11], a classic data-dependent hashing method, constructs a complete graph over the training set, with the (Gaussian) similarity between each pair of data samples as the weight of the corresponding edge. Each hash function can be regarded as a cut of this graph; the cut should minimize the sum of the weights of the cut edges while dividing the graph into two parts as evenly as possible. This problem can be transformed into the classic normalized-cut problem from graph theory, and the hash codes are obtained by taking the signs of the eigenvectors corresponding to the smallest nonzero eigenvalues of the graph Laplacian. Data-dependent methods can be further categorized as supervised or unsupervised according to whether label information is used: unsupervised hashing methods attempt to preserve the similarity in the original feature space, while supervised hashing methods aim to preserve semantic similarity. Examples of unsupervised methods include iterative quantization (ITQ) [3], isotropic hashing (IsoHash) [4], discrete graph hashing (DGH) [12], and scalable graph hashing (SGH) [13]. Examples of supervised methods include supervised hashing with kernels (KSH) [14], two-step hashing (TSH) [15], fast supervised hashing (FastH) [16], [17], supervised discrete hashing (SDH) [18] and its fast version, fast supervised discrete hashing (FSDH) [19], supervised discrete discriminant hashing (SDDH) [20], ranking-based supervised hashing (RSH) [21] and discrete semantic ranking hashing (DSeRH) [22]. Quantization-based hashing (QBH) [23] is a general framework applicable to both unsupervised and supervised hashing.
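As a rough illustration of the spectral relaxation described above, the toy Python sketch below builds the Gaussian affinity graph, forms the graph Laplacian, and sign-thresholds its leading nontrivial eigenvectors. This is illustrative only: the actual SH paper avoids the O(n^3) eigensolve with efficient analytical eigenfunctions and an out-of-sample extension.

    import numpy as np
    from scipy.spatial.distance import cdist

    def spectral_codes(X, n_bits, sigma=1.0):
        """Toy SH-style codes: sign of the Laplacian eigenvectors with the
        smallest nonzero eigenvalues (the zero eigenvalue has a constant
        eigenvector and carries no information, so it is skipped)."""
        W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))  # affinities
        L = np.diag(W.sum(axis=1)) - W                              # Laplacian
        vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
        Y = vecs[:, 1:n_bits + 1]            # skip the trivial eigenvector
        return (Y >= 0).astype(np.uint8)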

In recent years, deep learning has achieved outstanding results in various fields, especially speech recognition and computer vision. Semantic hashing [24] was the first work to use deep learning for hashing; afterwards, many scholars considered combining hashing methods with deep learning [25], [26], producing semi-supervised deep learning hashing (DLH) [27], network in network hashing (NINH) [28], convolutional neural network hashing (CNNH) [29], similarity-adaptive deep hashing (SADH) [30], deep semantic ranking-based hashing (DSRH) [31], deep hashing based on classification and quantization errors (DHCQ) [32], deep supervised discrete hashing (DSDH) [33] and deep pairwise-supervised hashing (DPSH) [34]. DPSH utilizes a deep neural network to perform simultaneous feature learning and hash code learning for applications with pairwise labels. To address complicated learning tasks, it is likely that learning models must go deep [35]. Currently, the most popular deep model is the deep neural network. Although deep neural networks are powerful, they still have shortcomings: they need large-scale data for training; the training process usually requires powerful computing devices; there are too many hyperparameters to learn; and performance depends heavily on parameter tuning. Considering the new deep model gcForest [35] and the shallow decision trees used in existing tree-based and forest-based hashing methods, we propose a deep hashing model that uses deep forests as hash functions. To the best of our knowledge, this model represents a new deep hashing approach that is distinct from deep hashing models based on deep neural networks. The following contributions should be highlighted:

  • The proposed method considers three types of similarity metrics to preserve the semantic similarity and manifold similarity among the data points in the Hamming space.

  • Sliding windows of different sizes are used to extract multi-grained features from the raw data (a minimal sketch of this step follows this list), and the feature extraction phase depends on the hash function learning stage, which helps in learning better hash functions.

  • Compared with deep neural network-based hashing methods, the proposed method has fewer hyperparameters, faster training speed and easier theoretical analysis.

  • The proposed method learns shorter binary code representations to achieve effective and efficient image retrieval.
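The following Python sketch illustrates the sliding-window extraction behind the second contribution; the window sizes and stride are illustrative choices, not the paper's settings. In gcForest, the patches extracted at each grain are fed to forests whose class-probability outputs are concatenated into the transformed representation.

    import numpy as np

    def multi_grained_windows(img, window_sizes=(7, 9, 11), stride=2):
        """Extract flattened patches at several granularities from a 2-D
        image, one patch set per window size (gcForest-style scanning)."""
        grains = {}
        h, w = img.shape
        for k in window_sizes:
            patches = [img[i:i + k, j:j + k].ravel()
                       for i in range(0, h - k + 1, stride)
                       for j in range(0, w - k + 1, stride)]
            grains[k] = np.stack(patches)  # shape: (#windows, k*k)
        return grains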

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 proposes our deep forest hashing method. Section 4 gives an analysis of deep forest hashing. Section 5 reports the experimental results. Section 6 concludes the paper.

Section snippets

Tree-based and forest-based hashing

FastH [16], [17], the first attempt to use decision trees as hash functions, adopts a two-step learning strategy, first inferring binary codes and then learning boosted trees as hash functions, to quickly learn hash codes from supervised labels. ForestHash [36] embeds tiny convolutional neural networks (CNNs) into shallow random forests, in which random trees act as hash functions by assigning the value "1" to the visited tree leaf and "0" otherwise. Scalable forest hashing [37] utilizes multiple tree…
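For intuition about trees acting as hash functions, the hedged Python sketch below uses scikit-learn's RandomTreesEmbedding as a stand-in: each random tree assigns "1" to the leaf a sample reaches and "0" to all other leaves, so every tree contributes a one-hot block of bits. This is not the ForestHash implementation, which additionally embeds tiny CNNs into the trees.

    import numpy as np
    from sklearn.ensemble import RandomTreesEmbedding

    # Each of the 4 shallow random trees maps a sample to a one-hot vector
    # over its leaves; concatenating the blocks gives the binary code.
    X = np.random.rand(100, 64)
    embedder = RandomTreesEmbedding(n_estimators=4, max_depth=3, random_state=0)
    codes = embedder.fit_transform(X)          # sparse one-hot leaf indicators
    binary = codes.toarray().astype(np.uint8)  # one block of bits per tree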

Deep forest hashing

We believe that to address a complicated image retrieval task using hash codes, learned deep models are likely important and inevitable. However, deep neural networks for hash codes require a considerable amount of data to train, demand expensive and powerful computational facilities during the training process, and have too many hyperparameters. In addition, their internal structure is similar to a black box and is not interpretable. Deep…
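For reference, one level of a gcForest-style cascade can be sketched as follows. This is a simplified illustration under our own naming: the cross-validated probability estimation and automatic level-growth control of the actual gcForest are omitted.

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

    def cascade_level(X_train, y_train, X_level):
        """One cascade level: each forest emits class probabilities, which
        are concatenated to the incoming features for the next level."""
        forests = [RandomForestClassifier(n_estimators=50, random_state=0),
                   ExtraTreesClassifier(n_estimators=50, random_state=1)]
        probas = [f.fit(X_train, y_train).predict_proba(X_level) for f in forests]
        return np.hstack([X_level] + probas)   # augmented features for level l+1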

Analysis

In this section, we analyze the time complexity of the DFH algorithm and of the components of the DFH model: multi-grained feature extraction, initial binary code inference and deep forest hash function learning.

Dataset and configuration

We conduct a series of experiments to evaluate DFH on image retrieval tasks with three benchmarks: MNIST, CIFAR-10 and NUS-WIDE. The MNIST dataset consists of 28 × 28 grayscale handwritten digit images of 0 to 9, with 7000 examples per class and a total of 70,000 images. The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 categories, with 6000 images per class…

Conclusion

In this work, we propose deep forest hashing (DFH) to learn shorter binary code representations for effective and efficient image retrieval. It is a two-stage hashing method consisting of initial binary code inference and deep forest hash function learning. We consider three types of similarity metrics in our hash learning formulation to preserve the semantic similarity and manifold similarity of the data points in the Hamming space. We utilize a supervised manifold method to compute the manifold…

Acknowledgments

The authors would like to thank the anonymous reviewers for their help. This work was supported by the National Natural Science Foundation of China (Grant no. 61672120) and Chongqing Postgraduate Research and Innovation Project (Grant no. CYS17224).


References (42)

  • J. Ji et al., Super-bit locality-sensitive hashing, NIPS, 2012.
  • Y. Mu et al., Non-metric locality-sensitive hashing, AAAI, 2010.
  • B. Kulis et al., Kernelized locality-sensitive hashing for scalable image search, ICCV, 2009.
  • A. Shrivastava et al., Densifying one permutation hashing via rotation for fast near neighbor search, ICML, 2014.
  • Y. Weiss et al., Spectral hashing, NIPS, 2009.
  • W. Liu et al., Discrete graph hashing, NIPS, 2014.
  • Q.Y. Jiang et al., Scalable graph hashing with feature transformation, IJCAI, 2015.
  • W. Liu et al., Supervised hashing with kernels, CVPR, 2012.
  • G. Lin et al., A general two-step approach to learning-based hashing, CVPR, 2013.
  • G. Lin et al., Fast supervised hashing with decision trees for high-dimensional data, CVPR, 2014.
  • G. Lin et al., Supervised hashing using graph cuts and boosted decision trees, IEEE Trans. Pattern Anal. Mach. Intell., 2015.

    Meng Zhou, is currently a master degree candidate in the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing. His current research interests mainly include hashing learning, image processing and ensemble learning.

    Xianhua Zeng, is currently a professor with the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China. He received his Ph.D. degree in Computer software and theory from Beijing Jiaotong University in 2009. And he was a Visiting Scholar in the University of Technology, Sydney, from Aug. 2013 to Aug. 2014. His main research interests include medical image processing, machine learning and data mining.

    Aozhu Chen, is currently a master degree candidate in the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing. Her current research interests mainly include manifold learning and image color perception.
