Deep forest hashing for image retrieval
Introduction
The rapid development of information technology has substantially increased the data accumulated in various fields, and the age of big data has arrived. Determining how to quickly and accurately find the data that users need within massive datasets has become an urgent problem, and efficient retrieval on large-scale data has become a popular research topic in academia and industry. Hashing learning has attracted considerable attention in recent years due to its excellent performance in processing high-dimensional data. Our paper focuses on a deep hashing model that uses deep forests as hash functions, with considerably fewer hyperparameters, competitive performance and a convincing theoretical analysis.
Hashing learning aims to transform every data item into a low-dimensional representation, i.e., a short code consisting of a sequence of bits, referred to as a hash code [1], [2]. Hashing methods can generally be divided into two main categories: data-independent hashing and data-dependent hashing [3], [4], [5]. The main difference is that the hash functions used by data-independent methods are either manually designed or randomly generated, whereas those in data-dependent methods are automatically learned from the data. The most representative data-independent hashing methods for image retrieval tasks are locality-sensitive hashing (LSH) [6] and its variants: super-bit LSH [7], non-metric LSH [8], kernelized LSH [9] and LSH with faster computation of the hash functions [10]. LSH uses a set of randomly generated hyperplanes sampled from a Gaussian distribution, projects the original high-dimensional data onto these hyperplanes, and thresholds the projection results to obtain the outputs of the hash functions. The emergence of LSH greatly improved the efficiency of image retrieval and provided a new perspective on large-scale image retrieval. However, because the hash functions in data-independent methods represented by LSH are randomly generated or manually specified, irrespective of the distribution of the original data, the accuracy of these algorithms increases only slowly as the number of bits grows, making it difficult to obtain stable retrieval results in practical applications. In contrast, the hash function for each code in data-dependent methods is learned from the data and thus has practical significance.
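The random-hyperplane LSH scheme described above can be pictured with a minimal sketch; the function name `lsh_hash` and its parameters are our own illustration, not part of the cited methods:

```python
import numpy as np

def lsh_hash(X, n_bits, rng=None):
    """Random-hyperplane LSH sketch: project the data onto randomly
    generated Gaussian hyperplanes and threshold the projections at zero."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # One hyperplane normal per bit, sampled from a standard Gaussian
    W = rng.standard_normal((d, n_bits))
    # The sign of each projection gives one bit of the hash code
    return (X @ W >= 0).astype(np.uint8)

X = np.random.default_rng(0).standard_normal((5, 16))
codes = lsh_hash(X, n_bits=8, rng=42)
print(codes.shape)  # (5, 8)
```

Because the hyperplanes are sampled independently of the data, nearby points tend to fall on the same side of most hyperplanes, which is exactly the locality-sensitivity property the text describes.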
Data-dependent methods appear to be the trend in current research and applications. Spectral hashing (SH) [11], a classic data-dependent hashing method, uses the training set to construct a complete graph whose edge weights are the (Gaussian) similarities between all data samples. Each hash function can be regarded as a cut of this graph: the corresponding cut minimizes the sum of the weights of the cut edges while dividing the complete graph into two parts as evenly as possible. This problem can be transformed into the classic normalized-cut problem in graph theory, and the hash codes are obtained by taking the signs of the eigenvectors corresponding to the smallest eigenvalues of the Laplacian matrix. Data-dependent methods can be further categorized as supervised or unsupervised hashing according to whether label information is used: unsupervised hashing methods attempt to preserve the similarity in the original feature space, while supervised hashing methods aim to preserve semantic similarity. Examples of unsupervised methods include iterative quantization (ITQ) [3], isotropic hashing (IsoHash) [4], discrete graph hashing (DGH) [12], and scalable graph hashing (SGH) [13]. Examples of supervised methods include supervised hashing with kernels (KSH) [14], two-step hashing (TSH) [15], fast supervised hashing (FastH) [16], [17], supervised discrete hashing (SDH) [18] and its fast version, fast supervised discrete hashing (FSDH) [19], supervised discrete discriminant hashing (SDDH) [20], ranking-based supervised hashing (RSH) [21] and discrete semantic ranking hashing (DSeRH) [22]. Quantization-based hashing (QBH) [23] is a general framework applicable to both unsupervised and supervised hashing.
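The graph-cut recipe behind SH (Gaussian similarity graph, graph Laplacian, signed bottom eigenvectors) can be illustrated with a toy implementation. This is only a sketch of the spectral relaxation described above, not the full SH algorithm, and the function name `spectral_codes` is our own:

```python
import numpy as np

def spectral_codes(X, n_bits, sigma=1.0):
    """Toy graph-cut codes: sign the bottom nontrivial eigenvectors
    of the graph Laplacian built from Gaussian similarities."""
    # Pairwise squared Euclidean distances -> Gaussian (RBF) edge weights
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                        # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~ 0), then
    # threshold the next n_bits eigenvectors at zero to get the bits
    return (vecs[:, 1:n_bits + 1] >= 0).astype(np.uint8)

X = np.random.default_rng(1).standard_normal((6, 3))
codes = spectral_codes(X, n_bits=2)
```

Each eigenvector plays the role of one relaxed graph cut: its sign pattern splits the samples into two groups while keeping the total weight of cut edges small.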
In recent years, deep learning has achieved outstanding results in various fields, especially speech recognition and computer vision. Semantic hashing [24] was the first work to use deep learning for hashing; afterwards, many scholars considered combining hashing methods with deep learning [25], [26], for example semi-supervised deep learning hashing (DLH) [27], network in network hashing (NINH) [28], convolutional neural network hashing (CNNH) [29], similarity-adaptive deep hashing (SADH) [30], deep semantic ranking-based hashing (DSRH) [31], deep hashing based on classification and quantization errors (DHCQ) [32], deep supervised discrete hashing (DSDH) [33] and deep pairwise-supervised hashing (DPSH) [34]. DPSH utilizes a deep neural network to perform simultaneous feature learning and hash code learning for applications with pairwise labels. A key lesson of deep learning is that to address complicated learning tasks, learning models likely must go deep [35]. Currently, the most popular deep model is the deep neural network. Although deep neural networks are powerful, they still have shortcomings: they need large-scale data for training; the training process usually requires powerful computing devices; they have too many hyperparameters to learn; and their performance depends heavily on parameter tuning. Motivated by the new deep model gcForest [35] and the shallow decision trees used in existing tree-based and forest-based hashing methods, we propose a deep hashing model that uses deep forests as hash functions. To the best of our knowledge, this model represents a new deep hashing approach that is distinct from deep hashing models based on deep neural networks. The following contributions should be highlighted:
- The proposed method considers three types of similarity metrics to preserve the semantic similarity and manifold similarity among the data points in the Hamming space.
- Differently sized sliding windows are used to extract multi-grained features from the raw data, and the feature extraction phase depends on the hash-function learning stage, which helps in learning better hash functions.
- Compared with deep neural network-based hashing methods, the proposed method has fewer hyperparameters, faster training speed and easier theoretical analysis.
- The proposed method learns shorter binary code representations to achieve effective and efficient image retrieval.
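The multi-grained feature extraction mentioned in the contributions can be pictured with a small sketch in the spirit of gcForest-style multi-grained scanning; the function name `multi_grained_scan` and its parameters are our own illustration, not the paper's implementation:

```python
import numpy as np

def multi_grained_scan(img, window_sizes):
    """Slide square windows of several sizes over an image with stride 1;
    each window is flattened into one feature vector, yielding one
    feature matrix per grain (window size)."""
    feats = []
    h, w = img.shape
    for k in window_sizes:
        windows = [img[i:i + k, j:j + k].ravel()
                   for i in range(h - k + 1)
                   for j in range(w - k + 1)]
        feats.append(np.stack(windows))
    return feats  # one (n_windows, k*k) array per window size

img = np.arange(16.0).reshape(4, 4)
grains = multi_grained_scan(img, window_sizes=[2, 3])
print([g.shape for g in grains])  # [(9, 4), (4, 9)]
```

Smaller windows capture fine local structure while larger windows capture coarser context, which is why combining several window sizes yields multi-grained features.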
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 proposes our deep forest hashing method. Section 4 analyzes deep forest hashing. Section 5 reports the experimental results. Section 6 concludes this paper.
Tree-based and forest-based hashing
FastH [16], [17], the first attempt to use decision trees as hash functions, adopts a two-step learning strategy, binary code inference followed by learning boosted trees as hash functions, to quickly learn hash codes from supervised labels. ForestHash [36] embeds tiny convolutional neural networks (CNNs) into shallow random forests in which random trees act as hash functions by assigning the value "1" to the visited tree leaf and "0" otherwise. Scalable forest hashing [37] utilizes multiple tree
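The idea of a tree acting as a hash function, with the visited leaf determining the output bit, can be sketched in miniature with depth-1 trees (decision stumps). This toy sketch is our own and is not FastH or ForestHash themselves; the name `stump_forest_hash` is hypothetical:

```python
import numpy as np

def stump_forest_hash(X, n_bits, rng=None):
    """Toy tree-based hash: each bit comes from a random decision stump,
    i.e. a depth-1 tree; the bit is 1 if the right leaf is visited."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    codes = np.empty((n, n_bits), dtype=np.uint8)
    for b in range(n_bits):
        f = rng.integers(d)            # random split feature
        thr = np.median(X[:, f])       # median threshold -> balanced leaves
        codes[:, b] = X[:, f] > thr    # "1" for the right leaf, "0" otherwise
    return codes

X = np.random.default_rng(2).standard_normal((10, 5))
codes = stump_forest_hash(X, n_bits=4, rng=0)
```

In the real methods the splits are learned from supervision (boosting in FastH, random forests with embedded CNNs in ForestHash) rather than chosen at random, but the leaf-to-bit mapping works the same way.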
Deep forest hashing
We believe that to address a complicated image retrieval task using hash codes, learned deep models are likely important and inevitable. However, deep neural networks for hash codes require a considerable amount of data to train. The deep neural networks used for hashing learning require expensive and powerful computational facilities during the training process and have too many hyperparameters. In addition, the internal structure is similar to a black box, which is not interpretable. Deep
Analysis
In this section, we provide an analysis of the time complexity of the DFH algorithm and the components of the multi-grained feature extraction, initial binary code inference and deep forest hash function learning in the DFH model.
Dataset and configuration
We conduct a series of experiments to evaluate DFH in image retrieval tasks with three benchmarks: MNIST, CIFAR-10 and NUS-WIDE. The MNIST dataset consists of 28 × 28 grayscale handwritten digit images of 0 to 9 with 7000 examples per class and a total of 70,000 images. The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 categories and 6000 images per
Conclusion
In this work, we propose deep forest hashing (DFH) to learn shorter binary code representations for effective and efficient image retrieval. It is a two-stage hashing method comprising initial binary code inference and deep forest hash function learning. We consider three types of similarity metrics in our hash learning formulation to preserve the semantic similarity and manifold similarity of the data points in the Hamming space. We utilize a supervised manifold method to compute the manifold
Acknowledgments
The authors would like to thank the anonymous reviewers for their help. This work was supported by the National Natural Science Foundation of China (Grant no. 61672120) and Chongqing Postgraduate Research and Innovation Project (Grant no. CYS17224).
Meng Zhou is currently a master's degree candidate in the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing. His current research interests mainly include hashing learning, image processing and ensemble learning.
References (42)
- et al., Supervised discrete discriminant hashing for image retrieval, Pattern Recognit. (2018)
- et al., Quantization-based hashing: a general framework for scalable image and video retrieval, Pattern Recognit. (2018)
- et al., Semantic hashing, Int. J. Approx. Reason. (2009)
- et al., Supervised deep hashing for scalable face image retrieval, Pattern Recognit. (2018)
- et al., A survey on learning to hash, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al., Hashing algorithms for large-scale learning, NIPS (2011)
- et al., Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- et al., Isotropic hashing, NIPS (2012)
- et al., Hashing with graphs, ICML (2011)
- et al., Similarity search in high dimensions via hashing, VLDB (1999)
- Super-bit locality-sensitive hashing, NIPS
- Non-metric locality-sensitive hashing, AAAI
- Kernelized locality-sensitive hashing for scalable image search, ICCV
- Densifying one permutation hashing via rotation for fast near neighbor search, ICML
- Spectral hashing, NIPS
- Discrete graph hashing, NIPS
- Scalable graph hashing with feature transformation, IJCAI
- Supervised hashing with kernels, CVPR
- A general two-step approach to learning-based hashing, CVPR
- Fast supervised hashing with decision trees for high-dimensional data, CVPR
- Supervised hashing using graph cuts and boosted decision trees, IEEE Trans. Pattern Anal. Mach. Intell.
Xianhua Zeng is currently a professor with the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China. He received his Ph.D. degree in computer software and theory from Beijing Jiaotong University in 2009, and was a visiting scholar at the University of Technology Sydney from Aug. 2013 to Aug. 2014. His main research interests include medical image processing, machine learning and data mining.
Aozhu Chen is currently a master's degree candidate in the Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing. Her current research interests mainly include manifold learning and image color perception.