In this paper, we propose a philosophy different from that of the well-known Locality-Sensitive Hashing (LSH): if two data points are close, we want the probability that they fall into the same hash bucket to be high; if two data points are far apart, we do not care about the probability that they fall into the same hash bucket. This philosophy relaxes the LSH requirement by ignoring the side effects of placing differently labeled data points into the same hash bucket. Based on this relaxation, we derive a new hashing method, Laplacian Hashing, which naturally incorporates any kernel function as well as “similar” / “dissimilar” weakly supervised information. A further contribution of this paper is that, for the first time, a fast hashing method is applied to the midway processing in a cascaded face detection structure. Experimental results show that our method is, on average, no worse than the state of the art in terms of accuracy, but is much faster and can thus handle much larger training datasets within reasonable computation time.

This research work is funded by the Natural Science Foundation of China (Grant Nos. 11176016, 60872117) and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20123108110014).
Here we outline a proof that, even when d = 2, the problem of balanced graph partitioning is a sub-problem of minimizing Eq. (2) subject to Eq. (3).
Equation (3) is a binarization step: it thresholds the n scalars $a^T x_i + b$ ($1 \le i \le n$) so that the largest half are mapped to 1 and the smallest half to −1. Consequently, the value of $\mathrm{sgn}(a^T x_i + b)$ is determined solely by the relative rank of $a^T x_i + b$ among $a^T x_1 + b, a^T x_2 + b, \ldots, a^T x_n + b$: if $a^T x_i + b$ ranks among the largest n/2 values, then $\mathrm{sgn}(a^T x_i + b) = 1$; otherwise $\mathrm{sgn}(a^T x_i + b) = -1$.
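The rank-based binarization described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `balanced_codes`, the random data, and the assumption that n is even are ours. It shows that the codes are balanced and depend only on the ordering of the projections, so changing the offset b leaves them unchanged.

```python
import numpy as np

def balanced_codes(X, a, b=0.0):
    """Map the n/2 largest projections a^T x_i + b to +1 and the rest to -1.

    Hypothetical helper; assumes n is even and the projections are distinct.
    """
    proj = X @ a + b
    n = len(proj)
    order = np.argsort(proj, kind="stable")  # indices in ascending order of projection
    codes = -np.ones(n, dtype=int)
    codes[order[n // 2:]] = 1                # upper half of the ranking gets +1
    return codes

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                  # 8 illustrative points with d = 2
a = np.array([1.0, 0.5])                     # an arbitrary projection direction

codes = balanced_codes(X, a)
shifted = balanced_codes(X, a, b=3.0)        # same ranking, different offset
```

Because only the ranking matters, `codes` and `shifted` coincide, and each contains exactly n/2 entries equal to +1.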
Obviously, each value assignment of the n binary hash codes $\mathrm{sgn}(a^T x_i + b)$ ($1 \le i \le n$) uniquely corresponds to an equal partition $(P_1, P_2)$. Here an equal partition $(P_1, P_2)$ is defined by $P_1 \cap P_2 = \emptyset$, $P_1 \cup P_2 = \{1, 2, \ldots, n\}$ and $|P_1| = |P_2|$. It is easy to find a relative rank $a^T x_{\sigma_1} + b < a^T x_{\sigma_2} + b < \cdots < a^T x_{\sigma_n} + b$ such that $P_1 = \{\sigma_1, \sigma_2, \ldots, \sigma_{n/2}\}$ and $P_2 = \{\sigma_{n/2+1}, \sigma_{n/2+2}, \ldots, \sigma_n\}$.
We now verify that a support vector $a$ always exists that generates any given relative rank $a^T x_{\sigma_1} + b < a^T x_{\sigma_2} + b < \cdots < a^T x_{\sigma_n} + b$. For any two data points $x_i$ and $x_j$, the support vector $a_t$ perpendicular to the line connecting $x_i$ and $x_j$ is the thresholding support vector: the projections of $x_i$ and $x_j$ onto $a_t$ are identical, so $a_t^T x_i + b = a_t^T x_j + b$. All support vectors $a$ within the half-plane in the clockwise direction from $a_t$ satisfy $a^T x_i + b > a^T x_j + b$, and all those in the counterclockwise half-plane satisfy the reverse inequality. Thus, by adjusting the angle of the support vector, we can choose either $a^T x_i + b > a^T x_j + b$ or $a^T x_i + b < a^T x_j + b$ for any pair of data points $x_i$ and $x_j$.
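The pairwise claim can be checked numerically for d = 2. The sketch below is illustrative only: the two points, the rotation step `eps`, and the choice of $a_t$ as the perpendicular obtained by rotating $x_i - x_j$ counterclockwise by 90 degrees are our assumptions (the opposite perpendicular swaps the roles of clockwise and counterclockwise). The offset b cancels in the comparison, so it is omitted.

```python
import numpy as np

def direction(theta):
    """Unit vector in the plane at angle theta (radians)."""
    return np.array([np.cos(theta), np.sin(theta)])

# Two illustrative data points (hypothetical values).
x_i = np.array([2.0, 1.0])
x_j = np.array([0.0, 0.0])
diff = x_i - x_j

# a_t is perpendicular to the line through x_i and x_j, so both points
# project to the same value: a_t^T x_i = a_t^T x_j.
theta_t = np.arctan2(diff[1], diff[0]) + np.pi / 2
a_t = direction(theta_t)
tie = a_t @ diff                      # zero up to rounding error

eps = 0.1                             # small rotation away from a_t
a_cw = direction(theta_t - eps)       # rotated clockwise from a_t
a_ccw = direction(theta_t + eps)      # rotated counterclockwise from a_t

gap_cw = a_cw @ diff                  # a^T x_i - a^T x_j after clockwise rotation
gap_ccw = a_ccw @ diff                # same difference after counterclockwise rotation
```

Here `gap_cw` is positive and `gap_ccw` is negative, so rotating $a$ across $a_t$ flips the order of the two projections.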
Therefore, a support vector $a$ always exists that generates any given value assignment of the n binary hash codes or, equivalently, any given equal partition $(P_1, P_2)$. On the other hand, whatever the support vector $a$ is, only $\mathrm{sgn}(a^T x_i + b)$ ($1 \le i \le n$) ultimately affects the objective function $E(a)$ in Eq. (2), so all support vectors leading to a specific value assignment of $\mathrm{sgn}(a^T x_i + b)$ ($1 \le i \le n$) can be regarded as being “represented” by that value assignment. From these two observations, we conclude that treating the vector $a$ in Eq. (2) as the variable is equivalent to treating the n binary hash codes $\mathrm{sgn}(a^T x_i + b)$ ($1 \le i \le n$), i.e. n binary bits, as the variable.
Consider an undirected graph whose vertices are the n data points and in which the weight of the edge between points i and j is $w_{ij}$. The equal partition $(P_1, P_2)$ splits the graph into two equal parts, $P_1$ and $P_2$. Now $E(a)$ becomes the total weight of the edges cut by the partition: $E(a) = \mathrm{cut}(P_1, P_2)$. Hence minimizing Eq. (2) subject to Eq. (3) is equivalent to minimizing $\mathrm{cut}(P_1, P_2)$ subject to $|P_1| = |P_2|$, which is known to be NP-hard [24].
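The correspondence between the objective and the balanced cut can be illustrated on a toy graph. Eq. (2) is not reproduced in this appendix, so the sketch below assumes the form $E = \frac{1}{4}\sum_{i<j} w_{ij}(h_i - h_j)^2$ with $h_i \in \{-1, 1\}$; this form, the function names, and the weight matrix are our assumptions. The brute-force search enumerates every equal partition, which is feasible only for very small n.

```python
from itertools import combinations
import numpy as np

def objective(W, h):
    """Assumed form of E: (1/4) * sum over i < j of w_ij * (h_i - h_j)^2."""
    diff = h[:, None] - h[None, :]
    return 0.25 * np.sum(np.triu(W * diff**2, k=1))

def cut_weight(W, P1, P2):
    """Total weight of edges with one endpoint in P1 and the other in P2."""
    return sum(W[i, j] for i in P1 for j in P2)

def min_balanced_cut(W):
    """Exhaustively search all equal partitions (exponential in n)."""
    n = W.shape[0]
    best = None
    for P1 in combinations(range(n), n // 2):
        if 0 not in P1:
            continue  # keep vertex 0 in P1 so each partition is visited once
        P2 = tuple(v for v in range(n) if v not in P1)
        c = cut_weight(W, P1, P2)
        if best is None or c < best[0]:
            best = (c, P1, P2)
    return best

# Toy graph: two tightly connected triangles joined by one weak edge.
W = np.zeros((6, 6))
for group in [(0, 1, 2), (3, 4, 5)]:
    for i, j in combinations(group, 2):
        W[i, j] = W[j, i] = 5.0
W[2, 3] = W[3, 2] = 1.0

best_cut, P1, P2 = min_balanced_cut(W)
h = np.where(np.isin(np.arange(6), P1), 1, -1)   # hash codes induced by the partition
```

On this graph the search separates the two triangles, and `objective(W, h)` equals `best_cut`, since each cut edge contributes $(h_i - h_j)^2 / 4 = 1$ times its weight under the assumed form.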
Huang, Y., Guan, Y. Laplacian hashing for fast large-scale image retrieval and its applications for midway processing in a cascaded face detection structure. Multimed Tools Appl 75, 16315–16332 (2016). https://doi.org/10.1007/s11042-015-2932-7