Double-phase locality-sensitive hashing of neighborhood development for multi-relational data


Abstract

Multi-relational (MR) data refer to objects that span multiple associated tables or a relational database, and they are widely used in diverse applications. Despite rich achievements in concrete classification and regression tasks, a neighborhood development algorithm customized to MR data is still missing, because MR data are high-dimensional and highly structured. To address these two difficulties, this paper presents a double-phase locality-sensitive hashing (DPLSH) algorithm to develop neighborhoods for MR data. DPLSH consists of offline and online hashing schemas that draw and summarize local closeness information from each involved table. Based on this closeness information, three criteria of neighborhood development are proposed. DPLSH is equipped with parameterization heuristics that make the algorithm data adaptive and less costly. Extensive experiments indicate that for MR data, the quality of the neighborhoods produced by DPLSH is better than that of its peers; moreover, in the common data environment, DPLSH also exhibits competitive behavior with the state of the art.


References

  • Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of 47th annual IEEE symposium on foundations of computer science, pp 459–468

  • Bradley PS, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of 15th international conference on machine learning, pp 91–99

  • Ceci M, Appice A, Loglisci C, Caruso C, Fumarola F, Malerba D (2009) Novelty detection from evolving complex data streams with time windows. In: ISMIS’09: Proceedings of the 18th international symposium on foundations of intelligent systems, lecture notes in artificial intelligence, vol 5722. Springer, Berlin, Heidelberg

  • Coble J, Cook DJ, Holder LB (2006) Structure discovery in sequentially-connected data streams. Int J Artif Intell Tools 15(6):917–944

  • Dolsak B, Bratko I, Jezernik A (1994) Finite element mesh design: an engineering domain for ILP application. In: Proceedings of the 4th international workshop on inductive logic programming, GMD-studien 237, pp 305–320

  • Domeniconi C, Peng J, Gunopulos D (2001) An adaptive metric machine for pattern classification. Adv Neural Inf Process Syst 13:458–464

  • Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: Raedt LD, Flach P (eds) Proceedings of the 13th European conference on machine learning, lecture notes in artificial intelligence, vol 2167, pp 97–108

  • Džeroski S (2003) Multi-relational data mining: an introduction. ACM SIGKDD Explor Newsl 5(1):1–16

  • Džeroski S (2010) Relational data mining. Springer, Washington, DC

  • Flach P, Lachiche N (1999) 1BC: a first-order Bayesian classifier. In: ILP-99, LNAI 1634, pp 92–103

  • Friedman JH (1994) Flexible metric nearest neighbor classification. Technique report. Department of Statistics, Stanford University

  • Georgescu B, Shimshoni I, Meer P (2003) Mean shift based clustering in high dimensions: a texture classification example. In: Proceedings of international conference on computer vision, pp 456–463

  • Hailperin T (1984) Probability logic. Notre Dame J Form Logic 25(3):198–212

  • Hastie T, Tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6):607–615

  • Hou W, Yang B, Wu C, Zhou Z (2010) RedTrees: a relational decision tree algorithm in streams. Expert Syst Appl 37(9):6265–6269

  • Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of symposium on theory of computing, pp 604–613

  • Jajuga K (2006) \(L_{1}\) norm based fuzzy clustering. Fuzzy Sets Syst 39:43–50

  • Kietz JU, Wrobel S (1992) Controlling the complexity of learning in logic through syntactic and task-oriented models. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 335–359

  • Kirsten M, Wrobel S, Horvath T (2001) Distance based approaches to relational learning and clustering. In: Relational data mining. Springer, Berlin, Heidelberg

  • Kohonen T (2001) Self-organizing maps. Springer, New York

  • Kyburg HE (1970) Probability and inductive logic. Macmillan, New York

  • Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262

  • Muggleton S, Bain M, Hayes-Michie J, Michie D (1989) An experimental comparison of human and machine learning formalisms. In: Proceedings of 6th international workshop on machine learning. Morgan Kaufmann, pp 113–118

  • Muggleton S, Buntine WL (1992) Machine invention of first order predicates by inverting resolution. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 261–280

  • Muggleton S (1992) Inductive logic programming. Academic Press, London

  • Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. MCS, LNCS 3541:176–185

  • Page D, Craven M (2003) Biological applications of multi-relational data mining. ACM SIGKDD Explor Newsl 5(1):69–79

  • Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the 17th annual ACM-SIAM symposium on discrete algorithm, pp 1186–1195

  • Raedt LD (1992) Interactive theory revision: an inductive logic programming approach. Academic Press, London

  • Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471

  • Seidl T, Kriegel HP (1998) Optimal Multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD international conference on management of data, pp 154–165

  • Shapiro EY (1983) Algorithmic program debugging. MIT Press, Cambridge

  • Vapnik V (1998) Statistical learning theory. Wiley, New York

  • Woznica A, Kalousis A, Hilario M (2004) Kernel-based distances for relational learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining

  • UCI (2014) Machine Learning Repository. http://www.uncc.edu/knowledgediscovery. Accessed 23 Mar 2014

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61105129 and 61100167, and by the Jiangsu Provincial Natural Science Foundation under Grant No. BK20131130.

Author information

Corresponding author

Correspondence to Ping Ling.

Additional information

Communicated by Y. Jin.

Appendices

Appendix A

For a dataset \(\left\{ {x_1,\ldots ,x_N } \right\} ,\,x_i \in \mathfrak {R}^n\), OCSVM constructs a decision function that takes the value \(+1\) in a small region capturing most of the data points and \(-1\) elsewhere. Denote \(\phi \) as the map from the input space to the feature space and \(G\) as the penalty coefficient. The parameters \(w\) and \(b\) of the hyperplane separating the data from the origin are determined by solving the following quadratic programming problem (Schölkopf et al. 2001):

$$\begin{aligned}&\min \quad \frac{1}{2}\vert \vert w\vert \vert ^2+b+G\sum \limits _{i=1}^N \xi _i\nonumber \\&\text{ s.t. }\,\,\,\,w\cdot \phi (x_i )+b\ge -\xi _i ,\quad \xi _i \ge 0,\quad i=1,2,\ldots , N \end{aligned}$$
(11)

Denote \(\gamma _{i }\ge 0\) as the Lagrange multipliers. Rewriting (11) in Lagrangian form and applying the kernel trick \(k(x_{i}, x_{j})=\langle \phi (x_{i}), \phi (x_{j})\rangle \) yields the dual problem:

$$\begin{aligned}&\min \quad \sum \limits _{i=1}^N \sum \limits _{j=1}^N \gamma _i \gamma _j k(x_i ,x_j )-\sum \limits _{i=1}^N \gamma _i k(x_i ,x_i )\nonumber \\&\text{ s.t. }\,\,\,\,\sum \limits _{i=1}^N \gamma _i =1,\,\,\,0\le \gamma _i \le G \end{aligned}$$
(12)
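The dual (12) is a small quadratic program over the set \(\{\gamma : \sum _i \gamma _i =1,\,0\le \gamma _i \le G\}\). The paper does not prescribe a particular solver; as a minimal illustrative sketch, the problem can be handed to a generic constrained optimizer (here SciPy's SLSQP routine) with an RBF kernel, both of which are assumptions made only for this example:

```python
# Illustrative sketch only: the paper does not specify a solver or kernel.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, sigma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j) for an RBF kernel (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ocsvm_dual(K, G):
    """Solve (12): min g^T K g - sum_i g_i K_ii  s.t.  sum_i g_i = 1,  0 <= g_i <= G."""
    N = K.shape[0]
    diag_K = np.diag(K)
    objective = lambda g: g @ K @ g - g @ diag_K
    constraints = [{"type": "eq", "fun": lambda g: np.sum(g) - 1.0}]
    bounds = [(0.0, G)] * N
    g0 = np.full(N, 1.0 / N)                # feasible starting point
    res = minimize(objective, g0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x                            # Lagrange multipliers gamma_i

X = np.random.default_rng(0).normal(size=(60, 4))   # toy data, x_i in R^4
K = rbf_kernel(X)
gamma = ocsvm_dual(K, G=0.1)                         # G >= 1/N keeps (12) feasible
```

Note that the box bounds and the equality constraint \(\sum _i \gamma _i =1\) are only jointly feasible when \(G\ge 1/N\).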

Points with \(\gamma _{i }\ne \) 0 are support vectors (SVs). Denote \(x_{s}\) as one SV; then according to \(w \cdot \phi (x)+b = 0\), \(b\) is computed as:

$$\begin{aligned} b=-w\cdot \phi (x_s )&=-\sum \limits _{i=1}^N \gamma _i \langle \phi (x_i ),\phi (x_s )\rangle \nonumber \\ {}&=-\sum \limits _{i=1}^N \gamma _i k(x_i ,x_s ) \end{aligned}$$
(13)
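Continuing the toy sketch above, the offset \(b\) follows directly from (13) once the multipliers are available; the tolerance used to decide which \(\gamma _i\) count as nonzero is an assumption for the example, not a value from the paper:

```python
# Recover b from any support vector via (13); the 1e-8 tolerance is an assumption.
sv_idx = np.where(gamma > 1e-8)[0]      # support vectors: points with gamma_i != 0
x_s = sv_idx[0]                         # index of one SV, x_s
b = -np.sum(gamma * K[:, x_s])          # b = -sum_i gamma_i k(x_i, x_s), Eq. (13)
```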

Solving the quadratic program of OCSVM is costly. We therefore give another version of OCSVM, named SSOCSVM, which employs the square of the slack variables and thereby turns the quadratic optimization problem into a system of linear equations. SSOCSVM is formulated as:

$$\begin{aligned}&\min \quad \frac{1}{2}\vert \vert w\vert \vert ^2+b+\frac{G}{2}\sum \limits _{i=1}^N \xi _i^2\nonumber \\&\text{ s.t. }\,\,\,\,w\cdot \phi (x_i )+b=-\xi _i ,\,\,\,i=1, 2,\ldots ,N \end{aligned}$$
(14)

Here, the use of \(\xi _{i}^{2}\) allows the constraint \(\xi _{i} \ge 0\) to be dropped. The Lagrangian of (14) is:

$$\begin{aligned} L={1 \over 2}\vert \vert w\vert \vert ^2+b+{G \over 2}\sum \limits _{i=1}^N \xi _i^2 -\sum \limits _{i=1}^N \beta _i (w\cdot \phi (x_i )+b+\xi _i ) \end{aligned}$$
(15)

where \(\beta _{i}\) are the Lagrange multipliers. The conditions for optimality are:

$$\begin{aligned} {{\partial L} \over {\partial w}}=w-\sum \limits _{i=1}^N \beta _i \phi (x_i )=0,\,\,\,\text{ then }\,\,w=\sum \limits _{i=1}^N \beta _i \phi (x_i ) \end{aligned}$$
(16)
$$\begin{aligned} {{\partial L} \over {\partial \xi _i }}=G\xi _i -\beta _i =0,\,\,\,\text{ then }\,\,\xi _i ={{\beta _i } \over G} \end{aligned}$$
(17)
$$\begin{aligned} {{\partial L} \over {\partial b}}=-1+\sum \limits _{i=1}^N \beta _i =0,\,\,\text{ then } \sum \limits _{i=1}^N \beta _i =1 \end{aligned}$$
(18)
$$\begin{aligned} {{\partial L} \over {\partial \beta _i }}=w\phi (x_i )-b+\xi _i =0,\,\,\text{ then }\,\,b=w\phi (x_i )+\xi _i \end{aligned}$$
(19)

These conditions can be reformulated as the following set of linear equations:

$$\begin{aligned} \left( \begin{array}{cccc} I &{} \overrightarrow{0} &{} 0 &{} -\phi ^T \\ \overrightarrow{0} &{} GI &{} 0 &{} -I \\ \overrightarrow{0} &{} \overrightarrow{0} &{} 0 &{} I \\ \phi &{} I &{} -1 &{} \overrightarrow{0} \end{array} \right) \left( \begin{array}{c} w \\ \xi \\ b \\ \beta \end{array} \right) =\left( \begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \right) \end{aligned}$$
(20)

with \(\phi = ( {\phi ( {x_1 }),\ldots ,\phi ( {x_N })})^T\), \(\xi =( {\xi _1 ,\ldots ,\xi _N })^T\), and \(\beta =( {\beta _1 ,\ldots ,\beta _N })^T\). Here \(I\) is the identity matrix of size \(N\times N\), \(\overrightarrow{0} \) is a zero matrix of size \(N\times N\), and \(E\) is the identity matrix. The solution of SSOCSVM is obtained by solving the following linear system:

$$\begin{aligned} \left( \begin{array}{cc} 0 &{} I \\ -I^T &{} \phi \phi ^T+G^{-1}E \end{array} \right) \left( \begin{array}{c} b \\ \beta \end{array} \right) =\left( \begin{array}{c} 1 \\ \overrightarrow{0} \end{array} \right) \end{aligned}$$
(21)

Denote \((K)_{ij }= (\phi \phi ^{T})_{ij}=k(x_{i}, x_{j})\); thus (21) is identical to (1) in Sect. 4.3. Note that SSOCSVM runs on the instances of the main table, so \({\phi }= ({\phi }(x_{(0)1}),\ldots , {\phi }(x_{(0)N}))^{T}\) and \(K_{ij}=k(x_{(0)i}, x_{(0)j})\).

We select the data points with large support values as SVs. In a single-table data environment, \(N_\mathrm{SV}\) (the number of SVs) can be predefined in a problem-dependent way. For MR data, we set \(N_\mathrm{SV}\) to \(C\), the number of selected dimensions, so as to facilitate the subsequent online hashing process.
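As a minimal sketch of the above, assuming the Gram matrix \(K\) of the main-table instances is available (e.g., built as in the earlier toy example), the system (21) can be assembled and solved directly; the first block row is read as \(\sum _i \beta _i =1\), consistent with (18), and the function names used below are illustrative rather than taken from the paper:

```python
# Sketch of SSOCSVM: solve the (N+1) x (N+1) linear system (21) and pick the
# C points with the largest support values as SVs (N_SV = C, as described above).
import numpy as np

def ssocsvm_fit(K, G):
    """Return (b, beta) solving (21) for an N x N Gram matrix K and penalty G."""
    N = K.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                       # first row encodes  sum_i beta_i = 1, cf. (18)
    A[1:, 0] = -1.0                      # first column block of (21)
    A[1:, 1:] = K + np.eye(N) / G        # phi phi^T + G^{-1} E
    rhs = np.zeros(N + 1)
    rhs[0] = 1.0
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # offset b, support values beta_i

def select_svs(beta, C):
    """Indices of the C points with the largest support values."""
    return np.argsort(beta)[::-1][:C]

# b, beta = ssocsvm_fit(K, G=10.0)       # K as built in the earlier sketch
# sv_idx = select_svs(beta, C=5)         # top-C support values serve as SVs
```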

Appendix B

Proof of Statement 1. We first consider OCSVM and then turn to SSOCSVM.

For OCSVM, the convex quadratic optimization leads to a sparse solution of (11): only a few points have nonzero support values, and exactly these points serve as SVs. Consequently, the SVs are the points that shape the hyperplane and reveal the discriminative information of the dataset.

For SSOCSVM, the sparseness of the solution is lost because the model is obtained by solving a system of linear equations, so most data points carry nonzero support values. Nevertheless, according to (16) and (19), the points with large support values determine the direction \(w\) and the offset \(b\) of the hyperplane, whereas points with small support values contribute little to the model. Hence Statement 1 holds.

Proof of Statement 2. According to Statement 1, the points in the top section of the sorted support value list serve as SVs, and they are located around the boundaries, conveying the discriminative information of the dataset. Figure 7 shows an example with 2-dimensional points. Point A is an SV; its ordinate is clearly the upper bound of all the data ordinates. For another SV, point B, its abscissa is quite close to the upper bound of all the data abscissas. This proves Statement 2.

Fig. 7 The illustration of Statement 2

Proof of Statement 3. Denote the top \(C\) SV points as \({SV}_{(1)}, \ldots, {SV}_{(C)}\). Without loss of generality, assume that \({SV}_{(j),d}\) (the coordinate of \({SV}_{(j)}\) in the \(d\)th dimension) serves as the upper bound (or lower bound) of the coordinates of all other data in that dimension. When the resulting \(v_{d}\) is used to partition the \(d\)th dimension, we effectively draw a circle centered at \(q\) with radius \(v_{d}\), which leads to the scenario shown in Fig. 8. According to specification (2), \(v_{d} = (q_{d}+{SV}_{(j),d})/2\), which indicates that such a circle lies within the boundaries. Thus, the cells produced by the inequality \(x_{i,j}<v_{j}\) include the same-cluster members of \(q\), and the resulting neighborhood can be expected to cover the same-cluster members of the query. This proves Statement 3.

Fig. 8 The illustration of Statement 3
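To make the argument concrete, the partition rule of specification (2) can be sketched as follows; the array names, the query \(q\), and the per-dimension cell test are illustrative assumptions, since the full hashing scheme is defined in the main text rather than in this appendix:

```python
# Toy sketch of the Statement 3 geometry: v_d = (q_d + SV_(j),d) / 2 and the
# cell test x_{i,d} < v_d; names and shapes are illustrative assumptions.
import numpy as np

def partition_thresholds(q, bounding_coords):
    """v[d] = (q[d] + SV_(j),d) / 2, where bounding_coords[d] is the d-th
    coordinate of the SV that bounds dimension d (cf. specification (2))."""
    return 0.5 * (q + bounding_coords)

def same_cell_mask(X, v):
    """Points satisfying x_{i,d} < v_d in every selected dimension, i.e. the
    cells that, by Statement 3, cover the same-cluster members of the query."""
    return np.all(X < v[None, :], axis=1)

# q = np.array([0.2, 0.1]); bounding = np.array([0.9, 0.8])
# v = partition_thresholds(q, bounding)          # array([0.55, 0.45])
# neighbors = X[same_cell_mask(X, v)]            # candidate neighborhood of q
```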

Cite this article

Ling, P., Rong, X., Dong, Y. et al. Double-phase locality-sensitive hashing of neighborhood development for multi-relational data. Soft Comput 19, 1553–1565 (2015). https://doi.org/10.1007/s00500-014-1343-4
