Double-phase locality-sensitive hashing of neighborhood development for multi-relational data


Abstract

Multi-relational (MR) data refer to objects that span multiple associated tables or a relational database, and they are widely used in diverse applications. Despite rich achievements in concrete classification and regression tasks, a neighborhood development algorithm customized to MR data is still missing, because MR data are high-dimensional and highly structured. To address these two difficulties, this paper presents a double-phase locality-sensitive hashing (DPLSH) algorithm to develop neighborhoods for MR data. DPLSH consists of offline and online hashing schemas that draw and summarize local closeness information from each involved table. Based on this closeness information, three criteria of neighborhood development are proposed. DPLSH is equipped with parameterization heuristics that make the algorithm data adaptive and less costly. Extensive experiments indicate that for MR data, the quality of the neighborhoods produced by DPLSH is better than that of its peers; moreover, in the common data environment, DPLSH also exhibits competitive behavior with the state of the art.


References

  • Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of 47th annual IEEE symposium on foundations of computer science, pp 459–468

  • Bradley PS, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of 15th international conference on machine learning, pp 91–99

  • Ceci M, Appice A, Loglisci C, Caruso C, Fumarola F, Malerba D (2009) Novelty detection from evolving complex data streams with time windows. In: ISMIS’09: Proceedings of the 18th international symposium on foundations of intelligent systems, lecture notes in artificial intelligence, vol 5722. Springer, Berlin, Heidelberg

  • Coble J, Cook DJ, Holder LB (2006) Structure discovery in sequentially-connected data streams. Int J Artif Intell Tools 15(6):917–944

  • Dolsak B, Bratko I, Jezernik A (1994) Finite element mesh design: an engineering domain for ILP application. In: Proceedings of the 4th international workshop on inductive logic programming, GMD-studien 237, pp 305–320

  • Domeniconi C, Peng J, Gunopulos D (2001) An adaptive metric machine for pattern classification. Adv Neural Inf Process Syst 13:458–464

  • Driessens K, Ramon J, Blockeel H (2001) Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In: Raedt LD, Flach P (eds) Proceedings of the 13th European conference on machine learning, lecture notes in artificial intelligence, vol 2167, pp 97–108

  • Džeroski S (2003) Multi-relational data mining: an introduction. ACM SIGKDD Explor Newsl 5(1):1–16

  • Džeroski S (2010) Relational data mining. Springer, Washington, DC

  • Flach P, Lachiche N (1999) 1BC: a first-order Bayesian classifier. In: ILP-99, LNAI 1634, pp 92–103

  • Friedman JH (1994) Flexible metric nearest neighbor classification. Technique report. Department of Statistics, Stanford University

  • Georgescu B, Shimshoni I, Meer P (2003) Mean shift based clustering in high dimensions: a texture classification example. In: Proceedings of international conference on computer vision, pp 456–463

  • Hailperin T (1984) Probability logic. Notre Dame J Form Logic 25(3):198–212

  • Hastie T, Tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6):607–615

  • Hou W, Yang B, Wu C, Zhou Z (2010) RedTrees: a relational decision tree algorithm in streams. Expert Syst Appl 37(9):6265–6269

  • Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of symposium on theory of computing, pp 604–613

  • Jajuga K (2006) \(L_{1}\) norm based fuzzy clustering. Fuzzy Sets Syst 39:43–50

  • Kietz JU, Wrobel S (1992) Controlling the complexity of learning in logic through syntactic and task-oriented models. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 335–359

  • Kirsten M, Wrobel S, Horvath T (2001) Distance based approaches to relational learning and clustering. In: Relational data mining. Springer, Berlin, Heidelberg

  • Kohonen T (2001) Self-organizing maps. Springer, New York

  • Kyburg HE (1970) Probability and inductive logic. Macmillan, New York

  • Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry, pp 253–262

  • Muggleton S, Bain M, Hayes-Michie J, Michie D (1989) An experimental comparison of human and machine learning formalisms. In: Proceedings of 6th international workshop on machine learning. Morgan Kaufmann, pp 113–118

  • Muggleton S, Buntine WL (1992) Machine invention of first order predicates by inverting resolution. In: Muggleton S (ed) Inductive logic programming. Academic Press, London, pp 261–280

  • Muggleton S (1992) Inductive logic programming. Academic Press, London

  • Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. MCS, LNCS 3541:176–185

  • Page D, Craven M (2003) Biological applications of multi-relational data mining. ACM SIGKDD Explor Newsl 5(1):69–79

  • Panigrahy R (2006) Entropy based nearest neighbor search in high dimensions. In: Proceedings of the 17th annual ACM-SIAM symposium on discrete algorithm, pp 1186–1195

  • Raedt LD (1992) Interactive theory revision: an inductive logic programming approach. Academic Press, London

  • Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471

  • Seidl T, Kriegel HP (1998) Optimal Multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD international conference on management of data, pp 154–165

  • Shapiro EY (1983) Algorithmic program debugging. MIT Press, Cambridge

  • Vapnik V (1998) Statistical learning theory. Wiley, New York

  • Woznica A, Kalousis A, Hilario M (2004) Kernel-based distances for relational learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining

  • UCI (2014) Machine Learning Repository. http://www.uncc.edu/knowledgediscovery. Accessed 23 Mar 2014

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61105129 and 61100167, and by the Jiangsu Provincial Natural Science Foundation under Grant No. BK20131130.

Author information

Corresponding author

Correspondence to Ping Ling.

Additional information

Communicated by Y. Jin.

Appendices

Appendix A

For a dataset \(\left\{ {x_1,\ldots ,x_N } \right\} ,\,x_i \in \mathfrak {R}^n\), OCSVM constructs a decision function that takes the value \(+1\) in a small region capturing most of the data points and \(-1\) elsewhere. Denote \(\phi \) as the map from the input space to the feature space and \(G\) as the penalty coefficient. The parameters \(w\) and \(b\) of the hyperplane separating the data from the origin are determined by solving the following quadratic programming problem (Schölkopf et al. 2001):

$$\begin{aligned}&\min \quad \frac{1}{2}\vert \vert w\vert \vert ^2+b+G\sum \limits _{i=1}^N \xi _i\nonumber \\&\text{ s.t. }\,\,\,\,w\cdot \phi (x_i )+b\ge -\xi _i ,\quad \xi _i \ge 0,\quad i=1,2,\ldots , N \end{aligned}$$
(11)

Denote \(\gamma _{i }\ge 0\) as the Lagrange multipliers. Rewriting (11) in Lagrangian form and applying the kernel trick \(k(x_{i}, x_{j})=\langle \phi (x_{i}), \phi (x_{j})\rangle \) yields the dual problem:

$$\begin{aligned}&\min \quad \sum \limits _{i=1}^N \sum \limits _{j=1}^N \gamma _i \gamma _j k(x_i ,x_j )-\sum \limits _{i=1}^N \gamma _i k(x_i ,x_i )\nonumber \\&\text{ s.t. }\,\,\,\,\sum \limits _{i=1}^N \gamma _i =1,\,\,\,0\le \gamma _i \le G \end{aligned}$$
(12)
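The dual (12) is a small quadratic program over the set \(\{\gamma : \sum _i \gamma _i =1,\,0\le \gamma _i \le G\}\). The paper does not prescribe a particular solver; as a minimal illustrative sketch, the problem can be handed to a generic constrained optimizer (here SciPy's SLSQP routine) with an RBF kernel, both of which are assumptions made only for this example:

```python
# Illustrative sketch only: the paper does not specify a solver or kernel.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, sigma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j) for an RBF kernel (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ocsvm_dual(K, G):
    """Solve (12): min g^T K g - sum_i g_i K_ii  s.t.  sum_i g_i = 1,  0 <= g_i <= G."""
    N = K.shape[0]
    diag_K = np.diag(K)
    objective = lambda g: g @ K @ g - g @ diag_K
    constraints = [{"type": "eq", "fun": lambda g: np.sum(g) - 1.0}]
    bounds = [(0.0, G)] * N
    g0 = np.full(N, 1.0 / N)                # feasible starting point
    res = minimize(objective, g0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x                            # Lagrange multipliers gamma_i

X = np.random.default_rng(0).normal(size=(60, 4))   # toy data, x_i in R^4
K = rbf_kernel(X)
gamma = ocsvm_dual(K, G=0.1)                         # G >= 1/N keeps (12) feasible
```

Note that the box bounds and the equality constraint \(\sum _i \gamma _i =1\) are only jointly feasible when \(G\ge 1/N\).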

Points with \(\gamma _{i }\ne \) 0 are support vectors (SVs). Denote \(x_{s}\) as one SV; then according to \(w \cdot \phi (x)+b = 0\), \(b\) is computed as:

$$\begin{aligned} b=-w\cdot \phi (x_s )&=-\sum \limits _{i=1}^N \gamma _i \langle \phi (x_i ),\phi (x_s )\rangle \nonumber \\ {}&=-\sum \limits _{i=1}^N \gamma _i k(x_i ,x_s ) \end{aligned}$$
(13)
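Continuing the toy sketch above, the offset \(b\) follows directly from (13) once the multipliers are available; the tolerance used to decide which \(\gamma _i\) count as nonzero is an assumption for the example, not a value from the paper:

```python
# Recover b from any support vector via (13); the 1e-8 tolerance is an assumption.
sv_idx = np.where(gamma > 1e-8)[0]      # support vectors: points with gamma_i != 0
x_s = sv_idx[0]                         # index of one SV, x_s
b = -np.sum(gamma * K[:, x_s])          # b = -sum_i gamma_i k(x_i, x_s), Eq. (13)
```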

Solving the quadratic program of OCSVM is costly. We therefore give another version of OCSVM, named SSOCSVM, which employs the square of the slack variables and thereby turns the quadratic optimization problem into a system of linear equations. SSOCSVM is formulated as:

$$\begin{aligned}&\min \quad \frac{1}{2}\vert \vert w\vert \vert ^2+b+\frac{G}{2}\sum \limits _{i=1}^N \xi _i^2\nonumber \\&\text{ s.t. }\,\,\,\,w\cdot \phi (x_i )+b=-\xi _i ,\,\,\,i=1, 2,\ldots ,N \end{aligned}$$
(14)

Here, the use of \(\xi _{i}^{2}\) allows the constraint \(\xi _{i} \ge 0\) to be dropped. The Lagrangian of (14) is:

$$\begin{aligned} L={1 \over 2}\vert \vert w\vert \vert ^2+b+{G \over 2}\sum \limits _{i=1}^N \xi _i^2 -\sum \limits _{i=1}^N \beta _i (w\cdot \phi (x_i )+b+\xi _i ) \end{aligned}$$
(15)

where \(\beta _{i}\) are the Lagrange multipliers. The conditions for optimality are:

$$\begin{aligned} {{\partial L} \over {\partial w}}=w-\sum \limits _{i=1}^N \beta _i \phi (x_i )=0,\,\,\,\text{ then }\,\,w=\sum \limits _{i=1}^N \beta _i \phi (x_i ) \end{aligned}$$
(16)
$$\begin{aligned} {{\partial L} \over {\partial \xi _i }}=G\xi _i -\beta _i =0,\,\,\,\text{ then }\,\,\xi _i ={{\beta _i } \over G} \end{aligned}$$
(17)
$$\begin{aligned} {{\partial L} \over {\partial b}}=-1+\sum \limits _{i=1}^N \beta _i =0,\,\,\text{ then } \sum \limits _{i=1}^N \beta _i =1 \end{aligned}$$
(18)
$$\begin{aligned} {{\partial L} \over {\partial \beta _i }}=w\phi (x_i )-b+\xi _i =0,\,\,\text{ then }\,\,b=w\phi (x_i )+\xi _i \end{aligned}$$
(19)

These conditions can be reformulated as the following set of linear equations:

$$\begin{aligned} \left( \begin{array}{cccc} I &{} \overrightarrow{0} &{} 0 &{} -\phi ^T \\ \overrightarrow{0} &{} GI &{} 0 &{} -I \\ \overrightarrow{0} &{} \overrightarrow{0} &{} 0 &{} I \\ \phi &{} I &{} -1 &{} \overrightarrow{0} \end{array} \right) \left( \begin{array}{c} w \\ \xi \\ b \\ \beta \end{array} \right) =\left( \begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \right) \end{aligned}$$
(20)

with \(\phi = ( {\phi ( {x_1 }),\ldots ,\phi ( {x_N })})^T\), \(\xi =( {\xi _1 ,\ldots ,\xi _N })^T\), and \(\beta =( {\beta _1 ,\ldots ,\beta _N })^T\). Here \(I\) is the identity matrix of size \(N\times N\), \(\overrightarrow{0} \) is a zero matrix of size \(N\times N\), and \(E\) is the identity matrix. The solution of SSOCSVM is obtained by solving the following linear system:

$$\begin{aligned} \left( \begin{array}{cc} 0 &{} I \\ -I^T &{} \phi \phi ^T+G^{-1}E \end{array} \right) \left( \begin{array}{c} b \\ \beta \end{array} \right) =\left( \begin{array}{c} 1 \\ \overrightarrow{0} \end{array} \right) \end{aligned}$$
(21)

Denote \((K)_{ij }= (\phi \phi ^{T})_{ij}=k(x_{i}, x_{j})\); thus (21) is identical to (1) in Sect. 4.3. Note that SSOCSVM runs on the instances of the main table, so \({\phi }= ({\phi }(x_{(0)1}),\ldots , {\phi }(x_{(0)N}))^{T}\) and \(K_{ij}=k(x_{(0)i}, x_{(0)j})\).

We select the data points with large support values as SVs. In a single-table data environment, \(N_\mathrm{SV}\) (the number of SVs) can be predefined in a problem-dependent way. For MR data, we set \(N_\mathrm{SV}\) to \(C\), the number of selected dimensions, so as to facilitate the subsequent online hashing process.
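As a minimal sketch of the above, assuming the Gram matrix \(K\) of the main-table instances is available (e.g., built as in the earlier toy example), the system (21) can be assembled and solved directly; the first block row is read as \(\sum _i \beta _i =1\), consistent with (18), and the function names used below are illustrative rather than taken from the paper:

```python
# Sketch of SSOCSVM: solve the (N+1) x (N+1) linear system (21) and pick the
# C points with the largest support values as SVs (N_SV = C, as described above).
import numpy as np

def ssocsvm_fit(K, G):
    """Return (b, beta) solving (21) for an N x N Gram matrix K and penalty G."""
    N = K.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                       # first row encodes  sum_i beta_i = 1, cf. (18)
    A[1:, 0] = -1.0                      # first column block of (21)
    A[1:, 1:] = K + np.eye(N) / G        # phi phi^T + G^{-1} E
    rhs = np.zeros(N + 1)
    rhs[0] = 1.0
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # offset b, support values beta_i

def select_svs(beta, C):
    """Indices of the C points with the largest support values."""
    return np.argsort(beta)[::-1][:C]

# b, beta = ssocsvm_fit(K, G=10.0)       # K as built in the earlier sketch
# sv_idx = select_svs(beta, C=5)         # top-C support values serve as SVs
```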

Appendix B

Proof of Statement 1. We first consider OCSVM and then turn to SSOCSVM.

For OCSVM, the convex quadratic optimization leads to a sparse solution of (11): only a few points have nonzero support values, and exactly these points serve as SVs. Consequently, the SVs are the points that shape the hyperplane and reveal the discriminative information of the dataset.

For SSOCSVM, the sparseness of the solution is lost because the model is obtained by solving a system of linear equations, so most data points carry nonzero support values. Nevertheless, according to (16) and (19), the points with large support values determine the direction \(w\) and the offset \(b\) of the hyperplane, whereas points with small support values contribute little to the model. Hence Statement 1 holds.

Proof of Statement 2. According to Statement 1, the points in the top section of the sorted support value list serve as SVs, and they are located around the boundaries, conveying the discriminative information of the dataset. Figure 7 shows an example with 2-dimensional points. Point A is an SV; its ordinate is clearly the upper bound of all the data ordinates. For another SV, point B, its abscissa is quite close to the upper bound of all the data abscissas. This proves Statement 2.

Fig. 7 The illustration of Statement 2

Proof of Statement 3. Denote the top \(C\) SV points as \({SV}_{(1)}, \ldots, {SV}_{(C)}\). Without loss of generality, assume that \({SV}_{(j),d}\) (the coordinate of \({SV}_{(j)}\) in the \(d\)th dimension) serves as the upper bound (or lower bound) of the coordinates of all other data in that dimension. When the resulting \(v_{d}\) is used to partition the \(d\)th dimension, we effectively draw a circle centered at \(q\) with radius \(v_{d}\), which leads to the scenario shown in Fig. 8. According to specification (2), \(v_{d} = (q_{d}+{SV}_{(j),d})/2\), which indicates that such a circle lies within the boundaries. Thus, the cells produced by the inequality \(x_{i,j}<v_{j}\) include the same-cluster members of \(q\), and the resulting neighborhood can be expected to cover the same-cluster members of the query. This proves Statement 3.

Fig. 8 The illustration of Statement 3
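To make the argument concrete, the partition rule of specification (2) can be sketched as follows; the array names, the query \(q\), and the per-dimension cell test are illustrative assumptions, since the full hashing scheme is defined in the main text rather than in this appendix:

```python
# Toy sketch of the Statement 3 geometry: v_d = (q_d + SV_(j),d) / 2 and the
# cell test x_{i,d} < v_d; names and shapes are illustrative assumptions.
import numpy as np

def partition_thresholds(q, bounding_coords):
    """v[d] = (q[d] + SV_(j),d) / 2, where bounding_coords[d] is the d-th
    coordinate of the SV that bounds dimension d (cf. specification (2))."""
    return 0.5 * (q + bounding_coords)

def same_cell_mask(X, v):
    """Points satisfying x_{i,d} < v_d in every selected dimension, i.e. the
    cells that, by Statement 3, cover the same-cluster members of the query."""
    return np.all(X < v[None, :], axis=1)

# q = np.array([0.2, 0.1]); bounding = np.array([0.9, 0.8])
# v = partition_thresholds(q, bounding)          # array([0.55, 0.45])
# neighbors = X[same_cell_mask(X, v)]            # candidate neighborhood of q
```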

Cite this article

Ling, P., Rong, X., Dong, Y. et al. Double-phase locality-sensitive hashing of neighborhood development for multi-relational data. Soft Comput 19, 1553–1565 (2015). https://doi.org/10.1007/s00500-014-1343-4
