Detecting global hyperparaboloid correlated clusters: a Hough-transform based multicore algorithm

Distributed and Parallel Databases

Abstract

Correlation clustering detects complex and intricate relationships in high-dimensional data by identifying groups of data points, each characterized by a different correlation among a (sub)set of features. Current correlation clustering methods are generally limited to linear correlations. In this paper, we introduce a method for detecting global non-linear correlated clusters, focusing on quadratic relations. We introduce a novel Hough transform for the detection of hyperparaboloids and apply it to the detection of hyperparaboloid correlated clusters in arbitrary high-dimensional data spaces. We further provide a solution for utilizing all available CPU cores on a system: we split the Hough space along a pre-defined axis into a number of equi-sized partitions. We show that even this simplest form of parallelization already improves the runtime significantly. Non-linear correlation clustering methods like ours can reveal valuable insights that are not covered by current linear approaches. Our empirical results on synthetic and real-world data show that the proposed method is robust against noise, jitter and irregular densities.
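The partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' hyperparaboloid transform: as a simplifying assumption it performs a plain 2D Hough voting over the vertex form \((x-x_p)^2 = 4p(y-y_p)\) (Eq. 26 in the appendix on parabolic representations), discretizes \(x_p\), \(p\) and \(y_p\) on hypothetical grids, and splits the \(x_p\) axis into equi-sized chunks. All function and parameter names are hypothetical. Because the chunks cover disjoint accumulator cells, each could be handed to its own worker process (for example via Python's `concurrent.futures.ProcessPoolExecutor`) and the partial accumulators merged afterwards; the sketch processes them one after another to stay self-contained.

```python
import os
from collections import Counter


def partition_indices(n_cells, n_parts):
    """Split cell indices 0..n_cells-1 into n_parts contiguous chunks
    whose sizes differ by at most one (equi-sized partitions)."""
    base, extra = divmod(n_cells, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def vote_chunk(points, xp_cells, xp_lo, xp_step, p_values, yp_step):
    """Hough voting restricted to one slice of the x_p axis.

    For a point (x, y) and a candidate (x_p, p), the vertex form
    (x - x_p)^2 = 4p (y - y_p) determines y_p, so the accumulator
    cell (x_p index, p index, y_p bin) receives one vote.
    """
    acc = Counter()
    for k in xp_cells:
        x_p = xp_lo + k * xp_step
        for j, p in enumerate(p_values):
            for x, y in points:
                y_p = y - (x - x_p) ** 2 / (4 * p)
                acc[(k, j, round(y_p / yp_step))] += 1
    return acc


def hough_parabolas(points, xp_lo, xp_step, n_xp, p_values, yp_step, n_parts=None):
    """Accumulate votes for all n_xp grid values of x_p, split into n_parts
    chunks (by default one per CPU core). Chunks touch disjoint cells, so
    they are independent; here they run sequentially and are merged."""
    n_parts = n_parts or os.cpu_count() or 1
    acc = Counter()
    for chunk in partition_indices(n_xp, n_parts):
        acc.update(vote_chunk(points, chunk, xp_lo, xp_step, p_values, yp_step))
    return acc
```

For seven points sampled from \(y = \frac{(x-1)^2}{2} + 2\), with \(x_p\) on the grid 0, 0.5, ..., 2 and \(p \in \{0.25, 0.5, 1\}\), the most-voted cell corresponds to \(x_p = 1\), \(p = 0.5\) and \(y_p = 2\), and the result does not depend on how many partitions are used.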

Notes

  1. Obtained from http://stats.nba.com/.

Author information

Corresponding author

Correspondence to Daniyal Kazempour.

On parabolic representations

Since we aim to detect hyperparaboloid correlated clusters, we begin with the fundamental building block of hyperparaboloids: the parabola itself. The equation of a basic parabola with its vertex at the origin of a 2D space with axes x and y is

$$\begin{aligned} y = x^2. \end{aligned}$$
(20)

Taking into account the opening a, the slope b at the point where the parabola crosses the y-axis, and the shift c in y-direction, we extend Eq. 20 as follows:

$$\begin{aligned} y = ax^2 + bx +c. \end{aligned}$$
(21)

Besides representing a parabola by Eq. 21, we provide another, more intuitive representation.

Fig. 21: Characteristics of a parabola

As can be seen in Fig. 21, a parabola is characterized by its vertex \((x_p,\,y_p),\) its focus \(\left( x_p,\,y_p +\frac{1}{4a}\right) ,\) its directrix, i.e. the line \(y = y_p - \frac{1}{4a},\) and its opening \(p = \frac{1}{4a},\) the distance between the vertex and the focus. Focus and directrix are related as follows: every point \((x,\,y)\) on the parabola is exactly as far from the focus as it is from the directrix. For a parabola with its vertex at the origin, the focus lies at \((0,\,p)\) and the directrix is the line \(y = -p,\) so this property can be expressed by the following equation:

$$\begin{aligned} \vert y+p \vert = \sqrt{(y-p)^{2}+x^{2}}. \end{aligned}$$
(22)

Squaring the equation on both sides leads to:

$$\begin{aligned} (y+p)^{2} = (y-p)^{2}+x^{2}. \end{aligned}$$
(23)

Expanding the binomial expressions on both sides results in:

$$\begin{aligned} y^{2}+2py+p^{2} = y^{2}-2py+p^{2}+x^{2}. \end{aligned}$$
(24)

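Cancelling \(y^{2}\) and \(p^{2},\) which appear on both sides, and moving \(-2py\) to the left-hand side gives \(4py = x^{2},\) i.e.: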

$$\begin{aligned} x^{2} = 4py. \end{aligned}$$
(25)

This is the equation of a parabola with its vertex at \((0,\,0).\) Shifting the vertex to \((x_{p},\, y_{p})\) generalizes Eq. 25 to:

$$\begin{aligned} (x-x_{p})^{2} = 4p(y-y_{p}) . \end{aligned}$$
(26)

This representation of a parabola by its vertex \((x_p,\, y_p)\) and its opening p can easily be transformed into the parabola function \(y = ax^2 + bx + c\):

$$\begin{aligned} \begin{aligned}&(x-x_{p})^{2} = 4p(y-y_{p}), \\&y = \frac{(x-x_p)^2}{4p}+y_p, \\&y = \frac{x^2 - 2xx_p + {x_p}^2 }{4p} + y_p,\\&y = \frac{x^2}{4p} - \frac{2xx_p}{4p} + \frac{ {x_p}^2 }{4p} + y_p. \end{aligned} \end{aligned}$$
(27)

Comparing coefficients with Eq. 21 yields

$$\begin{aligned} \begin{aligned} a = \frac{1}{4p},\quad b = - \frac{x_p}{2p}, \quad c = \frac{ {x_p}^2 }{4p} + y_p. \end{aligned} \end{aligned}$$
(28)
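Eq. 28 and the focus/directrix characterization can be checked numerically. The following Python sketch (all function names are hypothetical) converts a vertex form into coefficients, inverts that mapping, and tests that sampled points are equidistant from focus and directrix. The inverse, \(p = \frac{1}{4a},\) \(x_p = -\frac{b}{2a},\) \(y_p = c - \frac{b^2}{4a},\) is not stated in the text above but follows directly from Eq. 28.

```python
import math


def vertex_to_coeffs(x_p, y_p, p):
    """Eq. 28: vertex (x_p, y_p) and opening p -> a, b, c of y = ax^2 + bx + c."""
    a = 1 / (4 * p)
    b = -x_p / (2 * p)
    c = x_p ** 2 / (4 * p) + y_p
    return a, b, c


def coeffs_to_vertex(a, b, c):
    """Inverse of Eq. 28 (requires a != 0)."""
    p = 1 / (4 * a)
    x_p = -b / (2 * a)
    y_p = c - b ** 2 / (4 * a)
    return x_p, y_p, p


def is_equidistant(x_p, y_p, p, xs, tol=1e-9):
    """Check Eq. 22 shifted to vertex (x_p, y_p): each point on the parabola
    is as far from the focus (x_p, y_p + p) as from the directrix y = y_p - p."""
    a, b, c = vertex_to_coeffs(x_p, y_p, p)
    for x in xs:
        y = a * x ** 2 + b * x + c
        to_focus = math.hypot(x - x_p, y - (y_p + p))
        to_directrix = abs(y - (y_p - p))
        if not math.isclose(to_focus, to_directrix, rel_tol=tol, abs_tol=tol):
            return False
    return True
```

For example, the vertex \((1,\,2)\) with \(p = 0.5\) maps to \(a = 0.5,\) \(b = -1\) and \(c = 2.5\); a negative p describes a downward-opening parabola and passes the same check.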

Equation 26 describes a parabola whose axis of symmetry is parallel to the y-axis. If we want to detect (hyper)paraboloid developments over, e.g., time, it can be more suitable to detect (hyper)paraboloids whose axis is parallel to the x-axis. Eq. 26 is easily reformulated for this case by swapping the roles of x and y, which yields:

$$\begin{aligned} (y-y_{p})^{2} = 4p(x-x_{p}). \end{aligned}$$
(29)
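Solving Eq. 29 for x gives \(x = \frac{(y-y_p)^2}{4p} + x_p.\) Repeating the expansion of Eq. 27 with the roles of x and y swapped yields \(x = ay^2 + by + c\) with \(a = \frac{1}{4p},\) \(b = -\frac{y_p}{2p}\) and \(c = \frac{{y_p}^2}{4p} + x_p.\) A minimal Python sketch of this x-axis-parallel case (function names are hypothetical):

```python
def x_on_parabola(y, x_p, y_p, p):
    """Eq. 29 solved for x: the parabola's axis is parallel to the x-axis."""
    return (y - y_p) ** 2 / (4 * p) + x_p


def swapped_coeffs(x_p, y_p, p):
    """Coefficients of x = a*y^2 + b*y + c, i.e. Eq. 28 with x and y swapped."""
    a = 1 / (4 * p)
    b = -y_p / (2 * p)
    c = y_p ** 2 / (4 * p) + x_p
    return a, b, c
```

Both functions describe the same curve: for the vertex \((1,\,2)\) and \(p = 0.5\) the coefficients are \(a = 0.5,\) \(b = -2\) and \(c = 3,\) and at \(y = 4\) both give \(x = 3.\)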

About this article

Cite this article

Kazempour, D., Mauder, M., Kröger, P. et al. Detecting global hyperparaboloid correlated clusters: a Hough-transform based multicore algorithm. Distrib Parallel Databases 37, 39–72 (2019). https://doi.org/10.1007/s10619-018-7246-0

  • DOI: https://doi.org/10.1007/s10619-018-7246-0
