Abstract
Correlation clustering detects complex and intricate relationships in high-dimensional data by identifying groups of data points, each characterized by a different correlation among a (sub)set of features. Current correlation clustering methods are generally limited to linear correlations. In this paper, we introduce a method for detecting global non-linear correlated clusters, focusing on quadratic relations. We introduce a novel Hough transform for the detection of hyperparaboloids and apply it to the detection of hyperparaboloid correlated clusters in arbitrary high-dimensional data spaces. We further provide a solution for utilizing all available CPU cores on a system: we simply split the Hough space along a predefined axis into a number of equi-sized partitions. We show that even this simplest form of parallelization already improves the runtime significantly. Non-linear correlation clustering methods such as ours can reveal valuable insights that are not captured by current linear methods. Our empirical results on synthetic and real-world data show that the proposed method is robust against noise, jitter and irregular densities.
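To illustrate the partitioning idea, here is a minimal, hypothetical sketch. It is not the paper's hyperparaboloid Hough transform. It is a toy 2D Hough-style voting for parabolas \(y = ax^2 + bx + c\) over a discretized \((a, b, c)\) accumulator. The a-axis is assumed to be the predefined axis, and it is split into equi-sized partitions that workers fill independently. The helper names (`split_axis`, `vote_slice`, `parallel_hough`) are our own. A thread pool keeps the sketch portable. In CPython, pure-Python voting only gains real multicore speedups with `ProcessPoolExecutor`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi (inclusive), n >= 2."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def split_axis(values, n_parts):
    """Split one parameter axis into n_parts contiguous partitions whose
    sizes differ by at most one (empty partitions are dropped)."""
    q, r = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)
        parts.append(values[start:end])
        start = end
    return [part for part in parts if part]


def vote_slice(points, a_values, b_values, c_lo, c_hi, n_c):
    """Hough-style voting for y = a*x^2 + b*x + c, restricted to a_values.

    For each point and each (a, b) cell, the c that puts the point exactly
    on the parabola is computed, and its nearest c-bin receives one vote.
    """
    c_step = (c_hi - c_lo) / (n_c - 1)
    votes = Counter()
    for x, y in points:
        for a in a_values:
            for b in b_values:
                c = y - a * x * x - b * x
                k = round((c - c_lo) / c_step)
                if 0 <= k < n_c:
                    votes[(a, b, c_lo + k * c_step)] += 1
    return votes


def parallel_hough(points, a_values, b_values, c_range, n_c, n_workers=4):
    """Split the a-axis into equi-sized partitions; each worker fills only
    the accumulator cells of its own partition."""
    c_lo, c_hi = c_range
    parts = split_axis(a_values, n_workers)
    total = Counter()
    # Threads keep the sketch portable; swap in ProcessPoolExecutor to use
    # several cores for CPU-bound pure-Python work in CPython.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(vote_slice, points, part, b_values, c_lo, c_hi, n_c)
                   for part in parts]
        for future in futures:
            # Partitions are disjoint in a, so merging never double-counts a cell.
            total.update(future.result())
    return total


pts = [(x, 2 * x * x - x + 1) for x in range(-2, 3)]  # points on y = 2x^2 - x + 1
grid = linspace(-3.0, 3.0, 13)
acc = parallel_hough(pts, grid, grid, (-5.0, 5.0), 21)
print(acc.most_common(1))  # [((2.0, -1.0, 1.0), 5)]
```

Because the partitions never share a cell, the merged result is identical to a single-worker run. This is what makes such a simple split attractive.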
Notes
Obtained from http://stats.nba.com/.
On parabolic representations
As we aim to detect hyperparaboloid correlated clusters, we begin with the fundamental building block of hyperparaboloids: the parabola itself. The equation of a plain parabola with its vertex at the origin of a 2D space with axes x and y is
\[ y = x^2 \tag{20} \]
Taking into account the opening a, the slope b at the point where the parabola meets the y-axis, and the shift c in y-direction, we extend Eq. 20 to
\[ y = ax^2 + bx + c \tag{21} \]
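The roles of the three coefficients of \(y = ax^2 + bx + c\) can be checked numerically. In the sketch below, c is the value at x = 0, b is the slope there, and a is half the constant second derivative. The helper names are hypothetical and only serve as an illustration. Central differences are exact for quadratics up to floating-point rounding.

```python
def parabola(a, b, c):
    """Return the function y(x) = a*x**2 + b*x + c."""
    return lambda x: a * x * x + b * x + c


def coefficients_from_samples(f, h=1.0):
    """Recover (a, b, c) of a quadratic f from its values at -h, 0 and h."""
    y_minus, y_zero, y_plus = f(-h), f(0.0), f(h)
    a = (y_plus - 2.0 * y_zero + y_minus) / (2.0 * h * h)  # half the second derivative
    b = (y_plus - y_minus) / (2.0 * h)  # slope where the parabola meets the y-axis
    c = y_zero  # y-intercept, i.e. the shift in y-direction
    return a, b, c


print(coefficients_from_samples(parabola(2.0, -1.0, 1.0)))  # (2.0, -1.0, 1.0)
print(coefficients_from_samples(parabola(1.0, 0.0, 0.0)))   # plain parabola: (1.0, 0.0, 0.0)
```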
Besides the representation of a parabola by Eq. 21, we provide another, more intuitive representation here.
As can be seen in Fig. 21, a parabola is characterized by its vertex \((x_p,\,y_p),\) its focus \(\left( x_p,\,y_p +\frac{1}{4a}\right) ,\) its directrix, i.e., the line \(y = y_p - \frac{1}{4a},\) and its opening p. Focus and directrix are related as follows: for an arbitrary point \((x,\,y)\) on the parabola, the distance from the focus to that point equals the distance from that point to the directrix, which can be expressed by the following equation:
Squaring the equation on both sides leads to:
Expanding the binomial expressions on both sides results in:
This further simplifies to:
This is the equation of a parabola with its vertex at \((0,\,0).\) We now generalize Eq. 25 by introducing the vertex coordinates \((x_{p},\, y_{p}).\)
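One way to write out these steps is sketched below. It assumes that p denotes the distance from the vertex to the focus, so p = 1/(4a). Under this assumption, a parabola with its vertex at the origin has its focus at (0, p) and its directrix on the line y = -p. The paper's own Eqs. 22–26 may normalize p differently (for instance as \(x^2 = 2py\)), so this is a reconstruction under that stated assumption.

```latex
\begin{align*}
  \sqrt{x^2 + (y - p)^2} &= \lvert y + p \rvert
    && \text{distance to focus } (0, p) = \text{distance to directrix } y = -p\\
  x^2 + (y - p)^2 &= (y + p)^2
    && \text{squaring both sides}\\
  x^2 + y^2 - 2py + p^2 &= y^2 + 2py + p^2
    && \text{expanding the binomials}\\
  x^2 &= 4py
    && \text{simplifying (vertex at the origin)}\\
  (x - x_p)^2 &= 4p\,(y - y_p)
    && \text{shifting the vertex to } (x_p, y_p)
\end{align*}
```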
This representation of a parabola by its vertex \((x_p,\, y_p)\) and its opening p can easily be transformed into the parabola function \(y = ax^2 + bx + c\):
That is
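The conversion between the vertex form and the coefficient form can be sketched as follows. We assume the vertex form \((x - x_p)^2 = 4p\,(y - y_p)\), with p as the vertex-to-focus distance. Solving for y then gives \(a = \frac{1}{4p}\), \(b = -\frac{x_p}{2p}\) and \(c = \frac{x_p^2}{4p} + y_p\). This is consistent with the focus \(\left(x_p,\, y_p + \frac{1}{4a}\right)\) stated above. The function names are hypothetical. The last helper checks the focus/directrix property for a given point.

```python
def vertex_to_coefficients(x_p, y_p, p):
    """Map (x - x_p)**2 = 4*p*(y - y_p) to y = a*x**2 + b*x + c.

    Assumption: p is the vertex-to-focus distance (p != 0), so a = 1/(4p).
    """
    a = 1.0 / (4.0 * p)
    b = -x_p / (2.0 * p)
    c = x_p * x_p / (4.0 * p) + y_p
    return a, b, c


def coefficients_to_vertex(a, b, c):
    """Inverse mapping: vertex (x_p, y_p) and focal distance p of
    y = a*x**2 + b*x + c (a != 0)."""
    x_p = -b / (2.0 * a)
    y_p = c - b * b / (4.0 * a)
    p = 1.0 / (4.0 * a)
    return x_p, y_p, p


def focus_directrix_gap(x, y, x_p, y_p, p):
    """Squared distance to the focus (x_p, y_p + p) minus squared distance to
    the directrix y = y_p - p; it is zero exactly for points on the parabola."""
    return (x - x_p) ** 2 + (y - (y_p + p)) ** 2 - (y - (y_p - p)) ** 2


a, b, c = vertex_to_coefficients(1.0, 2.0, 0.25)
print(a, b, c)  # 1.0 -2.0 3.0
x = 3.0
print(focus_directrix_gap(x, a * x * x + b * x + c, 1.0, 2.0, 0.25))  # 0.0
```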
Equation 26 describes a parabola whose axis of symmetry is parallel to the y-axis. If we want to detect (hyper)paraboloid developments over, e.g., time, it can be a good approach to detect (hyper)paraboloids whose axis is parallel to the x-axis. Equation 26 is easily reformulated for this case by swapping the roles of x and y, which yields the following equation (Eq. 29):
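Swapping the roles of x and y can be sketched in the same way. We again assume the vertex form with p as the vertex-to-focus distance, which gives \((y - y_p)^2 = 4p\,(x - x_p)\). In this form, x becomes the dependent variable. The focus moves to \((x_p + p,\, y_p)\), and the directrix becomes the vertical line \(x = x_p - p\). The helper names are hypothetical.

```python
import math


def sideways_parabola(x_p, y_p, p):
    """Return x(y) for (y - y_p)**2 = 4*p*(x - x_p), i.e. x and y swapped.

    Assumption: p is the vertex-to-focus distance (p != 0).
    """
    return lambda y: (y - y_p) ** 2 / (4.0 * p) + x_p


def on_sideways_parabola(x, y, x_p, y_p, p):
    """Focus/directrix test with swapped roles:
    focus (x_p + p, y_p), directrix x = x_p - p."""
    dist_focus = math.hypot(x - (x_p + p), y - y_p)
    dist_directrix = abs(x - (x_p - p))
    return math.isclose(dist_focus, dist_directrix)


x_of_y = sideways_parabola(1.0, 2.0, 0.25)
for y in (0.0, 2.0, 4.0):
    x = x_of_y(y)
    print(y, x, on_sideways_parabola(x, y, 1.0, 2.0, 0.25))
```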
Cite this article
Kazempour, D., Mauder, M., Kröger, P. et al. Detecting global hyperparaboloid correlated clusters: a Hough-transform based multicore algorithm. Distrib Parallel Databases 37, 39–72 (2019). https://doi.org/10.1007/s10619-018-7246-0