Abstract
Density estimation is needed in several clustering algorithms and other data analysis methods. A straightforward calculation takes O(N²) time because all pairwise distances must be computed, and this is the main bottleneck in making these algorithms scalable. We propose a faster O(N log N) algorithm that calculates the density estimates in each dimension separately and then simply accumulates the individual estimates into the final density values.
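To make the contrast concrete, the sketch below sets a brute-force baseline next to a per-dimension variant. The abstract does not say which one-dimensional estimator is used or exactly how the per-dimension estimates are combined. The choices here are assumptions for illustration only: each dimension uses the distance to the k-th nearest neighbour along that axis, and the values are combined by summing them, so a smaller sum means a denser point. The function names are hypothetical. After an O(N log N) sort, the k nearest neighbours of a value lie next to it in sorted order, so each point needs only O(k) extra work. For a fixed k this gives O(D·N log N) in total for D dimensions, compared with O(N²) distance computations for the baseline.

```python
import heapq
import math
from typing import List, Sequence


def knn_distance_bruteforce(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Baseline: Euclidean distance to the k-th nearest neighbour, O(N^2) distances."""
    n = len(points)
    return [
        heapq.nsmallest(k, (math.dist(points[i], points[j]) for j in range(n) if j != i))[-1]
        for i in range(n)
    ]


def knn_distances_1d(values: Sequence[float], k: int) -> List[float]:
    """k-th nearest neighbour distance of every value along a single axis.

    Sorting costs O(N log N). In 1-D the k nearest neighbours of a value are
    contiguous in sorted order, so we expand left/right from its position, O(k).
    """
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    xs = [values[idx] for idx in order]
    out = [0.0] * n
    for pos in range(n):
        left, right = pos - 1, pos + 1
        kth = 0.0
        for _ in range(k):
            dl = xs[pos] - xs[left] if left >= 0 else math.inf
            dr = xs[right] - xs[pos] if right < n else math.inf
            if dl <= dr:  # take the closer neighbour on either side
                kth, left = dl, left - 1
            else:
                kth, right = dr, right + 1
        out[order[pos]] = kth  # map back to the original point index
    return out


def dimensional_density_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Assumed combination rule: sum of per-dimension k-NN distances (smaller = denser)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < N")
    totals = [0.0] * len(points)
    for d in range(len(points[0])):
        column = [p[d] for p in points]
        for i, dist in enumerate(knn_distances_1d(column, k)):
            totals[i] += dist
    return totals


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
    print(dimensional_density_scores(pts, k=1))  # [0.0, 1.0, 1.0, 18.0]
    print(knn_distance_bruteforce(pts, k=1))
```

In this toy data both methods single out (10, 10) as the sparsest point. The scores are not identical, though. Point (0, 0) gets 0 from the per-dimension sketch because along each axis it shares its coordinate with another point. The separate estimates therefore approximate the full-dimensional one rather than reproduce it.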
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fränti, P., Sieranoja, S. (2018). Dimensionally Distributed Density Estimation. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_31
DOI: https://doi.org/10.1007/978-3-319-91262-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91261-5
Online ISBN: 978-3-319-91262-2
eBook Packages: Computer Science, Computer Science (R0)