Abstract
This paper introduces DenLAC (Density Levels Aggregation Clustering), an adaptable clustering algorithm that achieves high accuracy regardless of the shape and distribution of the input. While most clustering algorithms are specialized for particular input types, DenLAC obtains correct results for spherical clusters, elongated clusters, and clusters of differing densities. We also incorporate a simple procedure for identifying and displacing outliers. Our method defines clusters through density intervals composed of connected components, which we call density bins. It does so by combining several popular notions from data mining and statistics: Kernel Density Estimation and the theoretical concepts of density attraction and density levels. To build the final clusters, we extract the connected components from each density bin and merge adjacent connected components using a slightly modified agglomerative clustering algorithm.
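The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of that pipeline under our own assumptions and is not the authors' implementation. It assumes a fixed-bandwidth Gaussian kernel density estimate evaluated at each point, equal-width density bins, connected components formed by linking points closer than a radius `eps`, and plain single-linkage merging of components as a stand-in for the paper's "slightly modified" agglomerative step. The outlier procedure is omitted. All function and parameter names (`denlac_sketch`, `h`, `n_bins`, `eps`, `n_clusters`) are hypothetical.

```python
import math
from itertools import combinations


def gaussian_kde(points, h):
    """Unnormalised Gaussian KDE at each input point (assumed fixed bandwidth h)."""
    dens = []
    for p in points:
        s = sum(math.exp(-math.dist(p, q) ** 2 / (2 * h * h)) for q in points)
        dens.append(s / len(points))
    return dens


def density_bins(dens, n_bins):
    """Assign each point to one of n_bins equal-width density intervals (assumed rule)."""
    lo, hi = min(dens), max(dens)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero when all densities match
    return [min(int((d - lo) / width), n_bins - 1) for d in dens]


def connected_components(idx, points, eps):
    """Union-find over the point indices in idx; points within eps are linked."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def merge_components(comps, points, n_clusters):
    """Plain single-linkage merging of components until n_clusters remain."""
    comps = [list(c) for c in comps]

    def link(a, b):
        # Single linkage: smallest distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(comps) > n_clusters:
        a, b = min(combinations(range(len(comps)), 2),
                   key=lambda ab: link(comps[ab[0]], comps[ab[1]]))
        comps[a].extend(comps[b])
        del comps[b]  # b > a, so index a is unaffected
    return comps


def denlac_sketch(points, h, n_bins, eps, n_clusters):
    """Hypothetical end-to-end sketch: KDE -> density bins -> components -> merge."""
    dens = gaussian_kde(points, h)
    bins = density_bins(dens, n_bins)
    comps = []
    for b in range(n_bins):
        idx = [i for i, bb in enumerate(bins) if bb == b]
        if idx:
            comps.extend(connected_components(idx, points, eps))
    clusters = merge_components(comps, points, n_clusters)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For example, `denlac_sketch(points, h=1.0, n_bins=2, eps=1.5, n_clusters=2)` on two well-separated groups of four points assigns each group its own label. The brute-force pairwise loops are O(n²) and are only meant for small illustrative inputs.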
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rădulescu, IM., Boicea, A., Truică, CO., Apostol, ES., Mocanu, M., Rădulescu, F. (2021). DenLAC: Density Levels Aggregation Clustering – A Flexible Clustering Method –. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_27
DOI: https://doi.org/10.1007/978-3-030-77961-0_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science, Computer Science (R0)