Abstract
Density-based clustering methods are of particular interest for applications where the groups of data instances are expected to differ in size or shape, arbitrary shapes are possible, and the number of clusters is not known a priori. In such applications, background knowledge about the group membership or non-membership of some instances may be available, and exploiting it is therefore of interest. Recently, such knowledge has been expressed as constraints and exploited in constraint-based clustering. In this paper, we enhance the density-based algorithm DBSCAN with constraints upon data instances: “Must-Link” and “Cannot-Link” constraints. We test the new algorithm, C-DBSCAN, on artificial and real datasets and show that C-DBSCAN outperforms DBSCAN, even when only a small number of constraints is available.
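To make the two constraint types concrete, the following minimal sketch checks a given cluster labeling against instance-level constraints. It is not the C-DBSCAN algorithm itself, which the abstract does not spell out. The function name `count_violations`, the representation of constraints as index pairs, and the toy labels are all assumptions made for illustration: a Must-Link pair is treated as violated when its two instances carry different labels, and a Cannot-Link pair as violated when they share a label.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # a constraint between two instance indices (assumed encoding)


def count_violations(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Tuple[int, int]:
    """Return (violated Must-Link count, violated Cannot-Link count)."""
    # A Must-Link pair is violated when the two instances are in different clusters.
    ml_violated = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A Cannot-Link pair is violated when the two instances share a cluster.
    cl_violated = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_violated, cl_violated


# Hypothetical toy labeling of four instances into clusters 0 and 1.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]    # (1, 2) is split across clusters
cannot_link = [(0, 3), (2, 3)]  # (2, 3) ends up in the same cluster
violations = count_violations(labels, must_link, cannot_link)
```

A check like this can be run on the output of any clustering algorithm, constrained or not, to see how many of the supplied constraints the result respects. How noise points should be treated is left open here; a labeling that marks noise with a shared sentinel value would need extra handling.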
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ruiz, C., Spiliopoulou, M., Menasalvas, E. (2007). C-DBSCAN: Density-Based Clustering with Constraints. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_25
DOI: https://doi.org/10.1007/978-3-540-72530-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72529-9
Online ISBN: 978-3-540-72530-5
eBook Packages: Computer Science (R0)