C-DBSCAN: Density-Based Clustering with Constraints

Ruiz, Carlos; Spiliopoulou, Myra; Menasalvas, Ernestina

doi:10.1007/978-3-540-72530-5_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4482)

Abstract

Density-based clustering methods are of particular interest for applications where the anticipated groups of data instances are expected to differ in size or shape, arbitrary shapes are possible and the number of clusters is not known a priori. In such applications, background knowledge about group-membership or non-membership of some instances may be available, and its exploitation is therefore of interest. Recently, such knowledge has been expressed as constraints and exploited in constraint-based clustering. In this paper, we enhance the density-based algorithm DBSCAN with constraints upon data instances: “Must-Link” and “Cannot-Link” constraints. We test the new algorithm C-DBSCAN on artificial and real datasets and show that C-DBSCAN has superior performance to DBSCAN, even when only a small number of constraints is available.
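
To make the two constraint types concrete, the following is a minimal sketch and not the C-DBSCAN algorithm itself, whose details are not given in this abstract. It runs plain DBSCAN and then reports which instance-level constraints the result violates. The function names (`region_query`, `dbscan`, `violated_constraints`), the toy data, and the choice to count a Must-Link pair as violated when either instance is noise are all assumptions made for illustration.

```python
from math import dist

NOISE = -1  # label assumed here for instances that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain, unconstrained DBSCAN; returns one cluster label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def violated_constraints(labels, must_link, cannot_link):
    """Return the Must-Link and Cannot-Link pairs that a labelling breaks."""
    broken_ml = [(a, b) for a, b in must_link
                 if labels[a] != labels[b] or labels[a] == NOISE]
    broken_cl = [(a, b) for a, b in cannot_link
                 if labels[a] == labels[b] and labels[a] != NOISE]
    return broken_ml, broken_cl


# Toy data: a chain of evenly spaced points plus one isolated point.
points = [(float(x), 0.0) for x in range(7)] + [(10.0, 10.0)]
labels = dbscan(points, eps=1.1, min_pts=3)

# Hypothetical background knowledge: instances 0 and 1 belong together,
# while the two ends of the chain (0 and 6) must not share a cluster.
broken_ml, broken_cl = violated_constraints(
    labels, must_link=[(0, 1)], cannot_link=[(0, 6)]
)
print(labels, broken_ml, broken_cl)
```

In this toy case the chain is density-connected, so unconstrained DBSCAN places both of its ends in one cluster and breaks the Cannot-Link pair. A constraint-aware variant would have to prevent such a merge.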

Author information

Authors and Affiliations

Facultad de Informatica, Universidad Politecnica, Madrid, Spain
Carlos Ruiz & Ernestina Menasalvas
Faculty of Computer Science, Otto-von-Guericke-Universität Magdeburg, Germany
Myra Spiliopoulou

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, York University, M3J 1P3, Toronto, Ontario, Canada
Aijun An
Institute of Computing Sciences, Poznań University of Technology, ul. Piotrowo 2, 60–965, Poznań, Poland
Jerzy Stefanowski
Department of Applied Computer Science, University of Winnipeg, R3B 2E9, Winnipeg, Manitoba, Canada
Sheela Ramanna
Department of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
Cory J. Butz
Department of Electrical and Computer Engineering, University of Alberta, T6G 2V4, Edmonton, Alberta, Canada
Witold Pedrycz
Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, 40065, Chongqing, P.R. China
Guoyin Wang

About this paper

Cite this paper

Ruiz, C., Spiliopoulou, M., Menasalvas, E. (2007). C-DBSCAN: Density-Based Clustering with Constraints. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_25

DOI: https://doi.org/10.1007/978-3-540-72530-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72529-9
Online ISBN: 978-3-540-72530-5
eBook Packages: Computer Science, Computer Science (R0)