
C-DBSCAN: Density-Based Clustering with Constraints

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4482)

Abstract

Density-based clustering methods are of particular interest for applications where the anticipated groups of data instances are expected to differ in size or shape, where arbitrary shapes are possible, and where the number of clusters is not known a priori. In such applications, background knowledge about group membership or non-membership of some instances may be available, and its exploitation is therefore of interest. Recently, such knowledge has been expressed as constraints and exploited in constraint-based clustering. In this paper, we enhance the density-based algorithm DBSCAN with constraints upon data instances: “Must-Link” and “Cannot-Link” constraints. We test the new algorithm C-DBSCAN on artificial and real datasets and show that C-DBSCAN has superior performance to DBSCAN, even when only a small number of constraints is available.
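
To make the two constraint types concrete, here is a minimal sketch that checks a given cluster labelling against Must-Link and Cannot-Link pairs. It is an illustration only, not the C-DBSCAN algorithm described in the paper. The function name `count_violations`, the toy labels, and the convention that `-1` marks a noise point (and that a Must-Link pair involving noise counts as violated) are all assumptions made for this example.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data instances


def count_violations(labels: Sequence[int],
                     must_link: Iterable[Pair],
                     cannot_link: Iterable[Pair],
                     noise: int = -1) -> Tuple[int, int]:
    """Return (violated Must-Link pairs, violated Cannot-Link pairs).

    Assumption for this sketch: `labels[i]` is the cluster id of instance i,
    and `noise` marks instances that belong to no cluster.
    """
    ml_violations = 0
    for i, j in must_link:
        # A Must-Link pair is satisfied only if both instances share a real cluster.
        if labels[i] == noise or labels[j] == noise or labels[i] != labels[j]:
            ml_violations += 1

    cl_violations = 0
    for i, j in cannot_link:
        # A Cannot-Link pair is violated only if both instances share a real cluster.
        if labels[i] != noise and labels[i] == labels[j]:
            cl_violations += 1

    return ml_violations, cl_violations


# Hypothetical labelling of five instances; the last one is noise.
labels = [0, 0, 1, 1, -1]
must_link = [(0, 1), (1, 2)]    # (1, 2) is split across clusters 0 and 1
cannot_link = [(0, 3), (2, 3)]  # (2, 3) ends up together in cluster 1
print(count_violations(labels, must_link, cannot_link))  # (1, 1)
```

A check like this could be run on the output of any clustering method to see how many of the supplied constraints it respects.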

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruiz, C., Spiliopoulou, M., Menasalvas, E. (2007). C-DBSCAN: Density-Based Clustering with Constraints. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_25

  • DOI: https://doi.org/10.1007/978-3-540-72530-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72529-9

  • Online ISBN: 978-3-540-72530-5

  • eBook Packages: Computer Science, Computer Science (R0)
