SUDEPHIC: Self-Tuning Density-Based Partitioning and Hierarchical Clustering

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2973)

Abstract

Clustering is one of the primary techniques in data mining, and producing the result the user expects is a major issue. However, having to specify the parameters of a clustering algorithm dynamically is an obstacle for users. This paper first introduces a novel density-based partitioning and hierarchical algorithm that makes it easy to employ a synthetic feedback mechanism in clustering. Additionally, by investigating the relation between the parameters and the clustering result, we propose a self-tuning technique for setting the parameters. The density distribution within a cluster can also be expressed in the result, so that the user can specify the cluster’s feature. The algorithm is evaluated both in theory and in practice. It outperforms many existing algorithms in both efficiency and quality.
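The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it names (partition the data by density, merge partitions into clusters, and derive the density parameter from the data instead of asking the user for it). It is not the SUDEPHIC algorithm. The function name `grid_density_clusters`, the grid partitioning, the `cell_size` argument, and the mean-count threshold rule are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def grid_density_clusters(points, cell_size):
    """Illustrative only: grid partitioning, a data-derived density
    threshold, and merging of adjacent dense cells (hypothetical helper,
    not the method from the paper)."""
    # Partitioning step: count the points that fall into each grid cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

    # Self-tuning step (assumption): use the mean count over non-empty
    # cells as the density threshold instead of a user-supplied value.
    threshold = sum(counts.values()) / len(counts)
    dense = {cell for cell, n in counts.items() if n >= threshold}

    # Merging step: union-find over dense cells that touch each other.
    parent = {cell: cell for cell in dense}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for cx, cy in dense:
        for dx, dy in product((-1, 0, 1), repeat=2):
            neighbour = (cx + dx, cy + dy)
            if neighbour in dense:
                parent[find(neighbour)] = find((cx, cy))

    # Group dense cells by root; each cluster maps cell -> point count,
    # which exposes the density distribution inside the cluster.
    clusters = {}
    for cell in dense:
        clusters.setdefault(find(cell), {})[cell] = counts[cell]
    return threshold, list(clusters.values())
```

Returning the per-cell counts alongside the cluster membership is one simple way to let a caller inspect how density varies within a cluster; sparse cells below the threshold are treated as noise and left out.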

This paper was supported by the Key Program of the National Natural Science Foundation of China (No. 69933010) and China National 863 High-Tech Projects (No. 2002AA4Z3430 and 2002AA231041).

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, D., Cheng, Z., Wang, C., Zhou, H., Wang, W., Shi, B. (2004). SUDEPHIC: Self-Tuning Density-Based Partitioning and Hierarchical Clustering. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_51

  • DOI: https://doi.org/10.1007/978-3-540-24571-1_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21047-4

  • Online ISBN: 978-3-540-24571-1

  • eBook Packages: Springer Book Archive
