Abstract
Clustering is one of the primary techniques in data mining, and producing the result the user expects is a major challenge. In particular, having to specify the parameters of a clustering algorithm dynamically is an obstacle for users. This paper first introduces a novel density-based partitioning and hierarchical algorithm, which makes it easy to employ a synthetic feedback mechanism in clustering. By investigating the relation between the parameters and the clustering result, we then propose a self-tuning technique for setting the parameters. In addition, the result expresses the density distribution within each cluster, which lets the user characterize the cluster’s features. The algorithm is evaluated both in theory and in practice, and it outperforms many existing algorithms in both efficiency and quality.
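The abstract gives no algorithmic detail, so the following is only a generic illustration of the "partition by density, then merge" idea with a data-derived parameter. It is not the SUDEPHIC algorithm. The grid partition, the mean-occupancy threshold, and every function name (`grid_partition`, `self_tuned_threshold`, `merge_dense_cells`, `cluster`) are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def grid_partition(points, cell_size):
    """Partition step: bucket 2-D points into square grid cells."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells


def self_tuned_threshold(cells):
    """Hypothetical tuning rule: mean occupancy of the non-empty cells."""
    return mean(len(members) for members in cells.values())


def merge_dense_cells(cells, threshold):
    """Merge step: join dense cells that touch (8-neighbourhood) via union-find."""
    dense = {c for c, members in cells.items() if len(members) >= threshold}
    parent = {c: c for c in dense}

    def find(c):
        # Follow parent links to the root, halving the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for cx, cy in dense:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbour = (cx + dx, cy + dy)
                if neighbour in dense:
                    parent[find((cx, cy))] = find(neighbour)

    clusters = defaultdict(list)
    for c in dense:
        clusters[find(c)].extend(cells[c])
    return list(clusters.values())


def cluster(points, cell_size=1.0):
    """Run partition, threshold selection, and merging in sequence."""
    cells = grid_partition(points, cell_size)
    return merge_dense_cells(cells, self_tuned_threshold(cells))
```

In this sketch the user supplies only `cell_size`; the density threshold is derived from the data, and points in sparse cells are left out of every cluster as noise. A real self-tuning scheme would presumably adjust such parameters from the clustering result itself, which this example does not attempt.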
This paper was supported by the Key Program of the National Natural Science Foundation of China (No. 69933010) and China National 863 High-Tech Projects (No. 2002AA4Z3430 and 2002AA231041).
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, D., Cheng, Z., Wang, C., Zhou, H., Wang, W., Shi, B. (2004). SUDEPHIC: Self-Tuning Density-Based Partitioning and Hierarchical Clustering. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_51
DOI: https://doi.org/10.1007/978-3-540-24571-1_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive