Abstract
The clustering algorithm GDILC combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve GDILC in five important directions to obtain a new algorithm, AGRID. With these improvements, AGRID is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is nonparametric. Our experiments demonstrate the high speed and accuracy of AGRID.
The participation of the conference is supported by NOKIA.
References
M. Ankerst, M. Breunig, et al, “OPTICS: Ordering points to identify the clustering structure”, In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’99), pp. 49–60, Philadelphia, PA, June 1999.
R. Agrawal, J. Gehrke, et al, “Automatic subspace clustering of high dimensional data for data mining applications”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 94–105, Seattle, WA, June 1998.
K. Alsabti, S. Ranka, V. Singh, “An Efficient K-Means Clustering Algorithm,” Proc. the First Workshop on High Performance Data Mining, Orlando, Florida, 1998.
P.S. Bradley, O.L. Mangasarian, “K-Plane Clustering,” Journal of Global Optimization 16, Number 1, 2000, pp. 23–32.
M. Ester, H.-P. Kriegel et al, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” In Proc. 1996 Int. Conf. On Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, Portland, Oregon, Aug. 1996.
S. Guha, R. Rastogi, and K. Shim, “Cure: An efficient clustering algorithm for large databases”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 73–84, Seattle, WA, June 1998.
S. Guha, R. Rastogi, and K. Shim, “Rock: A robust clustering algorithm for categorical attributes”, In Proc. 1999 Int. Conf. Data Engineering (ICDE’99), pp. 512–521, Sydney, Australia, Mar. 1999.
A. Hinneburg and D.A. Keim, “An efficient approach to clustering in large multimedia databases with noise”, In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD’98), pp. 58–65, New York, Aug. 1998.
Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques,” Higher Education Press, Morgan Kaufmann Publishers, 2001.
Z. Huang, “Extensions to the k-means algorithm for clustering large data sets with categorical values”, Data Mining and Knowledge Discovery, 2:283–304, 1998.
Zhexue Huang, “A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining,” In SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD’97), Tucson, Arizona, May 1997.
G. Karypis, E.-H. Han, and V. Kumar, “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling”, IEEE Computer, Special Issue on Data Analysis and Mining, Vol. 32, No. 8, August 1999, pp. 68–75.
R. Ng and J. Han, “Efficient and effective clustering method for spatial data mining”, In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB’94), pp. 144–155, Santiago, Chile, Sept. 1994.
G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: A multi-resolution clustering approach for very large spatial databases”, In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB’98), pp. 428–439, New York, Aug. 1998.
Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications”, Data Mining and Knowledge Discovery, Vol. 2, No 2, June 1998.
W. Wang, J. Yang, and R. Muntz, “STING: A statistical information grid approach to spatial data mining”, In Proc. 1997 Int. Conf. Very Large Data Bases (VLDB’97), pp. 186–195, Athens, Greece, Aug. 1997.
T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases”, In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’96), pp. 103–114, Montreal, Canada, June 1996.
Zhao Yanchang, Song Junde, “GDILC: A Grid-based Density Iso-line Clustering Algorithm,” Proc. Int. Conf. Info-tech and Info-net (ICII 2001), Beijing, China, Oct. 2001.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yanchang, Z., Junde, S. (2003). AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_27
DOI: https://doi.org/10.1007/3-540-36175-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive