Abstract
To address the inability of traditional grid-based clustering techniques to handle high-dimensional data, we propose an intersecting grid partition method and a density estimation method. The partition method greatly reduces the number of grid cells generated in a high-dimensional data space and makes neighbor searching easy. Building on these two methods, we propose a grid-based clustering algorithm, GCOD, which merges two intersecting grids according to the estimated density. The algorithm requires only one parameter, and its time complexity is linear in the size of the input data set and in the data dimensionality. Experimental results show that GCOD can discover clusters of arbitrary shape and scales well.
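The abstract does not spell out GCOD's intersecting partition or its density estimator. The following is therefore only a generic sketch of the grid-based idea it builds on, not the authors' algorithm. Points are hashed into cells of width `width`. Cells holding at least `min_count` points count as dense. Face-adjacent dense cells are then merged with union-find. The names `grid_cluster`, `width`, and `min_count` are hypothetical. Unlike GCOD, this sketch uses two parameters and a single, non-intersecting grid. It does show one property the abstract relies on: when only non-empty cells are stored, the number of cells never exceeds the number of points, whatever the dimension.

```python
from collections import defaultdict


def grid_cluster(points, width, min_count):
    """Generic grid-based density clustering sketch (hypothetical; NOT GCOD).

    Returns one label per point; -1 marks points that fall in sparse cells.
    """
    # Hash each point to integer cell coordinates. Only non-empty cells are
    # stored, so there are at most len(points) cells regardless of dimension.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // width) for x in p)
        cells[key].append(idx)

    # Assumed density rule: a cell is dense if it holds >= min_count points.
    dense = [k for k, members in cells.items() if len(members) >= min_count]
    dense_set = set(dense)

    # Union-find over dense cells, with path halving.
    parent = {k: k for k in dense}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge dense cells that share a face: they differ by +/-1 in exactly one
    # coordinate, giving 2*d candidate neighbours per cell.
    for k in dense:
        for dim in range(len(k)):
            for step in (-1, 1):
                nb = k[:dim] + (k[dim] + step,) + k[dim + 1:]
                if nb in dense_set:
                    union(k, nb)

    # Give each connected group of dense cells a consecutive cluster id.
    labels = [-1] * len(points)
    root_ids = {}
    for k in dense:
        cid = root_ids.setdefault(find(k), len(root_ids))
        for idx in cells[k]:
            labels[idx] = cid
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4), (5.0, 5.0)]
    print(grid_cluster(pts, width=1.0, min_count=2))  # [0, 0, 0, 0, -1]
```

In the usage example, the two adjacent dense cells (0, 0) and (1, 0) merge into one cluster. The lone point in cell (5, 5) is left as noise.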
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qiu, BZ., Li, XL., Shen, JY. (2007). Grid-Based Clustering Algorithm Based on Intersecting Partition and Density Estimation. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_37
DOI: https://doi.org/10.1007/978-3-540-77018-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)