Abstract
Efficient clustering in dynamic spatial databases is currently an open problem with many potential applications. Most traditional spatial clustering algorithms are inadequate because they do not support incremental clustering efficiently. In this paper, we propose DClust, a novel clustering technique for dynamic spatial databases. DClust provides a multi-resolution view of the clusters, generates clusters of arbitrary shape in the presence of noise, produces clusters that are insensitive to the ordering of the input data, and supports incremental clustering efficiently. DClust uses a density criterion that captures arbitrary cluster shapes and sizes to select a number of representative points, and builds the Minimum Spanning Tree (MST) of these representative points, called the R-MST. After the initial clustering, a summary of the cluster structure is built. This summary enables quick localization of the effect of data updates on the current set of clusters. Our experimental results show that DClust outperforms existing spatial clustering methods such as DBSCAN, C2P, DENCLUE, Incremental DBSCAN and BIRCH in terms of clustering time and the accuracy of the clusters found.
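The abstract names the core of the initial clustering step: density-based selection of representative points, an MST over those points (the R-MST), and clusters read off that tree. The Python sketch below illustrates that general idea in a simplified form under stated assumptions; it is not DClust itself. The density test (at least `min_pts` points within radius `eps`), the rule of cutting MST edges longer than `cut_length`, and the nearest-representative assignment of the remaining points are all assumptions, and every function name is hypothetical. The sketch does not model the cluster summary or incremental updates.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int, float]  # (representative a, representative b, edge length)


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_representatives(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Hypothetical density criterion: keep indices of points with at least
    min_pts points (themselves included) within radius eps. O(n^2) brute force."""
    return [
        i for i, p in enumerate(points)
        if sum(1 for q in points if dist(p, q) <= eps) >= min_pts
    ]


def build_mst(points: Sequence[Point], reps: List[int]) -> List[Edge]:
    """Prim's algorithm on the complete graph over the representatives, O(k^2)."""
    if not reps:
        return []
    root = reps[0]
    # best[r] = (length of the cheapest edge linking r to the tree, tree endpoint)
    best: Dict[int, Tuple[float, int]] = {
        r: (dist(points[root], points[r]), root) for r in reps[1:]
    }
    edges: List[Edge] = []
    while best:
        r = min(best, key=lambda k: best[k][0])
        length, parent = best.pop(r)
        edges.append((parent, r, length))
        for other in list(best):
            d = dist(points[r], points[other])
            if d < best[other][0]:
                best[other] = (d, r)
    return edges


def rmst_style_cluster(points: Sequence[Point], eps: float, min_pts: int,
                       cut_length: float) -> List[int]:
    """Hypothetical helper: label each point with a cluster id, or -1 for noise."""
    reps = select_representatives(points, eps, min_pts)
    parent = {r: r for r in reps}

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep only MST edges no longer than cut_length; the connected
    # components of representatives that remain are the clusters.
    for a, b, length in build_mst(points, reps):
        if length <= cut_length:
            parent[find(a)] = find(b)

    labels = [-1] * len(points)
    cluster_ids: Dict[int, int] = {}
    for r in reps:
        labels[r] = cluster_ids.setdefault(find(r), len(cluster_ids))

    # Attach each non-representative point to its nearest representative if
    # that one lies within eps; otherwise the point stays labelled noise (-1).
    rep_set = set(reps)
    for i, p in enumerate(points):
        if i in rep_set or not reps:
            continue
        nearest = min(reps, key=lambda r: dist(p, points[r]))
        if dist(p, points[nearest]) <= eps:
            labels[i] = labels[nearest]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(rmst_style_cluster(pts, eps=1.5, min_pts=3, cut_length=3.0))
    # [0, 0, 0, 0, 1, 1, 1, 1, -1]
    print(rmst_style_cluster(pts, eps=1.5, min_pts=3, cut_length=20.0))
    # [0, 0, 0, 0, 0, 0, 0, 0, -1]
```

In this sketch, raising `cut_length` merges clusters along the tree without recomputing it, which is one possible reading of the multi-resolution view mentioned above; the abstract does not say that DClust obtains its multi-resolution view this way.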
References
Berchtold, S., Keim, D.A., and Kriegel, H. (1996). The X-tree: An Index Structure for High-Dimensional Data. In Proc. 22nd International Conference on Very Large Data Base (VLDB’96) (pp. 28–39). Mumbai, India.
Can, F. (1993). Incremental Clustering for Dynamic Information Processing. ACM Transactions on Information Systems, 11(2), 143–164.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96) (pp. 226–231). Portland, USA.
Ester, M., Kriegel, H.-P., Sander, J., Wimmer, M., and Xu, X. (1998). Incremental Clustering for Mining in a Data Warehouse Environment. In Proc. 24th International Conference on Very Large Data Base (VLDB’98) (pp. 323–333). New York, USA.
Fisher, D.H. (1987). Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning, 2(2), 139–172.
Ganti, V., Gehrke, J., and Ramakrishnan, R. (2001). DEMON: Mining and Monitoring Evolving Data. IEEE Transactions on Knowledge and Data Engineering, 13(1).
Ganti, V., Ramakrishnan, R., Gehrke, J., Powell, A., and French, J. (1999). Clustering Large Datasets in Arbitrary Metric Spaces. In Proc. 15th International Conference on Data Engineering (ICDE’99) (pp. 502–511). Sydney, Australia.
Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An Efficient Clustering Algorithm for Large Databases. In Proc. 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD’98) (pp. 73–84). Seattle, WA, USA.
Hinneburg, A. and Keim, D.A. (1998). An Efficient Approach to Clustering in Large Multimedia Databases with Noise. In Proc. 4th International Conference on Knowledge Discovery and Data Mining (KDD’98) (pp. 58–65). New York City, USA.
MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (pp. 281–297).
Nanopoulos, A., Theodoridis, Y., and Manolopoulos, Y. (2001). C2P: Clustering Based on Closest Pairs. In Proc. 27th International Conference on Very Large Data Base (VLDB’01) (pp. 331–340). Roma, Italy.
Ng, R. and Han, J. (1994). Efficient and Effective Clustering Methods for Spatial Data Mining. In Proc. 20th International Conference on Very Large Data Base (VLDB’94) (pp. 144–155). Santiago, Chile.
O’Callaghan, L., Mishra, N., Meyerson, A., Guha, S., and Motwani, R. (2002). Streaming-Data Algorithms For High-Quality Clustering. In Proc. 18th International Conference on Data Engineering (ICDE’02) (pp. 685–694). San Jose, California, USA.
Sheikholeslami, G., Chatterjee, S., and Zhang, A. (1999). WaveCluster: A Wavelet based Clustering Approach for Spatial Data in Very Large Database. VLDB Journal, 8(3/4), 289–304.
Utgoff, P.E. (1989). Incremental Induction of Decision Trees. Machine Learning, 4, 161–186.
Wang, W., Yang, J., and Muntz, R. (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. 23rd International Conference on Very Large Data Base (VLDB’97) (pp. 186–195). Athens, Greece.
Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proc. 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD’96) (pp. 103–114). Montreal, Canada.
Cite this article
Zhang, J., Hsu, W. & Li Lee, M. Clustering in Dynamic Spatial Databases. J Intell Inf Syst 24, 5–27 (2005). https://doi.org/10.1007/s10844-005-0265-0