Abstract
Many distributed data mining (DDM) tasks, such as distributed association rules and distributed classification, have been proposed and developed in the last few years. However, little research has addressed distributed clustering for analysing large, heterogeneous and distributed datasets. This is especially true of distributed density-based clustering, although the centralised versions of the technique have been widely used in different real-world applications. In this paper, we present a new approach for distributed density-based clustering. Our approach is based on two main concepts: the extension of local models created by DBSCAN at each node of the system, and the aggregation of these local models using tree-based topologies to construct global models. The preliminary evaluation shows that our approach is efficient and flexible, and that it is well suited to high-density datasets and to a moderate difference in dataset distributions among the sites.
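The two concepts named in the abstract (local DBSCAN models at each node, then aggregation along a tree) can be illustrated with a minimal sketch. The abstract does not specify the paper's local model or merge rule, so everything below is an assumption made for illustration: each local cluster is represented by its full point list, two clusters are merged when any pair of their points lies within `eps`, and the tree is a plain binary reduction. The names `local_model`, `merge_models` and `aggregate` are hypothetical and do not come from the paper.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN (Ester et al., 1996): one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:     # core point: keep expanding the cluster
                queue.extend(nb)
    return labels

def local_model(points, eps, min_pts):
    """ASSUMPTION: a node's local model is just its clusters as point lists."""
    clusters = {}
    for p, lab in zip(points, dbscan(points, eps, min_pts)):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    return list(clusters.values())

def merge_models(a, b, eps):
    """ASSUMPTION: each cluster of b absorbs every cluster of a within eps of it."""
    merged = [list(c) for c in a]
    for c in b:
        c = list(c)
        keep = []
        for m in merged:
            if any(dist(p, q) <= eps for p in c for q in m):
                c.extend(m)
            else:
                keep.append(m)
        keep.append(c)
        merged = keep
    return merged

def aggregate(models, eps):
    """Binary-tree reduction: merge models pairwise, level by level, up to the root."""
    level = list(models)
    while len(level) > 1:
        nxt = [merge_models(level[i], level[i + 1], eps)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])      # odd node out moves up unchanged
        level = nxt
    return level[0] if level else []

# Three hypothetical sites; the first two hold halves of the same dense region.
site_a = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
site_b = [(1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]
site_c = [(10.0, 10.0), (10.5, 10.0), (11.0, 10.0)]
models = [local_model(s, eps=0.6, min_pts=2) for s in (site_a, site_b, site_c)]
global_model = aggregate(models, eps=0.6)
print(sorted(len(c) for c in global_model))  # [3, 6]
```

In this toy run the clusters found at the first two sites are joined at the first tree level, while the third site's cluster stays separate at the root. A real deployment would presumably exchange a compact summary of each cluster rather than raw points, which is where the paper's notion of extended local models would come in.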
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Le-Khac, NA., Aouad, L.M., Kechadi, MT. (2007). A New Approach for Distributed Density Based Clustering on Grid Platform. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_27
DOI: https://doi.org/10.1007/978-3-540-73390-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73389-8
Online ISBN: 978-3-540-73390-4
eBook Packages: Computer Science, Computer Science (R0)