
A New Approach for Distributed Density Based Clustering on Grid Platform

  • Conference paper
Data Management. Data, Data Everywhere (BNCOD 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4587)


Abstract

Many distributed data mining (DDM) tasks, such as distributed association rules and distributed classification, have been proposed and developed in the last few years. However, little research has addressed distributed clustering for analysing large, heterogeneous and distributed datasets. This is especially true of distributed density-based clustering, although the centralised versions of the technique have been widely used in different real-world applications. In this paper, we present a new approach for distributed density-based clustering. Our approach is based on two main concepts: the extension of local models created by DBSCAN at each node of the system, and the aggregation of these local models using tree-based topologies to construct global models. The preliminary evaluation shows that our approach is efficient and flexible, and that it is suited to high-density datasets and to a moderate difference in dataset distributions among the sites.
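The two concepts named in the abstract (local DBSCAN models at each node, then tree-based aggregation) can be illustrated with a minimal sketch. The abstract does not specify how local models are represented or how they are merged, so everything below is an assumption made for illustration: a local model is simply the list of clusters found at a site, two clusters are merged when any pair of their points lies within `eps`, and the tree is a plain binary reduction. The names `dbscan`, `merge_models` and `aggregate` are hypothetical and are not taken from the paper.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on one site; returns clusters as lists of points (noise dropped)."""
    labels = {}  # point index -> cluster id, or -1 for noise

    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:  # already assigned to some cluster
                continue
            labels[j] = cid
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reach)
        cid += 1

    clusters = [[] for _ in range(cid)]
    for i, c in labels.items():
        if c >= 0:
            clusters[c].append(points[i])
    return clusters


def merge_models(model_a, model_b, eps):
    """Assumed merge rule: join clusters that have any pair of points within eps."""
    pending = [list(c) for c in model_a + model_b]
    merged = []
    while pending:
        current = pending.pop()
        changed = True
        while changed:  # repeat until no remaining cluster touches `current`
            changed = False
            for other in pending[:]:
                if any(dist(p, q) <= eps for p in current for q in other):
                    current.extend(other)
                    pending.remove(other)
                    changed = True
        merged.append(current)
    return merged


def aggregate(local_models, eps):
    """Pairwise (binary-tree) reduction of local models; expects at least one model."""
    level = list(local_models)
    while len(level) > 1:
        nxt = [merge_models(level[i], level[i + 1], eps)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd model out is carried up to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

In this sketch each site would call `dbscan` on its own data, and `aggregate` would combine the resulting models level by level. A real system would likely exchange compact cluster representatives rather than raw points, which this simplification does not attempt.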




Editor information

Richard Cooper, Jessie Kennedy


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Le-Khac, NA., Aouad, L.M., Kechadi, MT. (2007). A New Approach for Distributed Density Based Clustering on Grid Platform. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_27


  • DOI: https://doi.org/10.1007/978-3-540-73390-4_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73389-8

  • Online ISBN: 978-3-540-73390-4

  • eBook Packages: Computer Science, Computer Science (R0)
