A New Approach for Distributed Density Based Clustering on Grid Platform

Le-Khac, Nhien-An; Aouad, Lamine M.; Kechadi, M-Tahar

doi:10.1007/978-3-540-73390-4_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4587)

Included in the following conference series:

British National Conference on Databases

Abstract

Many distributed data mining (DDM) tasks, such as distributed association rules and distributed classification, have been proposed and developed in the last few years. However, little research addresses distributed clustering for analysing large, heterogeneous and distributed datasets. This is especially true of distributed density-based clustering, although the centralised versions of the technique have been widely used in different real-world applications. In this paper, we present a new approach for distributed density-based clustering. Our approach is based on two main concepts: the extension of local models created by DBSCAN at each node of the system, and the aggregation of these local models by using tree-based topologies to construct global models. The preliminary evaluation shows that our approach is efficient and flexible, and that it is suited to high-density datasets with a moderate difference in dataset distributions among the sites.
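The two concepts named in the abstract, local DBSCAN models and tree-based aggregation, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how local models are extended or merged, so the sketch assumes that a local model is simply the set of core points of each local cluster, that two clusters are merged when any pair of their representatives lies within `eps`, and that the topology is a plain binary reduction tree. All function names (`dbscan`, `local_model`, `merge_models`, `aggregate_tree`) and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return labels


def local_model(points, eps, min_pts):
    """Assumed local model: the core points of each local cluster."""
    labels = dbscan(points, eps, min_pts)
    model = {}
    for i, p in enumerate(points):
        is_core = sum(dist(p, q) <= eps for q in points) >= min_pts
        if labels[i] != -1 and is_core:
            model.setdefault(labels[i], []).append(p)
    return list(model.values())


def merge_models(a, b, eps):
    """Assumed merge rule: join clusters with representatives within eps."""
    clusters = [list(c) for c in a + b]
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(clusters)), 2):
        if any(dist(p, q) <= eps for p in clusters[i] for q in clusters[j]):
            parent[find(i)] = find(j)
    merged = {}
    for i, c in enumerate(clusters):
        merged.setdefault(find(i), []).extend(c)
    return list(merged.values())


def aggregate_tree(models, eps):
    """Binary-tree reduction: merge sibling models level by level."""
    level = list(models)
    while len(level) > 1:
        nxt = [merge_models(level[k], level[k + 1], eps)
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired node is passed up unchanged
        level = nxt
    return level[0]


# Three hypothetical sites; two clusters are split across site boundaries.
sites = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    [(1.5, 0.0), (2.0, 0.0), (10.0, 10.0), (10.5, 10.0)],
    [(11.0, 10.0), (11.5, 10.0), (50.0, 50.0)],
]
EPS, MIN_PTS = 0.6, 2
local_models = [local_model(s, EPS, MIN_PTS) for s in sites]
global_model = aggregate_tree(local_models, EPS)
```

In this toy run only the representatives, not the raw points, travel up the tree, which is the general motivation for model-based distributed clustering. A real deployment would need a more compact model and a merge criterion justified by the data, which this sketch does not attempt.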

Author information

Authors and Affiliations

School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland
Nhien-An Le-Khac, Lamine M. Aouad & M-Tahar Kechadi

Editor information

Richard Cooper, Jessie Kennedy

About this paper

Cite this paper

Le-Khac, NA., Aouad, L.M., Kechadi, MT. (2007). A New Approach for Distributed Density Based Clustering on Grid Platform. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_27

DOI: https://doi.org/10.1007/978-3-540-73390-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73389-8
Online ISBN: 978-3-540-73390-4
eBook Packages: Computer Science, Computer Science (R0)
