Exploitation of a parallel clustering algorithm on commodity hardware with P2P-MPI

Genaud, Stéphane; Gançarski, Pierre; Latu, Guillaume; Blansché, Alexandre; Rattanapoka, Choopan; Vouriot, Damien

doi:10.1007/s11227-007-0136-2

Published: 05 May 2007

The Journal of Supercomputing
Volume 43, pages 21–41 (2008)

Stéphane Genaud¹, Pierre Gançarski², Guillaume Latu¹, Alexandre Blansché², Choopan Rattanapoka¹ & Damien Vouriot²

Abstract

The goal of clustering is to identify subsets, called clusters, whose objects are usually more similar to each other than they are to objects from other clusters. We have proposed the MACLAW method, a cooperative coevolution algorithm for data clustering, which has shown good results (Blansché and Gançarski, Pattern Recognit. Lett. 27(11), 1299–1306, 2006). However, the complexity of the algorithm increases rapidly with the number of clusters to find. In this article we propose a parallelization of MACLAW based on a message-passing paradigm, together with an analysis of the application's performance supported by experimental results. We show that we reach near-optimal speedups when searching for 16 clusters, a typical problem instance for which the sequential execution time is an obstacle to using the MACLAW method. Further, our approach is original because we use the P2P-MPI grid middleware (Genaud and Rattanapoka, Lecture Notes in Comput. Sci., vol. 3666, pp. 276–284, 2005), which provides both the message-passing library and the infrastructure services to discover computing resources. We also argue that the application can be tightly coupled with the middleware to make the parallel execution nearly transparent for the user.

References

Berkhin P (2002) Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA
Blansché A, Gançarski P (2006) MACLAW: a modular approach for clustering with local attribute weighting. Pattern Recognit Lett 27(11):1299–1306
Cappello F et al (2005) Grid’5000: a large scale, reconfigurable, controlable and monitorable grid platform. In: Proceedings of the 6th IEEE/ACM international workshop on grid computing Grid’2005, November 2005. http://www.grid5000.org
Carpenter B, Getov V, Judd G, Skjellum T, Fox G (2000) MPJ: MPI-like message passing for Java. Concurr Pract Experience 12(11), September
Chan EY, Ching WK, Ng MK, Huang JZ (2004) An optimization algorithm for clustering using weighted dissimilarity measures. Pattern Recognit 37:943–952
Dhillon IS, Modha DS (2000) A data-clustering algorithm on distributed memory multiprocessors. In: Revised papers from large-scale parallel data mining, workshop on large-scale parallel KDD systems, SIGKDD. Springer, New York, pp 245–260
Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97
Forman G, Zhang B (2000) Linear speedup for a parallel non-approximate recasting of center-based clustering algorithms, including k-means, k-harmonic means, and em. In: ACM SIGKDD workshop on distributed and parallel knowledge discovery, KDD-2000
Friedman JH, Meulman JJ (2004) Clustering objects on subsets of attributes. J Roy Stat Soc 66(4):815–849
Frigui H, Nasraoui O (2004) Unsupervised learning of prototypes and attribute weights. Pattern Recognit 34:567–581
Gabriel E, Resch M, Beisel T, Keller R (1998) Distributed computing in an heterogeneous computing environment. In: EuroPVM/MPI. Lecture notes in comput sci, vol 1497. Springer, New York, pp 180–187
Genaud S, Rattanapoka C (2005) A peer-to-peer framework for robust execution of message passing parallel programs. In: Di Martino B et al (eds) EuroPVM/MPI 2005. Lecture notes in comput sci, vol 3666. Springer, New York, pp 276–284, September
Genaud S, Rattanapoka C (2007) Fault management in P2P-MPI. In: Proceedings of international conference on grid and pervasive computing, GPC’07. Lecture notes in comput sci. Springer, May
Genaud S, Rattanapoka C (2007) P2P-MPI: a peer-to-peer framework for robust execution of message passing parallel programs. J Grid Comput 5:27–42
Gnanadesikan R, Kettenring JR, Tsao SL (1995) Weighting and selection of variables for cluster analysis. J Classif 12(1):113–136
Howe N, Cardie C (1997) Examining locally varying weights for nearest neighbor algorithms. In: ICCBR, pp 455–466
Huang JZ, Ng MK, Rong H, Li Z (2005) Automated variable weighting in k-means type clustering. IEEE Trans Pattern Anal Mach Intell 27(2):657–668
JXTA http://www.jxta.org
Karonis NT, Toonen BT, Foster I (2003) MPICH-G2: a grid-enabled implementation of the message passing interface. J Parallel Distributed Comput special issue on Comput Grids 63(5):551–563, May
Kielmann T, Hofman RFH, Bal HE, Plaat A, Bhoedjang RAF (1999) MagPIe: MPI’s collective communication operations for clustered wide area systems. ACM SIGPLAN Notices 34(8):131–140, August
Kruengkrai C, Jaruskulchai C (2002) A parallel learning algorithm for text classification. In: Eighth ACM SIGKDD international conference on knowledge discovery and data mining, July
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, Berkeley, CA, 1967. University of California Press, pp 281–297
MPI (1995) A message passing interface standard, version 1.1. Technical report, University of Tennessee, Knoxville, TN, USA, Jun
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD explorations, newsletter of the ACM special interest group on knowledge discovery and data mining 6(1):90–106
Shudo K, Tanaka Y, Sekiguchi S (2005) P3: P2P-based middleware enabling transfer and aggregation of computational resource. In: 5th intl workshop on global and peer-to-peer computing, in conjunc with CCGrid05. IEEE, May
Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678

Author information

Authors and Affiliations

LSIIT-ICPS, Louis Pasteur University, Strasbourg – UMR 7005 CNRS-ULP, Blvd. S. Brant, BP 10413, 67412, Illkirch, France
Stéphane Genaud, Guillaume Latu & Choopan Rattanapoka
LSIIT-AFD, Louis Pasteur University, Strasbourg – UMR 7005 CNRS-ULP, Blvd. S. Brant, BP 10413, 67412, Illkirch, France
Pierre Gançarski, Alexandre Blansché & Damien Vouriot

Corresponding author

Correspondence to Pierre Gançarski.

About this article

Cite this article

Genaud, S., Gançarski, P., Latu, G. et al. Exploitation of a parallel clustering algorithm on commodity hardware with P2P-MPI. J Supercomput 43, 21–41 (2008). https://doi.org/10.1007/s11227-007-0136-2

Received: 05 December 2006
Accepted: 27 March 2007
Published: 05 May 2007
Issue Date: January 2008
DOI: https://doi.org/10.1007/s11227-007-0136-2
