Parallel Two-Phase K-Means

Nguyen, Cuong Duc; Nguyen, Dung Tien; Pham, Van-Hau

doi:10.1007/978-3-642-39640-3_16

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7975)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

This paper introduces a new parallel version of Two-Phase K-means, called Parallel Two-Phase K-means (Par2PK-means), to overcome the limitations of existing parallel versions. Par2PK-means is developed for and executed on the MapReduce framework and is divided into two phases. In the first phase, Mappers work independently on data segments to create intermediate data. In the second phase, the Reducer clusters the intermediate data collected from the Mappers to produce the final clustering result. In tests on large data sets, the proposed algorithm attained a good speedup ratio, close to linear, compared with the sequential Two-Phase K-means.
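
The two-phase flow described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract does not say what the intermediate data look like, so the sketch assumes each Mapper emits weighted local centroids and the Reducer runs a weighted k-means over them. The names `weighted_kmeans`, `mapper`, `reducer`, `par2pk_sketch`, and the parameter `k_local` are hypothetical, and a plain loop stands in for MapReduce.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd-style k-means on weighted points.

    Returns the centroids and the total weight assigned to each
    centroid during the final assignment pass.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if totals[j] > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = [s / totals[j] for s in sums[j]]
    return centroids, totals

def mapper(segment, k_local):
    """Phase 1 (assumed form): summarise one segment as weighted centroids."""
    centroids, counts = weighted_kmeans(segment, [1.0] * len(segment), k_local)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]

def reducer(intermediate, k):
    """Phase 2 (assumed form): cluster the collected weighted centroids."""
    points = [c for c, _ in intermediate]
    weights = [n for _, n in intermediate]
    centroids, _ = weighted_kmeans(points, weights, k)
    return centroids

def par2pk_sketch(data, n_segments, k_local, k):
    """Split the data, run the mappers one by one, then reduce once."""
    segments = [data[i::n_segments] for i in range(n_segments)]
    intermediate = []
    for seg in segments:  # on a real cluster these would run in parallel
        intermediate.extend(mapper(seg, k_local))
    return reducer(intermediate, k)

if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (5, 5)] for _ in range(200)]
    print(par2pk_sketch(data, n_segments=4, k_local=5, k=2))
```

Under these assumptions the Reducer sees only a few weighted centroids per segment rather than the raw points, which is what lets the first phase run independently on each segment.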

The work is supported by DOST, Hochiminh City under the contract number 283/2012/HD-SKHCN.

Author information

Authors and Affiliations

International University – VNU-HCM, Vietnam
Cuong Duc Nguyen, Dung Tien Nguyen & Van-Hau Pham

Editor information

Editors and Affiliations

L-I.S.U.T. - D.A.P.I.t. Facoltà Ingegneria, Università degli Studi della Basilicata, Viale dell’Ateneo Lucano, 10, 85100, Potenza, Italy
Beniamino Murgante
Covenant University, Canaanland OTA, Nigeria
Sanjay Misra
Dipartimento di Scienze e Tecnologie per l’Agricoltura, le Foreste, la Natura e l’Energia, Università degli Studi della Tuscia, Via S. Camillo de Lellis, snc, 01100, Viterbo, Italy
Maurizio Carlini
Dipartimento di Scienze dell’Ingegneria Civile e dell’Architettura, Politecnico di Bari, Via Orabona, 4, 70125, Bari, Italy
Carmelo M. Torre
International University VNU-HCM, Quarter 6, Linh Trung, Thu Duc, Ho Chi Minh City, Vietnam
Hong-Quang Nguyen
School of Business Systems, Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, 813-8503, Higashi-ku, Fukuoka, Japan
Bernady O. Apduhan
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli, 1, 06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Nguyen, C.D., Nguyen, D.T., Pham, V.-H. (2013). Parallel Two-Phase K-Means. In: Murgante, B., et al. (eds) Computational Science and Its Applications – ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_16

DOI: https://doi.org/10.1007/978-3-642-39640-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer Science, Computer Science (R0)
