Parallel K-prototypes for Clustering Big Data

HajKacem, Mohamed Aymen Ben; N’cir, Chiheb-Eddine Ben; Essoussi, Nadia

doi:10.1007/978-3-319-24306-1_61


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9330)


Abstract

Big data clustering has become an important challenge in data mining. Big data are often characterized by a huge volume and by a variety of attribute types, namely numerical and categorical. To deal with these challenges, we propose the parallel k-prototypes method, which is based on the MapReduce model. This method is able to perform efficient groupings on large-scale, mixed-type data. Experiments conducted on huge data sets demonstrate the performance of the proposed method in clustering large-scale mixed data.
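
The abstract does not give the algorithmic details, so the following is only a minimal sketch of how a k-prototypes iteration might be split into map and reduce steps. It assumes the standard k-prototypes dissimilarity (squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical ones). The function names, the weight `gamma`, the in-memory "chunks" standing in for data splits, and the toy data are all hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict

# A point or prototype is a pair: (numeric_tuple, categorical_tuple).

def dissimilarity(point, prototype, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of categorical mismatches (assumed form)."""
    (num, cat), (pnum, pcat) = point, prototype
    numeric = sum((a - b) ** 2 for a, b in zip(num, pnum))
    mismatches = sum(1 for a, b in zip(cat, pcat) if a != b)
    return numeric + gamma * mismatches

def map_phase(chunk, prototypes, gamma=1.0):
    """Map step: for one data split, emit (nearest_cluster_index, point)."""
    for point in chunk:
        nearest = min(range(len(prototypes)),
                      key=lambda j: dissimilarity(point, prototypes[j], gamma))
        yield nearest, point

def reduce_phase(pairs, prototypes):
    """Reduce step: recompute each prototype as the mean of the numeric
    attributes and the mode of the categorical attributes of its members."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    updated = list(prototypes)  # empty clusters keep their old prototype
    for idx, members in groups.items():
        nums = [m[0] for m in members]
        cats = [m[1] for m in members]
        mean = tuple(sum(col) / len(col) for col in zip(*nums))
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cats))
        updated[idx] = (mean, mode)
    return updated

def parallel_k_prototypes(chunks, prototypes, gamma=1.0, max_iter=20):
    """Repeat map and reduce until the prototypes stop changing.
    The map calls are independent per chunk, so a real MapReduce
    framework could run them on separate workers."""
    for _ in range(max_iter):
        pairs = [pair for chunk in chunks
                 for pair in map_phase(chunk, prototypes, gamma)]
        updated = reduce_phase(pairs, prototypes)
        if updated == prototypes:
            break
        prototypes = updated
    return prototypes

# Toy usage: two splits, two clusters, one categorical attribute.
chunks = [
    [((1.0, 2.0), ("a",)), ((1.0, 4.0), ("a",))],
    [((9.0, 9.0), ("b",)), ((11.0, 9.0), ("b",))],
]
initial = [((0.0, 0.0), ("a",)), ((10.0, 10.0), ("b",))]
result = parallel_k_prototypes(chunks, initial)
```

In this sketch only the assignment step is distributed; the prototype update runs once over all emitted pairs, which mirrors the usual division of labour in MapReduce-based k-means variants but may differ from the design evaluated in the chapter.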



Author information

Authors and Affiliations

LARODEC, Université de Tunis, Institut Supérieur de Gestion de Tunis, 41 Avenue de la Liberté, Cité Bouchoucha, 2000, Le Bardo, Tunisia
Mohamed Aymen Ben HajKacem, Chiheb-Eddine Ben N’cir & Nadia Essoussi


Corresponding author

Correspondence to Mohamed Aymen Ben HajKacem.

Editor information

Editors and Affiliations

Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Computer Science Department, Universidad Autónoma De Madrid, Madrid, Spain
David Camacho
Wroclaw University of Technology, Wroclaw, Poland
Bogdan Trawiński


About this paper

Cite this paper

HajKacem, M.A.B., N’cir, CE.B., Essoussi, N. (2015). Parallel K-prototypes for Clustering Big Data. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_61


DOI: https://doi.org/10.1007/978-3-319-24306-1_61
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science, Computer Science (R0)
