DOI: 10.1145/3090354.3090370
research-article

Towards Clustering Validation in Big Data Context

Published: 29 March 2017

Abstract

Clustering is an essential task in many areas, including machine learning, data mining, and computer vision. Cluster validation assesses the quality of the partitions produced by clustering algorithms. Several indexes have been developed for this purpose; they are external or internal, depending on whether a ground-truth clustering is available. This paper addresses cluster validation for large datasets. In the era of big data, this task becomes harder to handle and requires parallel and distributed approaches. In this work, we focus on external validation indexes. More specifically, we propose a model for purity-based cluster validation that runs in a parallel and distributed manner using the Map-Reduce paradigm, so that it scales with increasing dataset sizes.
The experimental results show that the proposed model is valid and correctly validates clusterings of large datasets.
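
As an illustration of the idea only, and not the model proposed in the paper (which this page does not reproduce), the sketch below computes the standard purity index in a map/reduce style. Purity is the fraction of points that fall in the majority ground-truth class of their cluster: the sum over clusters of the largest class count, divided by the total number of points. The names `map_chunk`, `merge_counts`, and `purity`, and the representation of the data as chunks of `(cluster_id, class_label)` pairs, are assumptions made for this example. It runs in a single process; a real deployment would hand the same two steps to a framework such as Hadoop.

```python
from collections import Counter, defaultdict
from functools import reduce


def map_chunk(chunk):
    """Map step: count (cluster_id, class_label) pairs in one data chunk."""
    return Counter(chunk)


def merge_counts(left, right):
    """Reduce step: add two partial count tables together."""
    return left + right


def purity(chunks):
    """Purity of a clustering given as chunks of (cluster_id, class_label) pairs.

    Each chunk stands in for one input split that a mapper would process
    (an assumption of this sketch).
    """
    partial_counts = map(map_chunk, chunks)
    totals = reduce(merge_counts, partial_counts, Counter())

    total_points = 0
    majority = defaultdict(int)  # cluster_id -> size of its largest class
    for (cluster_id, _label), count in totals.items():
        total_points += count
        majority[cluster_id] = max(majority[cluster_id], count)

    if total_points == 0:
        raise ValueError("purity is undefined for an empty dataset")
    return sum(majority.values()) / total_points
```

For example, with the two chunks `[(0, "a"), (0, "a"), (1, "b")]` and `[(0, "b"), (1, "b"), (1, "a")]`, each cluster has a majority class of size 2 out of 6 points in total, so `purity` returns 4/6. Because the per-chunk counts are merged by simple addition, the reduce step is associative and could also serve as a combiner.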


Information & Contributors

Information

Published In

BDCA'17: Proceedings of the 2nd International Conference on Big Data, Cloud and Applications
March 2017
685 pages
ISBN:9781450348522
DOI:10.1145/3090354

In-Cooperation

  • Ministère de l'enseignement supérieur

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster validation index
  2. distributed computing
  3. parallel computing
  4. partitioned clustering
  5. purity index

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BDCA'17

Cited By

  • (2020) Models for Internal Clustering Validation Indexes Based on Hadoop-MapReduce. International Journal of Distributed Systems and Technologies 11:3 (42-67). DOI: 10.4018/IJDST.2020070103. Online publication date: 1-Jul-2020
  • (2018) Parallel Clustering Validation Based on MapReduce. Advances in Computing Systems and Applications (291-299). DOI: 10.1007/978-3-319-98352-3_31. Online publication date: 10-Aug-2018
  • (2018) Parallel and Distributed Map-Reduce Models for External Clustering Validation Indexes. Cloud Computing and Big Data: Technologies, Applications and Security (220-240). DOI: 10.1007/978-3-319-97719-5_15. Online publication date: 28-Jul-2018
  • (2017) External clustering validation in big data context. 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech) (1-6). DOI: 10.1109/CloudTech.2017.8284735. Online publication date: Oct-2017
