Models for Internal Clustering Validation Indexes Based on Hadoop-MapReduce

Soumeya Zerabi, Souham Meshoul, Samia Chikhi Boucherkha
Copyright: © 2020 | Volume: 11 | Issue: 3 | Pages: 26
ISSN: 1947-3532 | EISSN: 1947-3540 | EISBN13: 9781799807094 | DOI: 10.4018/IJDST.2020070103
Cite Article

MLA

Zerabi, Soumeya, et al. "Models for Internal Clustering Validation Indexes Based on Hadoop-MapReduce." IJDST vol.11, no.3 2020: pp.42-67. http://doi.org/10.4018/IJDST.2020070103

APA

Zerabi, S., Meshoul, S., & Boucherkha, S. C. (2020). Models for Internal Clustering Validation Indexes Based on Hadoop-MapReduce. International Journal of Distributed Systems and Technologies (IJDST), 11(3), 42-67. http://doi.org/10.4018/IJDST.2020070103

Chicago

Zerabi, Soumeya, Souham Meshoul, and Samia Chikhi Boucherkha. "Models for Internal Clustering Validation Indexes Based on Hadoop-MapReduce," International Journal of Distributed Systems and Technologies (IJDST) 11, no.3: 42-67. http://doi.org/10.4018/IJDST.2020070103

Abstract

Cluster validation aims both to evaluate the results of clustering algorithms and to predict the number of clusters. It is usually achieved using several indexes. Traditional internal clustering validation indexes (CVIs) are mainly based on computing pairwise distances, which results in quadratic complexity for the related algorithms. Existing CVIs therefore cannot handle large data sets properly and need to be revisited to take account of ever-increasing data set volumes. The design of parallel and distributed solutions to implement these indexes is thus required. To address this issue, the authors propose two parallel and distributed models for internal CVIs, namely for the Silhouette and Dunn indexes, using the MapReduce framework under Hadoop. The proposed models, termed MR_Silhouette and MR_Dunn, have been tested on both tasks: evaluating clustering results and identifying the optimal number of clusters. The experimental results are very promising and show that the proposed parallel and distributed models achieve the expected tasks successfully.
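
To make the quadratic cost mentioned in the abstract concrete, the following is a minimal single-machine sketch of the two indexes in their common textbook form. It is not the authors' MR_Silhouette or MR_Dunn models, which the abstract does not describe in detail. The function names, the Euclidean metric, and the particular Dunn variant (smallest between-cluster point distance divided by largest within-cluster point distance) are assumptions made for illustration.

```python
import math
from itertools import combinations


def silhouette(points, labels):
    """Mean silhouette width, textbook definition (assumed Euclidean metric)."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != c
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def dunn(points, labels):
    """Dunn index, assumed variant: min between-cluster / max within-cluster distance."""
    min_between = math.inf
    max_diameter = 0.0
    # Every unordered pair is visited once: n * (n - 1) / 2 distance evaluations.
    for (p, cp), (q, cq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if cp == cq:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    if max_diameter == 0.0:
        raise ValueError("all clusters have zero diameter")
    return min_between / max_diameter


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    lab = [0, 0, 1, 1]
    print(silhouette(pts, lab), dunn(pts, lab))
```

Both functions touch every pair of points, so the number of distance computations grows with the square of the data set size. This is the bottleneck that motivates a parallel and distributed formulation.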
