Abstract
The k-medoid algorithm has recently become an active and widely discussed topic. It is more robust than k-means and less sensitive to outliers, but it has drawbacks of its own: the number of medoids must be given in advance, which is hard to determine, and the initial k cluster centers are chosen at random.
This article focuses on a modified k-medoid++ algorithm, which is proposed to increase the processing speed and efficiency of the k-medoid algorithm.
Modifying the algorithm is not the only way to increase processing speed, however; selecting an appropriate framework on which to run the algorithm efficiently has advantages of its own.
Apache Hadoop and Apache Spark provide effective open-source solutions for big data, yet many researchers misinterpret the performance and efficiency of these frameworks.
In this paper, the performance of the two frameworks is compared by implementing the simple k-medoid algorithm, and the more appropriate tool is then selected for the modified k-medoid++ algorithm. While implementing the k-medoid algorithm, it was also observed that selecting the initial medoids at random gave random results.
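The abstract does not spell out how k-medoid++ chooses its starting medoids, so the following is only a minimal sketch of the initialization issue it describes, not the authors' method. It assumes a seeding rule borrowed from k-means++ (each new medoid is drawn with probability proportional to its squared distance from the nearest medoid already chosen) and a simple alternating refinement step. The function names `random_init`, `weighted_init`, and `k_medoids` are hypothetical, and the code runs on a single machine rather than on Hadoop or Spark.

```python
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def random_init(points, k, rng):
    """Baseline: pick k distinct points uniformly at random."""
    return rng.sample(points, k)


def weighted_init(points, k, rng):
    """Assumed '++'-style seeding (borrowed from k-means++, not from the paper):
    draw each new medoid with probability proportional to its squared
    distance from the nearest medoid chosen so far."""
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        weights = [min(dist(p, m) for m in medoids) ** 2 for p in points]
        medoids.append(rng.choices(points, weights=weights, k=1)[0])
    return medoids


def k_medoids(points, medoids, max_iter=50):
    """Alternating refinement: assign every point to its nearest medoid,
    then move each medoid to the cluster member with the lowest total
    distance to the rest of its cluster. Stops when nothing changes."""
    medoids = list(medoids)
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        new_medoids = []
        for i, members in enumerate(clusters):
            if members:
                best = min(members, key=lambda c: sum(dist(c, q) for q in members))
                new_medoids.append(best)
            else:
                new_medoids.append(medoids[i])  # keep the medoid of an empty cluster
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    # Toy data: three small groups plus one outlier (illustrative only).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
            (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
            (60.0, 60.0)]
    for seed in range(5):
        rng = random.Random(seed)
        cost_random = total_cost(data, k_medoids(data, random_init(data, 3, rng)))
        cost_weighted = total_cost(data, k_medoids(data, weighted_init(data, 3, rng)))
        print(seed, round(cost_random, 2), round(cost_weighted, 2))
```

Running the script prints the final clustering cost for several seeds under both initializations, which makes any run-to-run variation from random seeding visible. It says nothing about processing speed on Hadoop or Spark, which is the comparison the paper itself reports.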
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jaiswal, A., Dwivedi, V.K., Yadav, O.P. (2021). Analyzing and Enhancing Processing Speed of K-Medoid Algorithm Using Efficient Large Scale Processing Frameworks. In: Abraham, A., Shandilya, S., Garcia-Hernandez, L., Varela, M. (eds) Hybrid Intelligent Systems. HIS 2019. Advances in Intelligent Systems and Computing, vol 1179. Springer, Cham. https://doi.org/10.1007/978-3-030-49336-3_14
DOI: https://doi.org/10.1007/978-3-030-49336-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49335-6
Online ISBN: 978-3-030-49336-3
eBook Packages: Intelligent Technologies and Robotics (R0)