Analyzing and Enhancing Processing Speed of K-Medoid Algorithm Using Efficient Large Scale Processing Frameworks

  • Conference paper
Hybrid Intelligent Systems (HIS 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1179)

Abstract

The k-medoid algorithm has recently become an active and widely discussed research topic. It is more robust than k-means and less sensitive to outliers, because each cluster center (medoid) must be an actual data point rather than a mean that extreme values can pull away from the data. It nevertheless has drawbacks: the number of medoids k must be specified in advance, which is hard to determine, and the initial k cluster centers are chosen at random.
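The robustness claim can be illustrated with a one-dimensional toy cluster (the values are chosen here for illustration and are not taken from the paper). The mean is dragged toward a single outlier, while the medoid, the data point with the smallest total distance to the others, stays inside the bulk of the data.

```python
from typing import Sequence


def mean(points: Sequence[float]) -> float:
    """Arithmetic mean: the k-means notion of a cluster center."""
    return sum(points) / len(points)


def medoid(points: Sequence[float]) -> float:
    """Medoid: the data point with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))


# Hypothetical toy cluster: four close values plus one outlier.
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mean(cluster))    # 22.0  (pulled toward the outlier)
print(medoid(cluster))  # 3.0   (stays with the bulk of the data)
```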

This article focuses on k-medoid++, a new modified algorithm proposed to increase the processing speed and efficiency of the k-medoid algorithm.
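The abstract does not spell out how k-medoid++ seeds its medoids. Assuming it borrows the D² weighting of k-means++ (each new seed is sampled with probability proportional to its squared distance from the nearest seed already chosen), the seeding step could be sketched as below; the function name `plus_plus_seeds` and the sample points are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def plus_plus_seeds(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick up to k initial medoids from `points` using k-means++-style D^2 sampling.

    Assumption: this mirrors k-means++ seeding; the paper's exact rule is not given.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # First medoid: a uniformly random data point.
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Squared distance from each point to its nearest already-chosen seed.
        d2 = [min(math.dist(p, s) for s in seeds) ** 2 for p in points]
        if sum(d2) == 0:
            # Every point coincides with a seed, so no new distinct medoid exists.
            break
        # Far-away points are proportionally more likely to become the next medoid;
        # already-chosen seeds have weight 0 and cannot be picked again.
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.5)]
print(plus_plus_seeds(points, k=3, rng=random.Random(42)))
```

Unlike purely random selection, this makes widely separated initial medoids more likely, although the outcome still depends on the random generator's seed.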

However, modifying the algorithm is not the only way to increase processing speed: selecting an appropriate framework on which to run it efficiently matters as well.

Apache Hadoop and Apache Spark provide effective open-source solutions for big data processing. Many researchers, however, misinterpret how the two frameworks compare in performance and efficiency.
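To see why the choice of framework matters for an iterative algorithm, one iteration of k-medoids can be phrased as a map phase (assign each point to its nearest medoid) and a reduce phase (pick the new medoid of each group). The pure-Python sketch below only simulates that structure on a single machine; the function names are hypothetical and no Hadoop or Spark API is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def map_phase(points: Sequence[float], medoids: Sequence[float]) -> Iterable[Tuple[int, float]]:
    """Emit (index of nearest medoid, point) for every point."""
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: abs(p - medoids[i]))
        yield nearest, p


def shuffle(pairs: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values: List[float]) -> Tuple[int, float]:
    """New medoid of one group: the member with the smallest total distance."""
    return key, min(values, key=lambda m: sum(abs(m - q) for q in values))


def one_iteration(points: Sequence[float], medoids: Sequence[float]) -> List[float]:
    new = list(medoids)  # a medoid whose group is empty keeps its old value
    for key, values in shuffle(map_phase(points, medoids)).items():
        k, m = reduce_phase(key, values)
        new[k] = m
    return new


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
print(one_iteration(data, [0, 1, 2]))  # [0, 1, 12]
```

Because k-medoids repeats this step until the medoids stop changing, Hadoop MapReduce typically runs each iteration as a separate job that reads its input from and writes its output to HDFS, whereas Spark can keep the input RDD cached in memory (`rdd.cache()`) across iterations.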

In this paper, the performance of the two frameworks is compared by implementing the simple k-medoid algorithm, and the better-suited tool is then selected for the modified k-medoid++ algorithm. While implementing the k-medoid algorithm, it was also observed that selecting the initial medoids randomly gave inconsistent results from run to run.
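A minimal sketch of the effect, assuming a simple alternating (Voronoi-iteration) k-medoids on a hypothetical nine-point 1-D dataset with three obvious clusters: starting from one medoid per cluster converges to a total cost of 6, whereas starting with all three medoids in the first cluster gets stuck in a local optimum with cost 31. A random initialization can land in either situation, which is why results vary between runs. The function `k_medoids` is illustrative, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def k_medoids(points: Sequence[float], init: Sequence[float],
              max_iter: int = 100) -> Tuple[List[float], float]:
    """Alternating k-medoids; returns (final medoids, total distance to nearest medoid)."""
    medoids = list(init)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters: List[List[float]] = [[] for _ in medoids]
        for p in points:
            j = min(range(len(medoids)), key=lambda i: abs(p - medoids[i]))
            clusters[j].append(p)
        # Update step: each cluster's new medoid is the member that minimizes
        # the sum of distances to the rest of the cluster.
        new = [min(c, key=lambda m: sum(abs(m - q) for q in c)) if c else medoids[i]
               for i, c in enumerate(clusters)]
        if new == medoids:  # converged: medoids no longer change
            break
        medoids = new
    cost = sum(min(abs(p - m) for m in medoids) for p in points)
    return medoids, cost


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
print(k_medoids(data, [1, 11, 21]))  # ([1, 11, 21], 6)   one medoid per cluster
print(k_medoids(data, [0, 1, 2]))    # ([0, 1, 12], 31)   stuck in a local optimum
```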

Author information

Corresponding author: Vijay Kumar Dwivedi.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jaiswal, A., Dwivedi, V.K., Yadav, O.P. (2021). Analyzing and Enhancing Processing Speed of K-Medoid Algorithm Using Efficient Large Scale Processing Frameworks. In: Abraham, A., Shandilya, S., Garcia-Hernandez, L., Varela, M. (eds) Hybrid Intelligent Systems. HIS 2019. Advances in Intelligent Systems and Computing, vol 1179. Springer, Cham. https://doi.org/10.1007/978-3-030-49336-3_14
