An Improved K-Means Parallel Algorithm Based on Cloud Computing

  • Conference paper
Data Science (ICPCSEE 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 901)

Abstract

Through an in-depth analysis of the problems in the K-Means algorithm, this paper proposes an improved scheme based on the Hadoop distributed platform. The experimental environment is configured using the proposed clustering analysis system, and the algorithm is optimized in three respects: parallel random sampling, parallelization of the sample distance computation, and parallelization of the data clustering process. The flow of the improved K-Means parallel algorithm is described in detail. The experimental results show that the cluster analysis system based on the Hadoop distributed cloud computing platform provides an efficient, stable, and configurable clustering analysis service, and that the improved K-Means parallel clustering algorithm can quickly handle large-scale cluster analysis computations.
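
The abstract names three parallelized stages but the preview gives no further detail, so the following is only a generic, single-process sketch of how a MapReduce-style K-Means iteration is commonly structured. It is not the authors' algorithm. All function names (`sample_partition`, `map_assign`, `reduce_update`, `kmeans`) and parameters are hypothetical, and the list of partitions stands in for data blocks that Hadoop would distribute across nodes.

```python
import random
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def sample_partition(partition, fraction, rng):
    """Sampling stage: each partition draws its own random subset."""
    return [p for p in partition if rng.random() < fraction]


def map_assign(partition, centroids):
    """Map stage: emit (index of nearest centroid, point) for every point."""
    for point in partition:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        yield nearest, point


def reduce_update(pairs, centroids):
    """Reduce stage: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)  # assumption: keep an empty cluster's centroid
    return updated


def kmeans(partitions, k, fraction=0.5, iterations=10, seed=0):
    """Run a fixed number of map/reduce rounds over all partitions."""
    rng = random.Random(seed)
    sample = [p for part in partitions
              for p in sample_partition(part, fraction, rng)]
    if len(sample) < k:  # fall back to all points if the sample is too small
        sample = [p for part in partitions for p in part]
    centroids = rng.sample(sample, k)  # initial centroids come from the sample
    for _ in range(iterations):
        pairs = [kv for part in partitions
                 for kv in map_assign(part, centroids)]
        centroids = reduce_update(pairs, centroids)
    return centroids


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
             [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
    print(sorted(kmeans(parts, k=2, fraction=1.0)))
```

In this sketch each partition is processed independently in the sampling and map stages, which is what makes those stages candidates for parallel execution. Only the reduce stage needs the grouped results from every partition.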

Author information

Corresponding author

Correspondence to Xiaofeng Li.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, X., Li, D. (2018). An Improved K-Means Parallel Algorithm Based on Cloud Computing. In: Zhou, Q., Gan, Y., Jing, W., Song, X., Wang, Y., Lu, Z. (eds) Data Science. ICPCSEE 2018. Communications in Computer and Information Science, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-13-2203-7_30

  • DOI: https://doi.org/10.1007/978-981-13-2203-7_30

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2202-0

  • Online ISBN: 978-981-13-2203-7

  • eBook Packages: Computer Science, Computer Science (R0)
