
Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 940)

Abstract

The K-means algorithm, the most classic partition-based clustering method, has well-known disadvantages. If a data set contains outliers, they can seriously skew the cluster means. In addition, with random initialization the result is very sensitive to the input data and parameters. In this paper, we propose initialization and outlier detection based on distance and density for the K-means algorithm (KMIDDO), an improved method that optimizes the initial center points and is especially effective when outliers are present. We also extend an outlier detection method to improve the clustering result. The goal is for the center points to be as far from one another as possible and for the density around each center point to be as high as possible. For initialization, we calculate the distance and density of the points. For outlier detection, we treat the outliers as a separate class, identified on the basis of distance and density. Experiments on several synthetic and real data sets illustrate the effectiveness and accuracy of the proposed algorithms.
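The abstract does not give the formulas KMIDDO uses, so the following Python sketch only illustrates the general idea of choosing initial centers that are both dense and far apart, and of setting low-density points aside as outliers. Everything specific in it is an assumption made for illustration and not the authors' definition: density is taken as the number of neighbors within a hypothetical `radius`, the selection score is density times distance to the nearest chosen center, and `min_density` is a hypothetical outlier threshold.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(dist, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        for i, row in enumerate(dist)
    ]


def init_centers(points, k, radius):
    """Pick k initial center indices that are dense and far from one another."""
    dist = pairwise_distances(points)
    dens = local_density(dist, radius)
    # First center: the densest point (ties go to the lowest index).
    centers = [max(range(len(points)), key=lambda i: dens[i])]
    while len(centers) < k:
        candidates = [i for i in range(len(points)) if i not in centers]
        # Assumed score: density times distance to the nearest chosen center,
        # so a far but isolated point (a likely outlier) scores low.
        best = max(candidates, key=lambda i: dens[i] * min(dist[i][c] for c in centers))
        centers.append(best)
    return centers


def flag_outliers(points, radius, min_density):
    """Return indices of points whose assumed density is below `min_density`."""
    dens = local_density(pairwise_distances(points), radius)
    return [i for i, d in enumerate(dens) if d < min_density]
```

In this sketch the returned indices would seed an ordinary K-means run, and the flagged points could be kept in a class of their own so that they do not pull the cluster means. Because the score multiplies density by distance, an isolated point far from everything gets a score of zero and is not chosen as a center.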



Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61702218, the Shandong Provincial Key R&D Program under Grant No. 2016GGX101001, and the CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404.

Author information

Corresponding author

Correspondence to Zhenxiang Chen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

He, Q. et al. (2020). Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_49
