Abstract
Clustering is an active research topic in data mining. After reviewing the existing classical clustering algorithms, this paper proposes a new probability-based clustering algorithm and gives new definitions of clustering and outliers. The algorithm determines the initial cluster centers automatically from the distribution characteristics of the sample data, and it eliminates outliers during the clustering process. Experimental results on the IRIS data set show that the algorithm clusters effectively.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yue, Z., Chuansheng, Z. (2014). A New Clustering Algorithm Based on Probability. In: Pan, JS., Snasel, V., Corchado, E., Abraham, A., Wang, SL. (eds) Intelligent Data analysis and its Applications, Volume II. Advances in Intelligent Systems and Computing, vol 298. Springer, Cham. https://doi.org/10.1007/978-3-319-07773-4_12
DOI: https://doi.org/10.1007/978-3-319-07773-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07772-7
Online ISBN: 978-3-319-07773-4
eBook Packages: Engineering, Engineering (R0)