Abstract
In cluster analysis, most existing algorithms were developed for small data sets and cannot effectively process the large data sets encountered in data mining. Moreover, most clustering algorithms treat every sample as contributing equally to the classification, whereas different samples should contribute differently to the clustering result. To address both issues, a novel typical-sample-weighted clustering algorithm is proposed for large data sets. Using atom clustering, the new algorithm first extracts typical samples to reduce the amount of data. The extracted samples are then weighted by their typicality and clustered with the classical fuzzy c-means (FCM) algorithm. Finally, the Mahalanobis distance is used to assign each original sample to one of the resulting clusters. Because FCM runs only on the reduced, weighted sample set, the new algorithm can improve on the speed and robustness of the traditional FCM algorithm. Experimental results on various test data sets illustrate the effectiveness of the proposed clustering algorithm.
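The three stages described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the atom clustering procedure, so a greedy leader-style pass stands in for it, the typicality of a sample is assumed to be the number of original points its atom absorbs, and the Mahalanobis step is assumed to use a weighted fuzzy covariance per cluster. The function names (`extract_typical_samples`, `weighted_fcm`, `mahalanobis_assign`) and the parameters `radius`, `m`, and `reg` are hypothetical.

```python
import numpy as np

def extract_typical_samples(X, radius):
    """Stand-in for atom clustering (assumption): greedy leader pass.
    Returns the typical samples and a weight (atom size) for each."""
    centers, counts = [], []
    for x in X:
        if centers:
            d = np.linalg.norm(np.asarray(centers) - x, axis=1)
            j = d.argmin()
            if d[j] <= radius:      # absorbed by the nearest existing atom
                counts[j] += 1
                continue
        centers.append(x.copy())    # otherwise x starts a new atom
        counts.append(1)
    return np.asarray(centers), np.asarray(counts, dtype=float)

def weighted_fcm(X, w, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means in which sample k contributes with weight w[k]."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        G = w[:, None] * U ** m                  # weighted memberships (n, c)
        V = (G.T @ X) / G.sum(axis=0)[:, None]   # weighted cluster centers
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def mahalanobis_assign(X_all, X_typ, w, V, U, m=2.0, reg=1e-6):
    """Assign every original sample to the cluster with the smallest
    squared Mahalanobis distance (assumed weighted fuzzy covariance)."""
    c, d = V.shape
    G = w[:, None] * U ** m
    dist = np.empty((X_all.shape[0], c))
    for i in range(c):
        diff = X_typ - V[i]
        cov = (G[:, i, None] * diff).T @ diff / G[:, i].sum()
        cov += reg * np.eye(d)                   # keep the matrix invertible
        delta = X_all - V[i]
        dist[:, i] = np.einsum("nd,de,ne->n", delta, np.linalg.inv(cov), delta)
    return dist.argmin(axis=1)
```

A typical call sequence would be `T, w = extract_typical_samples(X, radius)`, then `V, U = weighted_fcm(T, w, c)`, then `labels = mahalanobis_assign(X, T, w, V, U)`. Only the first and last steps touch the full data set, and each does so in a single pass, which is where the speed advantage over running FCM on all samples would come from.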
This work was supported by the National Natural Science Foundation of China (No. 60202004), the Key Project of the Chinese Ministry of Education (No. 104173), and the Program for New Century Excellent Talents in University of China.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, J., Gao, X., Jiao, L. (2005). A Novel Typical-Sample-Weighted Clustering Algorithm for Large Data Sets. In: Hao, Y., et al. Computational Intelligence and Security. CIS 2005. Lecture Notes in Computer Science, vol 3801. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596448_103
DOI: https://doi.org/10.1007/11596448_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30818-8
Online ISBN: 978-3-540-31599-5
eBook Packages: Computer Science, Computer Science (R0)