Abstract
Most existing clustering algorithms are seriously affected by noisy data and incur a high time cost. In this paper, a representative-points clustering algorithm based on density factor and relevant degree, called RPCDR, is proposed on the basis of the CURE algorithm. Definitions of the density factor and the relevant degree are presented. A primary representative point whose density factor is less than the prescribed threshold is deleted directly, and new representative points can be reselected from the non-representative points in the corresponding cluster. Moreover, the representative points of each cluster are modeled using the K-nearest neighbor method. The relevant degree is computed by comprehensively considering the correlations of objects within a cluster and between different clusters, and it is then used to judge whether two clusters need to be merged. Theoretical analysis and experimental results indicate that RPCDR has better clustering accuracy and execution efficiency.
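The pruning and reselection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's definition of the density factor, so the version below is an assumption made only for illustration: the share of the other cluster members lying within a fixed radius of a point. The function names, the `radius` parameter, and the reselection rule (densest non-representative points first) are likewise hypothetical and are not taken from RPCDR itself.

```python
import math


def density_factor(point, cluster, radius):
    """Assumed density factor: fraction of the other cluster members
    that lie within `radius` of `point` (not the paper's definition)."""
    others = [q for q in cluster if q != point]
    if not others:
        return 0.0
    near = sum(1 for q in others if math.dist(point, q) <= radius)
    return near / len(others)


def prune_and_reselect(cluster, reps, radius, threshold):
    """Delete representatives whose density factor is below `threshold`,
    then refill the set from non-representative points of the same cluster."""
    kept = [r for r in reps if density_factor(r, cluster, radius) >= threshold]
    # Hypothetical reselection rule: non-representative points that pass
    # the threshold themselves, ordered from densest to sparsest.
    candidates = sorted(
        (
            p
            for p in cluster
            if p not in reps and density_factor(p, cluster, radius) >= threshold
        ),
        key=lambda p: density_factor(p, cluster, radius),
        reverse=True,
    )
    missing = len(reps) - len(kept)
    return kept + candidates[:missing]


# Toy cluster: four tightly grouped points and one outlier.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
reps = [(0.0, 0.0), (5.0, 5.0)]  # the outlier was initially picked as a representative
new_reps = prune_and_reselect(cluster, reps, radius=0.5, threshold=0.5)
print(new_reps)
```

In this toy run the outlier representative has no neighbors within the radius, so it is dropped and replaced by a point from the dense group. The K-nearest neighbor modeling and the relevant-degree merge test are not sketched here, because the abstract does not specify them in enough detail.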
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61170190), the Natural Science Foundation of Hebei Province (Nos. F2015402114, F2015402070, F2015402119), and the Foundation of Hebei Educational Committee (No. YQ2014014). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
Cite this article
Wu, D., Ren, J. & Sheng, L. Representative points clustering algorithm based on density factor and relevant degree. Int. J. Mach. Learn. & Cyber. 8, 641–649 (2017). https://doi.org/10.1007/s13042-015-0451-5