Abstract
Traditional clustering algorithms often perform poorly on high-dimensional data. Soft subspace clustering, which finds clusters hidden in different subspaces, has become an effective means of dealing with such data. However, most existing soft subspace clustering algorithms contain parameters that are difficult for users to determine in real-world applications. This paper proposes a new soft subspace clustering algorithm, SC-IFWSA, which uses an improved feature weight self-adjustment mechanism (IFWSA) to adaptively update the weights of all features for each cluster according to the importance of each feature to clustering quality, and which does not require users to set any parameter values. In addition, SC-IFWSA overcomes a limitation of the traditional FWSA mechanism, which may fail to calculate feature weights in some particular cases. Experimental results on ten data sets, in comparison with related approaches, demonstrate the effectiveness and feasibility of the proposed method.
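The general idea of parameter-free, per-cluster feature weighting can be illustrated with a minimal sketch. This is not the IFWSA update itself, whose formula is not given in this abstract. The function name `cluster_feature_weights`, the scoring rule (between-cluster separation divided by the sum of within-cluster dispersion and between-cluster separation), and the uniform fallback are all assumptions chosen for illustration. The scoring rule was picked because it stays defined when a feature has zero within-cluster dispersion, which is one plausible kind of degenerate case for a ratio-based weight.

```python
from typing import List, Sequence


def cluster_feature_weights(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> List[List[float]]:
    """Return one weight vector per cluster; each vector sums to 1.

    Hypothetical illustration only, not the SC-IFWSA weight update.
    """
    n_features = len(points[0])
    uniform = [1.0 / n_features] * n_features
    # Mean of every feature over the whole data set.
    global_mean = [
        sum(p[j] for p in points) / len(points) for j in range(n_features)
    ]

    weights: List[List[float]] = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            # An empty cluster gives no evidence, so fall back to uniform weights.
            weights.append(list(uniform))
            continue

        center = [
            sum(p[j] for p in members) / len(members) for j in range(n_features)
        ]
        scores = []
        for j in range(n_features):
            # Spread of the cluster's members around its own center on feature j.
            within = sum((p[j] - center[j]) ** 2 for p in members)
            # Distance of the cluster center from the global mean on feature j.
            between = len(members) * (center[j] - global_mean[j]) ** 2
            total = within + between
            # Assumed scoring rule: defined even when within == 0.
            scores.append(between / total if total > 0 else 0.0)

        score_sum = sum(scores)
        if score_sum > 0:
            weights.append([s / score_sum for s in scores])
        else:
            # No feature separates this cluster, so weight all features equally.
            weights.append(list(uniform))
    return weights


if __name__ == "__main__":
    data = [(0.0, 5.0), (0.0, 1.0), (10.0, 4.0), (10.0, 2.0)]
    assignment = [0, 0, 1, 1]
    print(cluster_feature_weights(data, assignment, k=2))
```

In the toy data above, only the first feature separates the two clusters, so under this assumed rule it receives the full weight in both of them. No user-set parameter appears anywhere in the computation, which mirrors the parameter-free property claimed for SC-IFWSA, although the actual mechanism may differ substantially.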
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61070062, the Key Project on the Cooperation of Industry and University of Fujian Province of China under Grant No. 2010H6007, and the Key Scientific Research Project of the Higher Education Institutions of Fujian Province of China under Grant No. JK2009006.
Cite this article
Guo, G., Chen, S. & Chen, L. Soft subspace clustering with an improved feature weight self-adjustment mechanism. Int. J. Mach. Learn. & Cyber. 3, 39–49 (2012). https://doi.org/10.1007/s13042-011-0038-8
DOI: https://doi.org/10.1007/s13042-011-0038-8