Abstract
This paper addresses the problem of selecting a subset of the most relevant features from a dataset through a weighted learning paradigm. We propose two automated feature selection algorithms for unlabeled data. In contrast to supervised learning, automated feature selection and feature weighting in unsupervised learning is challenging because label information is either unavailable or not used to guide the selection. The algorithms combine two steps: they introduce unsupervised local feature weights, which identify the relevant features of the data, and they suppress the irrelevant features through unsupervised selection. Both algorithms provide topographic clustering in which each cluster is associated with a prototype and a weight vector reflecting the relevance of each feature. The proposed methods require only simple computations and are based on the self-organizing map (SOM) model. Empirical results on both synthetic data and real datasets from the UCI repository are given and discussed.
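To make the idea of cluster-dependent weighting concrete, the following is a minimal sketch and not the algorithms proposed in the chapter, whose update rules are not given in this abstract. It assumes a one-dimensional batch SOM in which every map unit holds a prototype and a local weight vector; the weights are set by a simple inverse-dispersion heuristic after a warm-up phase, and features are kept when their weight exceeds the uniform level 1/d. The function names train_weighted_som and select_features, all parameter values, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def train_weighted_som(X, n_units=4, n_epochs=40, warmup=20,
                       sigma_start=1.5, sigma_end=0.2, eps=1e-6, seed=0):
    """Batch SOM on a 1-D map with one local feature-weight vector per unit.

    Assumption: weights follow an inverse-dispersion heuristic, which stands
    in for the (unspecified) weight update of the chapter.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    grid = np.arange(n_units, dtype=float)            # unit positions on the map
    protos = X[rng.choice(n, n_units, replace=False)].copy()
    weights = np.full((n_units, d), 1.0 / d)          # start with uniform weights

    for epoch in range(n_epochs):
        # Geometric annealing of the neighbourhood radius.
        frac = epoch / max(n_epochs - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac

        # Best matching unit under each unit's own weighted squared distance.
        sq = (X[:, None, :] - protos[None, :, :]) ** 2     # shape (n, units, d)
        bmu = np.argmin((sq * weights[None, :, :]).sum(axis=2), axis=1)

        # Batch prototype update with a Gaussian neighbourhood on the map.
        h = np.exp(-(grid[bmu][:, None] - grid[None, :]) ** 2 / (2 * sigma ** 2))
        protos = (h.T @ X) / h.sum(axis=0)[:, None]

        # After the warm-up, a feature that is tightly grouped around a unit's
        # prototype receives a larger weight for that unit.
        if epoch >= warmup:
            for j in range(n_units):
                members = X[bmu == j]
                if len(members) < 2:
                    continue                               # keep previous weights
                disp = ((members - protos[j]) ** 2).mean(axis=0)
                inv = 1.0 / (disp + eps)
                weights[j] = inv / inv.sum()
    return protos, weights

def select_features(weights):
    """Hypothetical selection rule: keep features weighted above 1/d."""
    return weights > 1.0 / weights.shape[1]

# Toy data: features 0-1 separate two groups, features 2-3 are uniform noise.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
groups = np.repeat([0, 1], 100)
relevant = centers[groups] + 0.3 * rng.standard_normal((200, 2))
noise = rng.uniform(0.0, 5.0, size=(200, 2))
X = np.hstack([relevant, noise])

protos, weights = train_weighted_som(X)
mask = select_features(weights)
print(np.round(weights, 3))
print(mask)
```

Each row of weights belongs to one map unit, so two units may rank the same feature differently; this is what makes the selection local rather than global. The fixed 1/d threshold is only a placeholder for whatever selection criterion is actually applied.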
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Grozavu, N., Bennani, Y., Lebbah, M. (2010). Cluster-Dependent Feature Selection through a Weighted Learning Paradigm. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_8
DOI: https://doi.org/10.1007/978-3-642-00580-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00579-4
Online ISBN: 978-3-642-00580-0
eBook Packages: Engineering, Engineering (R0)