Abstract
The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background information usually exist in form of a reduced number of similarity/dissimilarity constraints. In this context, the current paper investigates a method to perform simultaneously feature selection and clustering. The benefits of a semi-supervised approach making use of reduced external information are highlighted against an unsupervised approach. The method makes use of an ensemble of near-optimal feature subsets delivered by a multi-modal genetic algorithm in order to quantify the relative importance of each feature to clustering.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Breaban, M., Luchian, H.: Unsupervised feature weighting with multi-niche genetic algorithms. In: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, July 2009, pp. 1163–1170. ACM, New York (2009)
Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1(2), 224–227 (1979)
Domeniconi, C., Al-Razgan, M.: Weighted cluster ensembles: Methods and analysis. ACM Transactions on Knowledge Discovery from Data 2(4), 1–40 (2009)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley & Sons, Chichester (2001)
Dy, J., Brodley, C.: Feature selection for unsupervised learning. Journal of Machine Learning Research 5, 845–889 (2004)
Guerif, S.: Unsupervised variable selection: when random rankings sound as irrelevancy. Journal of Machine Learning Research 4, 163–177 (2008)
Handl, J., Knowles, J.: Feature subset selection in unsupervised learning via multiobjective optimization. International Journal of Computational Intelligence Research 2(3), 217–238 (2006)
Handl, J., Knowles, J.: Semi-supervised feature selection via multiobjective optimization. In: Proceedings of the International Joint Conference on Neural Networks, pp. 3319–3326 (2006)
Hong, Y., Kwong, S., Chang, Y., Ren, Q.: Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters 29, 595–602 (2008)
Hubert, A.: Comparing partitions. Journal of Classification 2, 193–198 (1985)
Talavera, L.: Feature selection as a preprocessing step for hierarchical clustering. In: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 389–398. Morgan Kaufmann, San Francisco (1990)
Ren, J., Qiu, Z., Fan, W., Cheng, H., Yu, P.S.: Forward semi-supervised feature selection. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 970–976. Springer, Heidelberg (2008)
Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20(1), 53–65 (1987)
Sndberg-madsen, N., Thomsen, C., Pena, J.M.: Unsupervised feature subset selection. In: Proceedings of the Workshop on Probabilistic Graphical Models for Classification (within ECML 2003) (2003)
Varshavsky, R., Gottlieb, A., Linial, M., Horn, D.: Novel unsupervised feature filtering of biological data. Bioinformatics 22(14), 507–513 (2006)
Vemuri, V., Cedeo, W.: Multi-niche crowding for multimodal search. In: Practical Handbook of Genetic Algorithms: New Frontiers, 2nd edn., Lance Chambers (1995)
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 577–584. Springer, Heidelberg (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Breaban, M.E. (2010). Evolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering. In: GarcÃa-Pedrajas, N., Herrera, F., Fyfe, C., BenÃtez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science(), vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-13025-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13024-3
Online ISBN: 978-3-642-13025-0
eBook Packages: Computer ScienceComputer Science (R0)