Evolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering

  • Conference paper
Trends in Applied Intelligent Systems (IEA/AIE 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6097)

Abstract

The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedure. Additionally, semi-supervision has been integrated into existing paradigms because some background information usually exists in the form of a small number of similarity/dissimilarity constraints. In this context, the current paper investigates a method that performs feature selection and clustering simultaneously. The benefits of a semi-supervised approach that makes use of limited external information are highlighted against an unsupervised approach. The method uses an ensemble of near-optimal feature subsets delivered by a multi-modal genetic algorithm to quantify the relative importance of each feature to clustering.
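The abstract does not say how the ensemble is aggregated into per-feature importance, so the following is only an illustrative sketch of one plausible reading: each near-optimal subset is assumed to be a binary mask over the features, each mask carries a positive fitness value, and the importance of a feature is taken as its fitness-weighted selection frequency across the ensemble. The function name `feature_importance`, the mask encoding, and the weighting scheme are assumptions made for illustration, not the paper's stated procedure.

```python
from typing import List, Sequence


def feature_importance(
    subsets: Sequence[Sequence[int]], fitness: Sequence[float]
) -> List[float]:
    """Fitness-weighted selection frequency of each feature.

    subsets: hypothetical ensemble of binary masks (1 = feature selected).
    fitness: one positive score per mask (higher = better subset).
    Both the encoding and the weighting are illustrative assumptions.
    """
    if not subsets or len(subsets) != len(fitness):
        raise ValueError("need one fitness value per subset")
    n_features = len(subsets[0])
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")

    scores = [0.0] * n_features
    for mask, fit in zip(subsets, fitness):
        if len(mask) != n_features:
            raise ValueError("all masks must have the same length")
        for j, bit in enumerate(mask):
            if bit:
                scores[j] += fit  # feature j earns the fitness of this subset
    # Normalise so a feature present in every subset scores 1.0.
    return [s / total for s in scores]


# Toy ensemble of three masks over four features.
ensemble = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
fitness = [2.0, 1.0, 1.0]
importance = feature_importance(ensemble, fitness)
print(importance)
```

In this toy run the first feature appears in every mask and therefore receives the maximum score, while the last feature is never selected and scores zero. A ranking of this kind could then be thresholded to obtain a final feature subset.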

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Breaban, M.E. (2010). Evolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_8

  • DOI: https://doi.org/10.1007/978-3-642-13025-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13024-3

  • Online ISBN: 978-3-642-13025-0

  • eBook Packages: Computer Science, Computer Science (R0)