Efficient greedy feature selection for unsupervised learning

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Reducing the dimensionality of data has been a challenging task in data mining and machine learning applications, where the presence of irrelevant and redundant features degrades both the efficiency and the effectiveness of learning algorithms. Feature selection is a dimensionality reduction technique that has been used to enable a better understanding of the data and to improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels remains a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the subset of selected features. It then presents a novel algorithm that greedily minimizes this reconstruction error, using an efficient recursive formula to update the error as features are selected. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with state-of-the-art methods for unsupervised feature selection.
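To make the selection criterion concrete, below is a minimal sketch of greedy selection by reconstruction error, assuming the columns of the data matrix \(A\) are the features and writing the criterion as \(\Vert A - P^{(S)}A \Vert _F^2\), where \(P^{(S)}\) projects onto the span of the selected columns. The function name and the brute-force inner loop are illustrative only: this version recomputes a full least-squares projection for every candidate feature at every step, which is precisely the cost the paper's recursive formula is designed to avoid.

    import numpy as np

    def greedy_unsupervised_fs(A, k):
        """Select k columns of A that greedily minimize ||A - P_S A||_F^2.

        A is an (m, n) data matrix whose columns are features (an
        assumption of this sketch); returns the selected column indices.
        """
        m, n = A.shape
        selected = []
        for _ in range(k):
            best_j, best_err = None, np.inf
            for j in range(n):
                if j in selected:
                    continue
                S = A[:, selected + [j]]  # candidate feature subset
                # Least-squares reconstruction of A from the subset:
                # P_S A = S S^+ A, with S^+ the pseudo-inverse of S.
                coef, *_ = np.linalg.lstsq(S, A, rcond=None)
                err = np.linalg.norm(A - S @ coef, 'fro') ** 2
                if err < best_err:
                    best_j, best_err = j, err
            selected.append(best_j)
        return selected

    # Example: select 5 of 50 features from random data.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 50))
    print(greedy_unsupervised_fs(A, 5))

Each greedy step above solves one least-squares problem per remaining feature; the recursive formulation derived in the paper replaces that recomputation with an efficient incremental update of the reconstruction error.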

Notes

  1. \(\Vert A \Vert _{F}^{2} = \mathrm{trace}(A^TA)\); a quick numerical check of this identity appears after these notes.

  2. Data sets are available in MATLAB format at:

    http://www.zjucadcg.cn/dengcai/Data/FaceData.html.

    http://www.zjucadcg.cn/dengcai/Data/MLData.html.

    http://www.zjucadcg.cn/dengcai/Data/TextData.html.

  3. http://people.csail.mit.edu/jrennie/20Newsgroups/.

  4. The following implementations were used:

    FSFS: http://www.facweb.iitkgp.ernet.in/~pabitra/paper/fsfs.tar.gz.

    LS: http://www.zjucadcg.cn/dengcai/Data/code/LaplacianScore.m.

    SPEC: http://featureselection.asu.edu/algorithms/fs_uns_spec.zip.

    MCFS: http://www.zjucadcg.cn/dengcai/Data/code/MCFS_p.m.

  5. The CPFA method was not included in the comparison as its implementation details were not completely specified in [20].

  6. The experiments on the first four data sets were conducted on an Intel P4 3.6 GHz machine with 2 GB RAM, while those on the last two data sets were conducted on an Intel Core i5 650 3.2 GHz machine with 8 GB RAM.

  7. The implementations of the AP and SPEC algorithms do not scale to the USPS data set, and those of AP, PCA-LRG, FSFS, and SPEC do not scale to the TDT2-30 and 20NG data sets on the machines used.
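The identity in footnote 1 holds because both \(\Vert A \Vert _{F}^{2}\) and \(\mathrm{trace}(A^TA)\) sum the squares of all entries of \(A\). A quick NumPy check on an arbitrary matrix:

    import numpy as np

    # Sanity check of footnote 1: ||A||_F^2 = trace(A^T A).
    A = np.random.default_rng(1).standard_normal((4, 3))
    assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.trace(A.T @ A))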

References

  1. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2012) A review of feature selection methods on synthetic data. Knowl Inf Syst 1–37. doi:10.1007/s10115-012-0487-8

  2. Boutsidis C, Mahoney M, Drineas P (2009) Unsupervised feature selection for the \(k\)-means clustering problem. In: Proceedings of advances in neural information processing systems (NIPS), vol 22. Curran Associates, Red Hook, pp 153–161

  3. Boutsidis C, Mahoney MW, Drineas P (2008) Unsupervised feature selection for principal components analysis. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 61–69

  4. Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), ACM, New York, pp 333–342

  5. Cieri C, Graff D, Liberman M, Martey N, Strassel S (1999) The TDT-2 text and speech corpus. In: Proceedings of the DARPA broadcast news workshop, pp 57–60

  6. Cole R, Fanty M (1990) Spoken letter recognition. In: Proceedings of the third DARPA workshop on speech and natural language, pp 385–390

  7. Cui Y, Dy J (2008) Orthogonal principal feature selection. In: Sparse optimization and variable selection workshop at the international conference on machine learning (ICML)

  8. Dhillon I, Modha D (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1):143–175

  9. Dhir C, Lee J, Lee S-Y (2012) Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 30:359–375

  10. Farahat A, Ghodsi A, Kamel M (2011) An efficient greedy method for unsupervised feature selection. In: Proceedings of the 2011 IEEE 11th international conference on data mining (ICDM), pp 161–170

  11. Frey B, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972

  12. Guyon I (2006) Feature extraction: foundations and applications. Springer, Berlin

  13. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  14. He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Proceedings of advances in neural information processing systems (NIPS) 18, MIT Press, Cambridge, pp 507–514

  15. Hull J (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554

  16. Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Upper Saddle River

  17. Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, Berlin

  18. Lu Y, Cohen I, Zhou X, Tian Q (2007) Feature selection using principal feature analysis. In: Proceedings of the 15th international conference on multimedia. ACM, New York, pp 301–304

  19. Lütkepohl H (1996) Handbook of matrices. Wiley, New York

  20. Masaeli M, Yan Y, Cui Y, Fung G, Dy J (2010) Convex principal feature selection. In: Proceedings of SIAM international conference on data mining (SDM). SIAM, Philadelphia, pp 619–628

  21. Mitra P, Murthy C, Pal S (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312

  22. Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-20), technical report CUCS-005-96, Columbia University

  23. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Proceedings of advances in neural information processing systems (NIPS), vol 14, MIT Press, Cambridge, pp 849–856

  24. Samaria F, Harter A (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision, pp 138–142

  25. Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  26. Wolf L, Shashua A (2005) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. J Mach Learn Res 6:1855–1887

  27. Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Proceedings of advances in neural information processing systems (NIPS), vol 16. MIT Press, Cambridge, pp 1601–1608

  28. Zhao Z, Liu H (2007) Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning (ICML), ACM, New York, pp 1151–1157

  29. Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286

Author information

Corresponding author

Correspondence to Ahmed K. Farahat.

Additional information

A preliminary version of this paper appeared as Farahat et al. [10].

About this article

Cite this article

Farahat, A.K., Ghodsi, A. & Kamel, M.S. Efficient greedy feature selection for unsupervised learning. Knowl Inf Syst 35, 285–310 (2013). https://doi.org/10.1007/s10115-012-0538-1
