Research Article
DOI: 10.1145/3457682.3457725

A Redundancy Based Unsupervised Feature Selection Method for High-Dimensional Data

Published: 21 June 2021

Abstract

Feature selection is the process of selecting key features from an initial feature set. It is commonly used as a preprocessing step to improve the efficiency and accuracy of classification models in artificial intelligence and machine learning. This paper proposes a redundancy-based unsupervised feature selection method for high-dimensional data, called RUFS. First, RUFS roughly sorts the features in descending order of their average symmetric uncertainty (SU) with the other features. Second, RUFS checks each feature in that order to decide whether it is redundant. Finally, it obtains the feature subset by repeating the second step until all features have been checked. After the key features are selected, classifiers are trained to evaluate the quality of the selected feature subset. Compared with existing methods, the proposed RUFS method improves the mean classification accuracy on 11 real datasets by at least 8.1 percent.
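
The full algorithm is not reproduced on this page, so the following is only a minimal sketch of the three-step procedure the abstract describes, assuming the standard definition of symmetric uncertainty, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), computed on discretized features. The function names and the redundancy threshold delta are hypothetical stand-ins for the paper's actual redundancy test.

import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(x):
    # Shannon entropy (in nats) of a discrete feature.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1].
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    return 2.0 * mutual_info_score(x, y) / (hx + hy)

def rufs_sketch(X, delta=0.7):
    # X: (n_samples, n_features) array of discretized features.
    # delta: hypothetical redundancy threshold, not taken from the paper.
    n = X.shape[1]
    su = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            su[i, j] = su[j, i] = symmetric_uncertainty(X[:, i], X[:, j])
    # Step 1: roughly rank features by average SU with the others, descending.
    order = np.argsort(-su.mean(axis=1))
    # Steps 2-3: check each feature in order; keep it only if it is not
    # too redundant with any feature already selected.
    selected = []
    for f in order:
        if all(su[f, s] <= delta for s in selected):
            selected.append(int(f))
    return selected

Under these assumptions, the returned indices can be passed to any off-the-shelf classifier (for example, a k-NN model trained on X[:, selected]) to evaluate the quality of the subset, which is how the abstract describes the evaluation step.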

Cited By

  • (2024) Sparsity-Preserved Pareto Optimization for Subset Selection. IEEE Transactions on Evolutionary Computation, 28(3), 825-836. DOI: 10.1109/TEVC.2023.3281456. Online publication date: June 2024.
  • (2024) Leveraging Local Density Decision Labeling and Fuzzy Dependency for Semi-supervised Feature Selection. International Journal of Fuzzy Systems, 26(8), 2805-2820. DOI: 10.1007/s40815-024-01740-0. Online publication date: 26 May 2024.

Published In

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN:9781450389310
DOI:10.1145/3457682

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Feature selection
  2. High-dimensional data
  3. Redundancy
  4. Unsupervised

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021
