Research Article
DOI: 10.1145/3457682.3457725

A Redundancy Based Unsupervised Feature Selection Method for High-Dimensional Data

Published: 21 June 2021

Abstract

Feature selection is the process of selecting key features from an initial feature set. It is commonly used as a preprocessing step to improve the efficiency and accuracy of classification models in artificial intelligence and machine learning. This paper proposes a redundancy-based unsupervised feature selection method for high-dimensional data, called RUFS. First, RUFS roughly sorts the features in descending order of their average symmetric uncertainty (SU) with the other features. Second, RUFS checks each feature in that order to decide whether it is redundant. Finally, it obtains the feature subset by repeating the second step until all features have been checked. After the key features are selected, classifiers are trained to evaluate the quality of the selected feature subset. Compared with existing methods, the proposed RUFS method improves the mean classification accuracy on 11 real datasets by at least 8.1 percent.
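
The full algorithm is not reproduced on this page, so the following is only a minimal sketch of the three-step procedure the abstract describes, assuming the standard definition of symmetric uncertainty, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), computed on discretized features. The function names and the redundancy threshold delta are hypothetical stand-ins for the paper's actual redundancy test.

import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(x):
    # Shannon entropy (in nats) of a discrete feature.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1].
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    return 2.0 * mutual_info_score(x, y) / (hx + hy)

def rufs_sketch(X, delta=0.7):
    # X: (n_samples, n_features) array of discretized features.
    # delta: hypothetical redundancy threshold, not taken from the paper.
    n = X.shape[1]
    su = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            su[i, j] = su[j, i] = symmetric_uncertainty(X[:, i], X[:, j])
    # Step 1: roughly rank features by average SU with the others, descending.
    order = np.argsort(-su.mean(axis=1))
    # Steps 2-3: check each feature in order; keep it only if it is not
    # too redundant with any feature already selected.
    selected = []
    for f in order:
        if all(su[f, s] <= delta for s in selected):
            selected.append(int(f))
    return selected

Under these assumptions, the returned indices can be passed to any off-the-shelf classifier (for example, a k-NN model trained on X[:, selected]) to evaluate the quality of the subset, which is how the abstract describes the evaluation step.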

Cited By

  • (2024) Sparsity-Preserved Pareto Optimization for Subset Selection. IEEE Transactions on Evolutionary Computation, 28(3), 825-836. DOI: 10.1109/TEVC.2023.3281456. Online publication date: June 2024.
  • (2024) Leveraging Local Density Decision Labeling and Fuzzy Dependency for Semi-supervised Feature Selection. International Journal of Fuzzy Systems, 26(8), 2805-2820. DOI: 10.1007/s40815-024-01740-0. Online publication date: 26 May 2024.

Published In

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN:9781450389310
DOI:10.1145/3457682

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Feature selection
  2. High-dimensional data
  3. Redundancy
  4. Unsupervised

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021
