Online early terminated streaming feature selection based on Rough Set theory
Section snippets
Code metadata
Permanent link to reproducible Capsule: https://codeocean.com/capsule/8154265/tree/v1.
Related work
Feature selection aims to select a minimal subset from the original feature space and is essential for speeding up learning and improving concept quality [1]. According to how the data arrive, feature selection methods can be divided into two categories: traditional feature selection for static data and online feature selection for streaming data [5].
The proposed framework
This section first defines online streaming feature selection and early terminated online streaming feature selection. Through an in-depth analysis of the conditions for early termination, we identify two properties that the mapping function should satisfy in order to terminate the selection early while maintaining competitive performance. Then, we introduce the dependency degree in Rough Set theory and demonstrate that it satisfies these two early-termination properties. After that, we propose our early terminated online streaming feature selection algorithm, OSFS-ET.
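The dependency degree from classical Rough Set theory can be computed directly from the equivalence classes induced by a feature subset. The following Python sketch (function and variable names are ours, not the paper's) illustrates the quantity γ_B(D) = |POS_B(D)| / |U| for categorical features:

```python
from collections import defaultdict

def dependency_degree(samples, feature_idx, labels):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    samples: rows of categorical feature values
    feature_idx: indices of the candidate feature subset B
    labels: the decision attribute D, one label per row
    """
    # Group rows into the equivalence classes induced by B.
    blocks = defaultdict(list)
    for i, row in enumerate(samples):
        blocks[tuple(row[j] for j in feature_idx)].append(i)
    # POS_B(D): rows lying in classes that are pure w.r.t. the decision.
    pos = sum(len(idxs) for idxs in blocks.values()
              if len({labels[i] for i in idxs}) == 1)
    return pos / len(samples)
```

Because adding a feature can only refine the partition, the positive region never shrinks, so γ is non-decreasing in the feature subset and bounded above by 1; these two properties are what make an early-termination criterion workable.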
Datasets
In this section, we apply the proposed OSFS-ET and its competing algorithms to twelve real-world high-dimensional datasets [40], [41], as shown in Table 2.
Evaluation metrics
We use two basic classifiers, KNN (k = 9) and SVM (with the linear kernel) in Matlab R2017a, to evaluate each selected feature subset in our experiments. We perform 5-fold cross-validation on each dataset.
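The paper's experiments use Matlab; purely as an illustration, the same evaluation protocol (k-NN with k = 9 under 5-fold cross-validation) can be sketched in plain Python. All names here are our own, and a simple majority-vote k-NN stands in for the Matlab classifiers:

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=9):
    # Majority vote among the k nearest training rows (squared Euclidean).
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, k=9, folds=5, seed=0):
    # k-fold cross-validation accuracy of k-NN on the selected features.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])                 # every folds-th shuffled index
        train = [i for i in idx if i not in test]
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        correct += sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test)
    return correct / len(X)
```

In practice one would run this once per selected feature subset and compare the averaged accuracies across algorithms.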
Conclusion
In this paper, we study, for the first time, the issue of how to terminate online streaming feature selection early while maintaining satisfactory performance. We propose the assumption that online streaming feature selection can be terminated early if the expected increase of the mapping function is much lower than the time cost of processing the remaining arriving features. Based on this, we first present a formal definition of this issue and summarize two properties that the mapping function should satisfy for early termination.
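Under the stated assumption, and using the fact that the dependency degree is monotone and bounded by 1, an early-terminated selection loop might look like the sketch below. The stopping threshold gain_eps and all helper names are our illustrative choices, not the paper's exact criterion, which weighs the expected gain against the time cost of the remaining features:

```python
from collections import defaultdict

def dependency(feature_cols, labels):
    # Fraction of rows whose equivalence class (under the given
    # feature columns) is pure with respect to the label.
    blocks = defaultdict(list)
    for i in range(len(labels)):
        blocks[tuple(col[i] for _, col in feature_cols)].append(i)
    pos = sum(len(b) for b in blocks.values()
              if len({labels[i] for i in b}) == 1)
    return pos / len(labels)

def online_select(stream, labels, gain_eps=1e-3):
    """Early terminated streaming selection sketch.

    stream yields (feature_id, column) one feature at a time; the
    loop stops once the maximum remaining gain of the monotone,
    [0, 1]-bounded dependency degree falls below gain_eps.
    """
    selected, gamma = [], 0.0
    for fid, col in stream:
        g = dependency(selected + [(fid, col)], labels)
        if g > gamma:                    # the feature adds discriminative power
            selected, gamma = selected + [(fid, col)], g
        if 1.0 - gamma < gain_eps:       # no remaining gain can repay the cost
            break                        # early termination
    return [fid for fid, _ in selected], gamma
```

For example, on labels [0, 0, 1, 1] a single perfectly discriminating feature drives γ to 1, and the loop terminates without touching the rest of the stream.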
CRediT authorship contribution statement
Peng Zhou: Conceptualization, Methodology, Software, Writing – original draft, Funding acquisition. Peipei Li: Validation, Investigation, Writing – review & editing, Funding acquisition. Shu Zhao: Formal analysis, Project administration, Funding acquisition. Yanping Zhang: Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China under Grants 61906056, 61976077, and 61876001.
References (43)
- et al., A novel hybrid feature selection method based on dynamic feature importance, Appl. Soft Comput. (2020)
- et al., Joint adaptive manifold and embedding learning for unsupervised feature selection, Pattern Recognit. (2021)
- et al., Feature selection in machine learning: A new perspective, Neurocomputing (2018)
- et al., Attribute reduction methods in fuzzy rough set theory: An overview, comparative experiments, and new directions, Appl. Soft Comput. (2021)
- et al., Incremental approaches for heterogeneous feature selection in dynamic ordered data, Inform. Sci. (2020)
- et al., Online streaming feature selection using rough sets, Internat. J. Approx. Reason. (2016)
- et al., Online streaming feature selection using adapted neighborhood rough set, Inform. Sci. (2019)
- et al., OFS-Density: A novel online streaming feature selection method, Pattern Recognit. (2019)
- et al., Online feature selection for high-dimensional class-imbalanced data, Knowl.-Based Syst. (2017)
- et al., An efficient feature selection framework based on information theory for high dimensional data, Appl. Soft Comput. (2021)
- A survey on rough set theory and its applications, CAAI Trans. Intell. Technol.
- OSFSMI: Online stream feature selection method based on mutual information, Appl. Soft Comput.
- Probability granular distance-based fuzzy rough set model, Appl. Soft Comput.
- LOFS: Library of online streaming feature selection, Knowl.-Based Syst.
- Computational Methods of Feature Selection
- An introduction to variable and feature selection, J. Mach. Learn. Res.
- Feature selection: A data perspective, ACM Comput. Surv.
- Recent advances in feature selection and its applications, Knowl. Inf. Syst.
- Subkilometer crater discovery with boosting and transfer learning, ACM Trans. Intell. Syst. Technol. (TIST)
- Multimodal graph-based reranking for web image search, IEEE Trans. Image Process.
- Data mining with big data, IEEE Trans. Knowl. Data Eng.
Cited by (13)
- Hybrid interpretable model using rough set theory and association rule mining to detect interaction terms in a generalized linear model, Expert Systems with Applications (2023)
- Feature selection based on double-hierarchical and multiplication-optimal fusion measurement in fuzzy neighborhood rough sets, Information Sciences (2022)
- Feature selection using fuzzy-neighborhood relative decision entropy with class-level priority fusion, Journal of Intelligent and Fuzzy Systems (2023)
- RHDOFS: A Distributed Online Algorithm Towards Scalable Streaming Feature Selection, IEEE Transactions on Parallel and Distributed Systems (2023)
The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.