ABSTRACT
Data discretization and feature selection are two important tasks that can be performed prior to the learning phase of data mining algorithms and can significantly reduce the processing effort of the learning algorithm. In this paper, we present a new data preprocessing algorithm, called Omega, that performs data discretization and feature selection simultaneously. We ran experiments to evaluate how preprocessing with Omega affects the results of the C4.5 algorithm (a well-known decision-tree-based classifier). The results indicate that Omega is well suited to both data discretization and feature selection, making it appropriate for data preprocessing.