Abstract
Feature selection is a crucial step when knowledge discovery is applied to large databases, because it reduces dimensionality and therefore the complexity of the problem. Its main objective is to eliminate attributes so that the problem becomes computationally tractable without degrading solution quality. Several feature selection methods have been proposed, some of them tested on small academic datasets. In this paper we evaluate different feature selection-ranking methods on a large real-world database from a Mexican electric energy client-invoice system. Most research on feature selection methods evaluates only accuracy and processing time; here we also report on cost-sensitive classification and the amount of discovered knowledge. Additionally, we examine the boundary that separates relevant from irrelevant features. Finally, based on the experiments performed, we propose a promising feature selection heuristic that takes cost-sensitive classification into account.
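To make the two evaluation ideas in the abstract concrete, the following is a minimal sketch of a feature ranking step and a cost-sensitive score. It is not the method evaluated in the paper: information gain is used here only as one common ranking criterion, and the feature names, class labels, toy rows, and cost values are all invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def total_cost(actual, predicted, cost):
    """Sum misclassification costs; pairs missing from `cost` count as 0."""
    return sum(cost.get((a, p), 0) for a, p in zip(actual, predicted))

# Hypothetical toy data: two discrete features and a binary class.
names = ["meter_type", "region"]
rows = [("A", "north"), ("A", "south"), ("B", "north"), ("B", "south")]
labels = ["neg", "neg", "pos", "pos"]

ranking = rank_features(rows, labels, names)

# Hypothetical cost matrix: missing a "pos" case costs more than a false alarm.
cost = {("pos", "neg"): 10, ("neg", "pos"): 1}
cost_value = total_cost(labels, ["neg", "neg", "neg", "pos"], cost)
```

In this toy case `meter_type` separates the classes perfectly and `region` carries no information, so the ranking places `meter_type` first. Where to cut such a ranking, that is, which features count as relevant, is the boundary question the abstract raises; scoring candidate subsets by total cost rather than by accuracy alone is one way to bring cost sensitivity into that choice.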
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mejía-Lavalle, M. (2008). Applying Cost Sensitive Feature Selection in an Electric Database. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_71
DOI: https://doi.org/10.1007/978-3-540-68123-6_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science, Computer Science (R0)