Abstract
Feature discretization (FD) is a necessary pre-processing step for many machine learning tasks. Its use often yields compact and robust data representations, leading to more accurate classifiers and lower training times. In this paper, we propose an incremental supervised FD technique based on recursive bit allocation. The proposed algorithm starts with a pool of bits and, while bits remain in the pool, allocates the next bit to the most promising feature, i.e., the one which, after discretization, has the highest mutual information with the class label. Since one or more features may receive no bits at all, this FD procedure has a built-in feature selection effect. The experimental evaluation on public-domain benchmark datasets shows that the proposed method achieves results similar to or better than those of other state-of-the-art supervised FD techniques, in terms of both classification accuracy and number of discretization intervals.
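The greedy allocation loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple uniform quantizer as a stand-in for the paper's per-bit discretizer, uses an empirical plug-in estimate of mutual information, and assumes each feature takes at least two distinct values. All function names (`discretize`, `mutual_information`, `allocate_bits`) are illustrative.

```python
import numpy as np

def discretize(x, n_bits):
    # Uniform quantization of feature x into 2**n_bits intervals.
    # (A stand-in quantizer; the paper's discretizer may differ.)
    n_levels = 2 ** n_bits
    edges = np.linspace(x.min(), x.max(), n_levels + 1)
    # Assumes x is not constant, so the bin edges are strictly increasing.
    return np.clip(np.digitize(x, edges[1:-1]), 0, n_levels - 1)

def mutual_information(q, y):
    # Empirical (plug-in) mutual information I(q; y) in bits,
    # for integer-coded q and integer class labels y.
    joint = np.zeros((q.max() + 1, y.max() + 1))
    for qi, yi in zip(q, y):
        joint[qi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of q
    py = joint.sum(axis=0, keepdims=True)   # marginal of y
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

def allocate_bits(X, y, bit_pool):
    # Incremental bit allocation: each bit from the pool goes to the
    # feature whose (re-)discretized version has the largest mutual
    # information with the class label y.
    n_features = X.shape[1]
    bits = np.zeros(n_features, dtype=int)
    for _ in range(bit_pool):
        gains = [mutual_information(discretize(X[:, j], bits[j] + 1), y)
                 for j in range(n_features)]
        bits[int(np.argmax(gains))] += 1
    # Features left with 0 bits are effectively discarded, which is the
    # built-in feature selection effect mentioned in the abstract.
    return bits
```

On a toy two-feature problem where only the first feature separates the classes, all bits in the pool end up assigned to that feature, leaving the uninformative feature with zero bits.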
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferreira, A., Figueiredo, M. (2013). An Incremental Bit Allocation Strategy for Supervised Feature Discretization. In: Sanches, J.M., Micó, L., Cardoso, J.S. (eds) Pattern Recognition and Image Analysis. IbPRIA 2013. Lecture Notes in Computer Science, vol 7887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38628-2_62
Print ISBN: 978-3-642-38627-5
Online ISBN: 978-3-642-38628-2