Abstract
In the data preparation phase of data mining, supervised discretization and value grouping methods have numerous applications: interpretation, conditional density estimation, filter selection of input variables, and variable recoding for classification methods. These methods usually assume a small number of classes, typically fewer than ten, and reach their limits when the number of classes grows too large. In this paper, we extend discretization and value grouping methods by partitioning both the input and class variables. The best joint partitioning is found by maximizing a Bayesian model selection criterion. We show how to exploit this preprocessing method as a preparation for the naive Bayes classifier. Extensive experiments demonstrate the benefits of the approach in the case of hundreds of classes.
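To make the preprocessing pipeline concrete, here is a minimal sketch of discretization as a preparation step for naive Bayes. It is not the paper's Bayesian-optimal joint partitioning: plain equal-frequency binning stands in for the model-selection criterion, and all function names (`equal_freq_bins`, `discretize`, `naive_bayes_train`, `predict`) are hypothetical helpers written for this illustration.

```python
import numpy as np

def equal_freq_bins(x, n_bins=3):
    """Equal-frequency bin edges for a 1-D continuous feature
    (a simple stand-in for a supervised discretization criterion)."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(x, qs)

def discretize(x, edges):
    """Map each continuous value to its bin index."""
    return np.digitize(x, edges)

def naive_bayes_train(X_binned, y, n_bins, alpha=1.0):
    """Laplace-smoothed class priors and per-feature P(bin | class)."""
    classes = sorted(set(y))
    priors = {c: (np.sum(y == c) + alpha) / (len(y) + alpha * len(classes))
              for c in classes}
    cond = {}
    for c in classes:
        Xc = X_binned[y == c]
        cond[c] = [(np.bincount(Xc[:, j], minlength=n_bins) + alpha)
                   / (len(Xc) + alpha * n_bins)
                   for j in range(X_binned.shape[1])]
    return classes, priors, cond

def predict(x_binned, classes, priors, cond):
    """Pick the class maximizing the log-posterior over binned inputs."""
    scores = {c: np.log(priors[c])
                 + sum(np.log(cond[c][j][b]) for j, b in enumerate(x_binned))
              for c in classes}
    return max(scores, key=scores.get)
```

Once the inputs are binned, the classifier only ever manipulates discrete counts, which is why the quality of the partitioning (here crude, in the paper Bayes-optimal and applied jointly to the class variable) directly drives classification accuracy.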
© 2012 Springer Berlin Heidelberg
Cite this chapter
Boullé, M. (2012). Simultaneous Partitioning of Input and Class Variables for Supervised Classification Problems with Many Classes. In: Guillet, F., Ritschard, G., Zighed, D. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25838-1_6
Print ISBN: 978-3-642-25837-4
Online ISBN: 978-3-642-25838-1