Abstract
In the data preparation phase of data mining, supervised discretization and value grouping methods have numerous applications: interpretation, conditional density estimation, filter selection of input variables, and variable recoding for classification methods. These methods usually assume a small number of classes, typically fewer than ten, and reach their limits when the number of classes grows too large. In this paper, we extend discretization and value grouping methods by partitioning both the input and class variables. The best joint partitioning is found by maximizing a Bayesian model selection criterion. We show how to exploit this preprocessing method as a preparation for the naive Bayes classifier. Extensive experiments demonstrate the benefits of the approach in the case of hundreds of classes.
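To make the preprocessing pipeline concrete, here is a minimal sketch of discretization as a preparation step for naive Bayes. It is not the paper's Bayesian-optimal joint partitioning: plain equal-frequency binning stands in for the model-selection criterion, and all function names (`equal_freq_bins`, `discretize`, `naive_bayes_train`, `predict`) are hypothetical helpers written for this illustration.

```python
import numpy as np

def equal_freq_bins(x, n_bins=3):
    """Equal-frequency bin edges for a 1-D continuous feature
    (a simple stand-in for a supervised discretization criterion)."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(x, qs)

def discretize(x, edges):
    """Map each continuous value to its bin index."""
    return np.digitize(x, edges)

def naive_bayes_train(X_binned, y, n_bins, alpha=1.0):
    """Laplace-smoothed class priors and per-feature P(bin | class)."""
    classes = sorted(set(y))
    priors = {c: (np.sum(y == c) + alpha) / (len(y) + alpha * len(classes))
              for c in classes}
    cond = {}
    for c in classes:
        Xc = X_binned[y == c]
        cond[c] = [(np.bincount(Xc[:, j], minlength=n_bins) + alpha)
                   / (len(Xc) + alpha * n_bins)
                   for j in range(X_binned.shape[1])]
    return classes, priors, cond

def predict(x_binned, classes, priors, cond):
    """Pick the class maximizing the log-posterior over binned inputs."""
    scores = {c: np.log(priors[c])
                 + sum(np.log(cond[c][j][b]) for j, b in enumerate(x_binned))
              for c in classes}
    return max(scores, key=scores.get)
```

Once the inputs are binned, the classifier only ever manipulates discrete counts, which is why the quality of the partitioning (here crude, in the paper Bayes-optimal and applied jointly to the class variable) directly drives classification accuracy.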
© 2012 Springer Berlin Heidelberg
Cite this chapter
Boullé, M. (2012). Simultaneous Partitioning of Input and Class Variables for Supervised Classification Problems with Many Classes. In: Guillet, F., Ritschard, G., Zighed, D. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25838-1_6
Print ISBN: 978-3-642-25837-4
Online ISBN: 978-3-642-25838-1