Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data

  • Conference paper
Multiple Classifier Systems (MCS 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5519)

Abstract

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem and provides significantly improved accuracy over other popular ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing a random ordinality on the categorical attribute values. A decision tree learnt in this new continuous space can use binary splits, thus avoiding the data fragmentation problem. A majority-vote ensemble is then constructed from several trees, each learnt from a different continuous space. An empirical evaluation on 13 datasets shows this simple method to significantly outperform standard techniques such as Boosting and Random Forests. A theoretical study using an information-gain framework is carried out to explain the performance of Random Ordinality (RO). The study shows that ROE is quite robust to the data fragmentation problem, and that RO trees are significantly smaller than trees generated using multi-way splits.
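The core construction is simple enough to sketch in a few lines. The following Python fragment illustrates one reading of the method described in the abstract: each tree draws its own random ordering of every categorical attribute's values, the data are mapped to the resulting integer ranks, a standard binary-split decision tree is trained on that ordinal space, and predictions are combined by majority vote. The class and function names (ROEnsemble, random_ordinality), the handling of unseen values, and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def random_ordinality(X_cat, rng):
    # Impose a random ordering on each categorical column and replace
    # every value with its integer rank: one random projection of the
    # categorical data into an ordinal (continuous-valued) space.
    orderings = []
    X_num = np.empty(X_cat.shape, dtype=float)
    for j in range(X_cat.shape[1]):
        values = np.unique(X_cat[:, j])
        order = {v: r for r, v in enumerate(rng.permutation(values))}
        orderings.append(order)
        X_num[:, j] = [order[v] for v in X_cat[:, j]]
    return X_num, orderings

class ROEnsemble:
    def __init__(self, n_trees=50, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)

    def fit(self, X_cat, y):
        self.members = []
        for _ in range(self.n_trees):
            # Each tree gets its own random ordinality, i.e. its own
            # continuous space; this is the only source of diversity here.
            X_num, orderings = random_ordinality(X_cat, self.rng)
            # Numeric features let the tree use binary (threshold) splits,
            # avoiding the high branching factor of multi-way splits.
            tree = DecisionTreeClassifier().fit(X_num, y)
            self.members.append((tree, orderings))
        return self

    def predict(self, X_cat):
        all_votes = []
        for tree, orderings in self.members:
            # Re-apply each tree's own ordering; values unseen at training
            # time get a rank past the known ones (an assumed choice,
            # not taken from the paper).
            X_num = np.array(
                [[orderings[j].get(v, len(orderings[j]))
                  for j, v in enumerate(row)]
                 for row in X_cat], dtype=float)
            all_votes.append(tree.predict(X_num))
        # Majority vote across the ensemble members.
        stacked = np.array(all_votes)
        return np.array([Counter(col).most_common(1)[0][0]
                         for col in stacked.T])

A call such as ROEnsemble(n_trees=50).fit(X_train, y_train).predict(X_test), with X_train and X_test as string-valued 2-D arrays, exercises the whole pipeline.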

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ahmad, A., Brown, G. (2009). Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2009. Lecture Notes in Computer Science, vol 5519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02326-2_23

  • DOI: https://doi.org/10.1007/978-3-642-02326-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02325-5

  • Online ISBN: 978-3-642-02326-2

  • eBook Packages: Computer Science, Computer Science (R0)
