Handling categorical data in rule induction

Burgess, Martin; Janacek, Gareth J.; Rayward-Smith, Vic J.

doi:10.1007/978-3-7091-0646-4_45

Conference paper

Abstract

This paper addresses problems that arise when categorical-valued data is used in rule induction. Using categorical values naively risks reducing the chances of finding a good rule, in terms of both confidence (accuracy) and support or coverage. We introduce a technique called the arcsin transformation, in which categorical values are replaced with numeric values. Our results show that the technique is highly effective on relatively large databases containing many unordered categorical attributes, on larger databases that combine unordered categorical and numeric data, and especially on small databases that contain rare cases.
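
The abstract does not spell out how the transformation is computed. As a rough illustration only, the sketch below assumes the classical angular transform arcsin(sqrt(p)), with p taken as the proportion of target-class records within each category. The choice of p, the function names `arcsin_encode` and `rule_quality`, the toy data, and the support definition (records matching both sides of the rule, divided by all records) are assumptions made for this example and may differ from the authors' method.

```python
import math
from collections import defaultdict


def arcsin_encode(values, labels, target):
    """Map each category to arcsin(sqrt(p)).

    ASSUMPTION: p is the fraction of records in that category whose
    label equals `target`; the paper may define p differently.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for value, label in zip(values, labels):
        totals[value] += 1
        if label == target:
            hits[value] += 1
    return {v: math.asin(math.sqrt(hits[v] / totals[v])) for v in totals}


def rule_quality(encoded, labels, target, lo, hi):
    """Confidence and support of the rule: lo <= x <= hi => label == target.

    ASSUMPTION: support is counted as records satisfying both the
    condition and the conclusion, divided by the total record count.
    """
    covered = [label for x, label in zip(encoded, labels) if lo <= x <= hi]
    correct = sum(1 for label in covered if label == target)
    confidence = correct / len(covered) if covered else 0.0
    support = correct / len(labels)
    return confidence, support


# Hypothetical toy data: one unordered categorical attribute and a binary class.
colours = ["red", "red", "blue", "blue", "green", "green", "green", "red"]
classes = [1, 1, 0, 0, 1, 0, 0, 1]

mapping = arcsin_encode(colours, classes, target=1)
encoded = [mapping[c] for c in colours]

# A numeric interval on the encoded attribute now selects a group of categories.
confidence, support = rule_quality(encoded, classes, target=1, lo=1.0, hi=math.pi / 2)
print(mapping)
print(confidence, support)
```

Once the attribute is numeric, a rule can select several categories with a single interval instead of testing each category value separately, which is one plausible reason such an encoding could help both confidence and support.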

Author information

Authors and Affiliations

School of Information Systems, University of East Anglia, Norwich, UK
Martin Burgess, Gareth J. Janacek & Vic J. Rayward-Smith

Editor information

Editors and Affiliations

Equipe Universitaire de Recherche en Informatique de Saint-Etienne (Groupe de Recherche de Roanne) Institut Universitaire de Technologie de Roanne, Université Jean Monnet, Saint-Etienne, France
David W. Pearson
Division of Mathematics School of Mathematical and Information Sciences, Coventry University, Coventry, UK
Nigel C. Steele
Institut für Informatik, Universität Innsbruck, Innsbruck, Austria
Rudolf F. Albrecht

About this paper

Cite this paper

Burgess, M., Janacek, G.J., Rayward-Smith, V.J. (2003). Handling categorical data in rule induction. In: Pearson, D.W., Steele, N.C., Albrecht, R.F. (eds) Artificial Neural Nets and Genetic Algorithms. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0646-4_45

DOI: https://doi.org/10.1007/978-3-7091-0646-4_45
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-00743-3
Online ISBN: 978-3-7091-0646-4
eBook Packages: Springer Book Archive
