Handling categorical data in rule induction

  • Conference paper

Abstract

This paper addresses problems that arise when categorical-valued data is used in rule induction. Using categorical values naively reduces the chances of finding a good rule, both in terms of confidence (accuracy) and in terms of support (coverage). We introduce a technique called the arcsin transformation, in which categorical-valued data is replaced with numeric values. Our results show that this technique is highly effective for handling categorical-valued data on relatively large databases containing many unordered categorical attributes, on larger databases combining unordered categorical and numeric data, and especially on small databases containing rare cases.
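The abstract does not say how the transformation is computed, so the following is only a rough illustration of the general idea. It assumes that each category is replaced by the classical angular transform arcsin(√p), where p is the proportion of that category's records that fall in the target class. The function name `arcsin_encode`, the toy data, and the choice of p are all assumptions made for this sketch; they are not taken from the paper and may differ from the authors' procedure.

```python
import math
from collections import defaultdict


def arcsin_encode(categories, targets):
    """Hypothetical helper: map each category to arcsin(sqrt(p)).

    p is the fraction of records in that category whose target is True.
    This is an assumed encoding, not the paper's published method.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for cat, hit in zip(categories, targets):
        counts[cat][0] += int(bool(hit))
        counts[cat][1] += 1
    return {
        cat: math.asin(math.sqrt(pos / tot))
        for cat, (pos, tot) in counts.items()
    }


# Toy data (invented for illustration only).
colours = ["red", "red", "blue", "blue", "green", "green"]
in_class = [True, True, True, False, False, False]

encoding = arcsin_encode(colours, in_class)
numeric_column = [encoding[c] for c in colours]
```

With such an encoding, an unordered attribute becomes an ordered numeric one, so a rule inducer could test a single threshold (for example `value >= t`) instead of enumerating subsets of categories.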

Copyright information

© 2003 Springer-Verlag Wien

About this paper

Cite this paper

Burgess, M., Janacek, G.J., Rayward-Smith, V.J. (2003). Handling categorical data in rule induction. In: Pearson, D.W., Steele, N.C., Albrecht, R.F. (eds) Artificial Neural Nets and Genetic Algorithms. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0646-4_45

  • DOI: https://doi.org/10.1007/978-3-7091-0646-4_45

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-00743-3

  • Online ISBN: 978-3-7091-0646-4

  • eBook Packages: Springer Book Archive
