
Data Mining with Products of Trees

  • Conference paper
  • Published in: Advances in Intelligent Data Analysis (IDA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2189)

Abstract

We propose a new model for supervised classification in data mining applications, based on products of trees. The information carried by each predictor variable is extracted separately by means of a recursive partition structure, and this information is then combined across predictors using a weighted product model form, an extension of the naive Bayes model. Empirical results comparing the new method with other methods from the machine learning literature are presented for several data sets. Two typical data mining applications, a chromosome identification problem and a forest cover type identification problem, are used to illustrate the ideas. The new approach is fast and surprisingly accurate.
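
The following is a minimal Python sketch of the idea described in the abstract, assuming scikit-learn for the per-predictor trees. The class name ProductOfTrees, the per-predictor weights, and the use of DecisionTreeClassifier are illustrative assumptions, not the authors' implementation, which the full paper describes.

# A minimal sketch (not the authors' implementation) of the product-of-trees
# idea: one single-variable recursive partition per predictor, combined across
# predictors by a weighted product, an extension of the naive Bayes form.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ProductOfTrees:
    def __init__(self, weights=None, max_depth=3, eps=1e-9):
        self.weights = weights      # per-predictor weights; None means all 1.0
        self.max_depth = max_depth  # depth of each single-variable tree
        self.eps = eps              # guards against log(0)

    def fit(self, X, y):
        X = np.asarray(X)
        self.classes_, y_enc = np.unique(y, return_inverse=True)
        self.log_prior_ = np.log(np.bincount(y_enc) / len(y_enc))
        # One recursive partition per predictor, each seeing a single column.
        self.trees_ = [
            DecisionTreeClassifier(max_depth=self.max_depth).fit(X[:, [j]], y_enc)
            for j in range(X.shape[1])
        ]
        if self.weights is None:
            self.weights = np.ones(X.shape[1])
        return self

    def predict_proba(self, X):
        X = np.asarray(X)
        # log p(c | x) is taken proportional to
        #   log p(c) + sum_j w_j * (log p_j(c | x_j) - log p(c)),
        # which reduces to ordinary naive Bayes when every w_j = 1.
        log_score = np.tile(self.log_prior_, (len(X), 1))
        for j, tree in enumerate(self.trees_):
            p_j = tree.predict_proba(X[:, [j]])  # tree j's estimate of p(c | x_j)
            log_score += self.weights[j] * (np.log(p_j + self.eps) - self.log_prior_)
        score = np.exp(log_score - log_score.max(axis=1, keepdims=True))
        return score / score.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

With equal weights this sketch is just a naive Bayes classifier whose conditional estimates come from single-variable trees; the weighted product form allows the per-predictor weights to be tuned, for example by cross-validation.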




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferreira, J.T.A., Denison, D.G., Hand, D.J. (2001). Data Mining with Products of Trees. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_17


  • DOI: https://doi.org/10.1007/3-540-44816-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42581-6

  • Online ISBN: 978-3-540-44816-7

  • eBook Packages: Springer Book Archive
