Abstract
We propose a new model for supervised classification in data mining applications. The model is based on products of trees: the information carried by each predictor variable is extracted separately by means of a recursive partition structure, and this information is then combined across predictors using a weighted product model form, an extension of the naive Bayes model. Empirical results comparing the new method with other methods from the machine learning literature are presented for several data sets. Two typical data mining applications, a chromosome identification problem and a forest cover type identification problem, are used to illustrate the ideas. The new approach is fast and surprisingly accurate.
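The structure described in the abstract can be caricatured in a few lines: each predictor gets its own recursive partition (reduced here to a depth-1 stump for brevity), leaf-level class frequencies estimate per-predictor class-conditional probabilities, and these are combined across predictors as a weighted product with the class prior. Unit weights recover the naive Bayes model. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the median split, and the Laplace smoothing are all illustrative choices.

```python
import math

def fit_stump(xs, ys, classes):
    # Depth-1 "tree": a single split at the median of the feature.
    # Leaf counts (with Laplace smoothing) give smoothed
    # class-conditional probabilities per leaf.
    thr = sorted(xs)[len(xs) // 2]
    counts = {leaf: {c: 1.0 for c in classes} for leaf in (0, 1)}
    for x, y in zip(xs, ys):
        counts[int(x >= thr)][y] += 1.0
    probs = {}
    for leaf, cc in counts.items():
        total = sum(cc.values())
        probs[leaf] = {c: cc[c] / total for c in classes}
    return thr, probs

def fit_product_of_stumps(X, y):
    # One recursive partition (here a stump) per predictor variable.
    classes = sorted(set(y))
    priors = {c: y.count(c) / len(y) for c in classes}
    stumps = [fit_stump([row[j] for row in X], y, classes)
              for j in range(len(X[0]))]
    return classes, priors, stumps

def predict(model, x, weights=None):
    # Combine predictors via a weighted product (sums of weighted
    # log-probabilities); weights of 1 give the naive Bayes model.
    classes, priors, stumps = model
    weights = weights or [1.0] * len(stumps)
    scores = {}
    for c in classes:
        s = math.log(priors[c])
        for (thr, probs), xj, w in zip(stumps, x, weights):
            s += w * math.log(probs[int(xj >= thr)][c])
        scores[c] = s
    return max(scores, key=scores.get)
```

In the paper the per-predictor partitions are full trees rather than stumps and the weights are estimated from the data; the sketch only shows the product-model form in which the pieces combine.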
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferreira, J.T.A., Denison, D.G., Hand, D.J. (2001). Data Mining with Products of Trees. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_17
Print ISBN: 978-3-540-42581-6
Online ISBN: 978-3-540-44816-7