Abstract
Classification is a widely used technique in many fields, including data mining and statistical data analysis. Decision trees are among the most common knowledge representation schemes used in classification algorithms, and they can offer a more practical way of capturing knowledge than coding rules in conventional languages. Decision trees are generally constructed by a top-down growth procedure, which starts from the root node and greedily chooses the split of the data that maximizes some cost function. The order in which attributes are chosen, according to the cost function, determines how efficient the decision tree is. Gain, Gain ratio, Gini, and Twoing are among the best-known splitting criteria used to calculate the cost function. In this paper, we propose a new splitting criterion, namely the False-Positives criterion. The key idea behind the False-Positives criterion is to treat the instances having the most frequent class value, with respect to a certain attribute value, as true positives, and all instances having the remaining class values, with respect to that attribute value, as false positives. We present extensive empirical tests, which demonstrate the efficiency of the proposed criterion.
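The abstract's key idea can be illustrated with a minimal sketch. The paper's exact formula is not given here, so the aggregation below (summing false positives over all attribute values, lower being better) is an assumption for illustration only; the function names are hypothetical.

```python
from collections import Counter

def false_positive_count(values, labels, attr_value):
    """For one attribute value, count the 'false positives': instances
    whose class differs from the most frequent class among the instances
    sharing that attribute value (per the abstract's definition)."""
    classes = [c for v, c in zip(values, labels) if v == attr_value]
    if not classes:
        return 0
    majority = Counter(classes).most_common(1)[0][1]  # size of majority class
    return len(classes) - majority

def false_positives_score(values, labels):
    """Hypothetical aggregation: total false positives across all values
    of the attribute; a lower score suggests a purer split."""
    return sum(false_positive_count(values, labels, v) for v in set(values))

# Toy example: attribute 'outlook' vs. a binary class label
outlook = ["sunny", "sunny", "rain", "rain", "overcast"]
play    = ["no",    "no",    "yes",  "no",   "yes"]
print(false_positives_score(outlook, play))  # → 1 (only one 'rain' instance is misclassified)
```

Under this sketch, an attribute whose values each isolate a single class yields a score of 0, mirroring how purity-based criteria such as Gini reward homogeneous partitions.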
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Boutsinas, B., Tsekouronas, I.X. (2004). Splitting Data in Decision Trees Using the New False-Positives Criterion. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science(), vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21937-8
Online ISBN: 978-3-540-24674-9