Abstract
Decision tree induction has been studied extensively in machine learning as a solution for classification problems. The way linear decision trees partition the search space is comprehensible and hence appealing to data modelers. Comprehensibility is an important aspect of models used in medical data mining, as it determines model credibility and even acceptability. In practice, though, inordinately long decision trees, compounded by replication problems, detract from comprehensibility. This shortcoming can be partially attributed to their rigid structure, which is unable to handle complex non-linear and/or continuous data. To address this issue we introduce a novel hybrid multivariate decision tree composed of polynomial, fuzzy and decision tree structures. The polynomial nature of these multivariate trees enables them to perform well in non-linear territory, while the fuzzy members squash continuous variables. By trading off comprehensibility against performance using a multi-objective genetic programming optimization algorithm, we can induce polynomial-fuzzy decision trees (PFDT) that are smaller, more compact and better performing than their linear decision tree (LDT) counterparts. In this paper we discuss the structural differences between PFDT and LDT (C4.5) and compare the size and performance of their models using medical data.
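As a rough illustration of the structure described above, the sketch below shows how a single polynomial-fuzzy node might be evaluated: a low-order polynomial over selected inputs is squashed by a sigmoid-style fuzzy membership function, and the resulting membership degree routes the example to a subtree or a class label. The node layout, coefficient encoding, sigmoid membership and 0.5 routing threshold are illustrative assumptions for this sketch, not the exact formulation used in the paper.

# Minimal sketch of a polynomial-fuzzy decision node (assumed structure).
import math
from dataclasses import dataclass, field


@dataclass
class PFDTNode:
    # Polynomial terms: a tuple of feature indices (a product term) -> coefficient,
    # e.g. {(0,): 1.5, (0, 1): -0.8} means 1.5*x0 - 0.8*x0*x1.
    coeffs: dict = field(default_factory=dict)
    bias: float = 0.0
    left: object = None    # subtree (PFDTNode) or class label
    right: object = None

    def membership(self, x):
        """Fuzzy 'squash' of the polynomial response into [0, 1]."""
        poly = self.bias + sum(
            c * math.prod(x[i] for i in term) for term, c in self.coeffs.items()
        )
        return 1.0 / (1.0 + math.exp(-poly))  # sigmoid membership (assumed)

    def classify(self, x):
        """Route the example by its membership degree (crisp cut at 0.5, assumed)."""
        branch = self.left if self.membership(x) >= 0.5 else self.right
        return branch.classify(x) if isinstance(branch, PFDTNode) else branch


# Usage: a single node with one linear term and one interaction term;
# class labels are hypothetical placeholders.
node = PFDTNode(coeffs={(0,): 1.5, (0, 1): -0.8}, bias=-0.2,
                left="positive", right="negative")
print(node.classify([0.9, 0.3]))  # -> "positive"

In the paper itself, the coefficients and the tree topology would be evolved jointly by the multi-objective genetic programming algorithm, with tree size serving as the comprehensibility objective and classification performance as the other.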
Copyright information
© 2004 Springer-Verlag London
Cite this paper
Mugambi, E.M., Hunter, A., Oatley, G., Kennedy, L. (2004). Polynomial-Fuzzy Decision Tree Structures for Classifying Medical Data. In: Coenen, F., Preece, A., Macintosh, A. (eds) Research and Development in Intelligent Systems XX. SGAI 2003. Springer, London. https://doi.org/10.1007/978-0-85729-412-8_12
DOI: https://doi.org/10.1007/978-0-85729-412-8_12
Publisher Name: Springer, London
Print ISBN: 978-1-85233-780-3
Online ISBN: 978-0-85729-412-8