Abstract
This chapter addresses the problem of missing values in decision trees during classification. Our approach derives from the ordered attribute trees method proposed by Lobo and Numao in 2000, which builds a decision tree for each attribute and uses these trees to fill in the missing attribute values. Our method takes the dependence between attributes into account by using mutual information. The result of the classification process is a probability distribution rather than a single class. We first explain our approach, then present tests of it on several real databases and compare the results with those obtained by Lobo’s method and Quinlan’s method. We also measure the quality of our classification results. Finally, we calculate the complexity of our approach and discuss some perspectives.
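As a rough illustration of the dependence measure named in the abstract, the sketch below estimates the mutual information between two categorical attributes from their empirical frequencies. It is not the chapter's implementation: the function name `mutual_information`, the use of base-2 logarithms, and the toy data are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two
    equal-length sequences of categorical values (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0

    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)

    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (assumed): an attribute identical to another carries 1 bit
# about it, while an attribute independent of it carries none.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
print(mutual_information(a, b))
print(mutual_information(a, c))
```

A score of this kind could be used to rank attributes by how strongly they depend on one another or on the class; how the chapter actually uses it is described in the full text, not here.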
References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth International Group, CA (1984)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley-Interscience, Chichester (2002)
Shannon, C., Weaver, W.: Théorie mathématique de la communication. Les classiques des sciences humaines (1949)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Diego (1993)
Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley, Reading (2006)
Friedman, J.H., Kohavi, R., Yun, Y.: Lazy Decision Trees. In: Proc. 13th National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pp. 717–724. AAAI press, Menlo Park (1996)
Hawarah, L., Simonet, A., Simonet, M.: A probabilistic approach to classify incomplete objects using decision trees. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 549–558. Springer, Heidelberg (2004)
Hawarah, L., Simonet, A., Simonet, M.: Evaluation of a probabilistic approach to classify incomplete objects using decision trees. In: Bressan, S., Küng, J., Wagner, R. (eds.) DEXA 2006. LNCS, vol. 4080, pp. 193–202. Springer, Heidelberg (2006a)
Hawarah, L., Simonet, A., Simonet, M.: The complexity of a probabilistic approach to deal with missing values in a decision tree. In: 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), Romania, pp. 26–29 (September 2006b)
Hawarah, L., Simonet, A., Simonet, M.: Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: The Sixth IEEE International Conference on Data Mining-Workshops (ICDM Workshops 2006), Hong Kong, China, December 18-22 (2006c)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: ML 1992: Proceedings of the ninth international workshop on Machine learning, San Francisco, CA, USA, pp. 249–256 (1992)
Kononenko, I., Bratko, I., Roskar, E.: Experiments in Automatic Learning of Medical Diagnostic Rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia (1984)
Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: ECML: European Conference on Machine Learning, pp. 171–182. Springer, Heidelberg (1994)
Liu, W.Z., White, A.P., Thompson, S.G., Bramer, M.A.: Techniques for Dealing with Missing Values in Classification. In: Liu, X., Cohen, P., Berthold, M. (eds.) Advances on Intelligent Data Analysis. Springer, Heidelberg (1997)
Lobo, O., Numao, M.: Ordered estimation of missing values. In: PAKDD 1999: Proceedings of the Third Pacific Asia Conference on Methodologies for Knowledge Discovery and Data Mining, pp. 499–503. Springer, London (1999)
Lobo, O., Numao, M.: Ordered estimation of missing values for propositional learning. The Japanese Society for Artificial Intelligence 1, 162–168 (2000)
Lobo, O., Numao, M.: Suitable domains for using ordered attribute trees to impute missing values. IEICE TRANS INF. and SYST, E84-D, no. 2 (February 2001)
Martin, J.K., Hirschberg, D.S.: The time complexity of decision tree induction. Technical Report. ICS-TR-95-27 (1995)
Newman, D., Hettich, S., Blake, C., Merz, C.: UCI Repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)
Quinlan, J.R.: Unknown attribute values in induction. In: Proc. Sixth International Machine Learning Workshop. Morgan Kaufmann, San Francisco (1989)
Quinlan, J.R.: Probabilistic decision trees. Machine Learning: an Artificial Intelligence Approach 3, 140–152 (1990)
Robnik-Sikonja, M., Kononenko, I.: Attribute dependencies, understandability and split selection in tree based models. In: Machine Learning: Proceedings of the Sixteenth International Conference. ICML 1999, pp. 344–353 (1999)
Robnik-Sikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53(1-2), 23–69 (2003)
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hawarah, L., Simonet, A., Simonet, M. (2009). Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: Zighed, D.A., Tsumoto, S., Ras, Z.W., Hacid, H. (eds) Mining Complex Data. Studies in Computational Intelligence, vol 165. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88067-7_4
DOI: https://doi.org/10.1007/978-3-540-88067-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88066-0
Online ISBN: 978-3-540-88067-7
eBook Packages: Engineering, Engineering (R0)