Dealing with Missing Values in a Probabilistic Decision Tree during Classification

Hawarah, Lamis; Simonet, Ana; Simonet, Michel

doi:10.1007/978-3-540-88067-7_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 165)

Abstract

This chapter addresses the problem of missing values in decision trees during classification. Our approach is derived from the ordered attribute trees method, proposed by Lobo and Numao in 2000, which builds a decision tree for each attribute and uses these trees to fill in missing attribute values. Our method takes the dependence between attributes into account by using Mutual Information. The result of the classification process is a probability distribution rather than a single class. We first explain our approach, then present tests of it on several real databases and compare the results with those obtained by Lobo’s method and Quinlan’s method. We also measure the quality of our classification results. Finally, we calculate the complexity of our approach and discuss some perspectives.
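
As an illustration of the dependence measure named in the abstract, the following is a minimal sketch of empirical Mutual Information between two discrete attributes. It is not the authors' implementation: the function names, the toy column layout, and the choice to rank attributes by increasing Mutual Information with the class are assumptions made for the example only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_mutual_information(columns, target):
    """Hypothetical helper: attribute names sorted by increasing MI with `target`.

    `columns` maps an attribute name to its list of values; the ordering
    direction is an assumption of this sketch, not taken from the chapter.
    """
    scores = {
        name: mutual_information(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores, key=scores.get)


# Toy data: "a" copies the class, "b" is independent of it.
toy = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "cls": [0, 0, 1, 1],
}
ordering = rank_by_mutual_information(toy, "cls")
```

On the toy data, the attribute that carries no information about the class is ranked first and the one that determines it is ranked last.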

References

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth International Group, CA (1984)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley-Interscience, Chichester (2002)
Shannon, C., Weaver, W.: Théorie mathématique de la communication. Les classiques des sciences humaines (1949)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Diego (1993)
Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley, Reading (2006)
Friedman, J.H., Kohavi, R., Yun, Y.: Lazy Decision Trees. In: Proc. 13th National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pp. 717–724. AAAI press, Menlo Park (1996)
Hawarah, L., Simonet, A., Simonet, M.: A probabilistic approach to classify incomplete objects using decision trees. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 549–558. Springer, Heidelberg (2004)
Hawarah, L., Simonet, A., Simonet, M.: Evaluation of a probabilistic approach to classify incomplete objects using decision trees. In: Bressan, S., Küng, J., Wagner, R. (eds.) DEXA 2006. LNCS, vol. 4080, pp. 193–202. Springer, Heidelberg (2006a)
Hawarah, L., Simonet, A., Simonet, M.: The complexity of a probabilistic approach to deal with missing values in a decision tree. In: 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), Romania, pp. 26–29 (September 2006b)
Hawarah, L., Simonet, A., Simonet, M.: Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: The Sixth IEEE International Conference on Data Mining-Workshops (ICDM Workshops 2006), Hong Kong, China, December 18-22 (2006c)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: ML 1992: Proceedings of the ninth international workshop on Machine learning, San Francisco, CA, USA, pp. 249–256 (1992)
Kononenko, I., Bratko, I., Roskar, E.: Experiments in Automatic Learning of Medical Diagnostic Rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia (1984)
Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: ECML: European Conference on Machine Learning, pp. 171–182. Springer, Heidelberg (1994)
Liu, W.Z., White, A.P., Thompson, S.G., Bramer, M.A.: Techniques for Dealing with Missing Values in Classification. In: Liu, X., Cohen, P., Berthold, M. (eds.) Advances on Intelligent Data Analysis. Springer, Heidelberg (1997)
Lobo, O., Numao, M.: Ordered estimation of missing values. In: PAKDD 1999: Proceedings of the Third Pacific Asia Conference on Methodologies for Knowledge Discovery and Data Mining, pp. 499–503. Springer, London (1999)
Lobo, O., Numao, M.: Ordered estimation of missing values for propositional learning. The Japanese Society for Artificial Intelligence 1, 162–168 (2000)
Lobo, O., Numao, M.: Suitable domains for using ordered attribute trees to impute missing values. IEICE Transactions on Information and Systems E84-D(2) (February 2001)
Martin, J.K., Hirschberg, D.S.: The time complexity of decision tree induction. Technical Report. ICS-TR-95-27 (1995)
Newman, D., Hettich, S., Blake, C., Merz, C.: UCI Repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)
Quinlan, J.R.: Unknown attribute values in induction. In: Proc. Sixth International Machine Learning Workshop. Morgan Kaufmann, San Francisco (1989)
Quinlan, J.R.: Probabilistic decision trees. Machine Learning: an Artificial Intelligence Approach 3, 140–152 (1990)
Robnik-Sikonja, M., Kononenko, I.: Attribute dependencies, understandability and split selection in tree based models. In: Machine Learning: Proceedings of the Sixteenth International Conference. ICML 1999, pp. 344–353 (1999)
Robnik-Sikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 53(1-2), 23–69 (2003)

Author information

Authors and Affiliations

Faculté de Médecine, Institut d’Ingénierie et de l’Information de Santé (TIMC), 38700, La Tronche, France
Lamis Hawarah, Ana Simonet & Michel Simonet

Editor information

Editors and Affiliations

University of Lyon, Lyon, France
Djamel A. Zighed & Hakim Hacid
Shimane University, Shimane, Japan
Shusaku Tsumoto
University of North Carolina, Charlotte, NC, USA
Zbigniew W. Ras

About this chapter

Cite this chapter

Hawarah, L., Simonet, A., Simonet, M. (2009). Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: Zighed, D.A., Tsumoto, S., Ras, Z.W., Hacid, H. (eds) Mining Complex Data. Studies in Computational Intelligence, vol 165. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88067-7_4

DOI: https://doi.org/10.1007/978-3-540-88067-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88066-0
Online ISBN: 978-3-540-88067-7
eBook Packages: Engineering, Engineering (R0)