Dealing with Missing Values in a Probabilistic Decision Tree during Classification

  • Chapter
Mining Complex Data

Part of the book series: Studies in Computational Intelligence (SCI, volume 165)

Abstract

This chapter addresses the problem of missing values in decision trees during classification. Our approach derives from the ordered attribute trees method proposed by Lobo and Numao in 2000, which builds a decision tree for each attribute and uses these trees to fill in the missing attribute values. Our method takes the dependence between attributes into account by using Mutual Information. The result of the classification process is a probability distribution rather than a single class. We first explain our approach, then present tests of it on several real databases and compare the results with those obtained by Lobo’s method and Quinlan’s method. We also measure the quality of our classification results. Finally, we calculate the complexity of our approach and discuss some perspectives.
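The two ingredients named in the abstract, Mutual Information and a class probability distribution as output, can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names, the toy records, and the choice to rank attributes by increasing Mutual Information with the class are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_attributes(rows, attributes, target):
    """Score each attribute by its MI with the class and sort ascending.

    The ascending order is an assumption of this sketch.
    """
    labels = [r[target] for r in rows]
    scores = {a: mutual_information([r[a] for r in rows], labels)
              for a in attributes}
    return sorted(attributes, key=lambda a: scores[a]), scores


def class_distribution(labels):
    """Relative class frequencies: a distribution rather than a single class."""
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}


# Hypothetical toy records, not taken from the chapter.
rows = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
order, scores = rank_attributes(rows, ["outlook", "windy"], "play")
dist = class_distribution(["no", "no", "yes", "yes"])
```

In this toy data, `outlook` determines the class exactly while `windy` is independent of it, so `windy` is ranked first under the ascending order assumed here.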

References

  1. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Wadsworth International Group, CA (1984)

  2. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley-Interscience, Chichester (2002)

  3. Shannon, C., Weaver, W.: Théorie mathématique de la communication. Les classiques des sciences humaines (1949)

  4. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)

  5. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Diego (1993)

  6. Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley, Reading (2006)

  7. Friedman, J.H., Kohavi, R., Yun, Y.: Lazy Decision Trees. In: Proc. 13th National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pp. 717–724. AAAI press, Menlo Park (1996)

  8. Hawarah, L., Simonet, A., Simonet, M.: A probabilistic approach to classify incomplete objects using decision trees. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 549–558. Springer, Heidelberg (2004)

  9. Hawarah, L., Simonet, A., Simonet, M.: Evaluation of a probabilistic approach to classify incomplete objects using decision trees. In: Bressan, S., Küng, J., Wagner, R. (eds.) DEXA 2006. LNCS, vol. 4080, pp. 193–202. Springer, Heidelberg (2006a)

  10. Hawarah, L., Simonet, A., Simonet, M.: The complexity of a probabilistic approach to deal with missing values in a decision tree. In: 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), Romania, pp. 26–29 (September 2006b)

  11. Hawarah, L., Simonet, A., Simonet, M.: Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: The Sixth IEEE International Conference on Data Mining-Workshops (ICDM Workshops 2006), Hong Kong, China, December 18-22 (2006c)

  12. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: ML 1992: Proceedings of the ninth international workshop on Machine learning, San Francisco, CA, USA, pp. 249–256 (1992)

  13. Kononenko, I., Bratko, I., Roskar, E.: Experiments in Automatic Learning of Medical Diagnostic Rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia (1984)

  14. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: ECML: European Conference on Machine Learning, pp. 171–182. Springer, Heidelberg (1994)

  15. Liu, W.Z., White, A.P., Thompson, S.G., Bramer, M.A.: Techniques for Dealing with Missing Values in Classification. In: Liu, X., Cohen, P., Berthold, M. (eds.) Advances on Intelligent Data Analysis. Springer, Heidelberg (1997)

  16. Lobo, O., Numao, M.: Ordered estimation of missing values. In: PAKDD 1999: Proceedings of the Third Pacific Asia Conference on Methodologies for Knowledge Discovery and Data Mining, pp. 499–503. Springer, London (1999)

  17. Lobo, O., Numao, M.: Ordered estimation of missing values for propositional learning. The Japanese Society for Artificial Intelligence 1, 162–168 (2000)

  18. Lobo, O., Numao, M.: Suitable domains for using ordered attribute trees to impute missing values. IEICE Trans. Inf. and Syst. E84-D(2) (February 2001)

  19. Martin, J.K., Hirschberg, D.S.: The time complexity of decision tree induction. Technical Report. ICS-TR-95-27 (1995)

  20. Newman, D., Hettich, S., Blake, C., Merz, C.: UCI Repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

  21. Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)

  22. Quinlan, J.R.: Unknown attribute values in induction. In: Proc. Sixth International Machine Learning Workshop. Morgan Kaufmann, San Francisco (1989)

  23. Quinlan, J.R.: Probabilistic decision trees. Machine Learning: an Artificial Intelligence Approach 3, 140–152 (1990)

  24. Robnik-Sikonja, M., Kononenko, I.: Attribute dependencies, understandability and split selection in tree based models. In: Machine Learning: Proceedings of the Sixteenth International Conference. ICML 1999, pp. 344–353 (1999)

  25. Robnik-Sikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53(1-2), 23–69 (2003)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hawarah, L., Simonet, A., Simonet, M. (2009). Dealing with Missing Values in a Probabilistic Decision Tree during Classification. In: Zighed, D.A., Tsumoto, S., Ras, Z.W., Hacid, H. (eds) Mining Complex Data. Studies in Computational Intelligence, vol 165. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88067-7_4

  • DOI: https://doi.org/10.1007/978-3-540-88067-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88066-0

  • Online ISBN: 978-3-540-88067-7

  • eBook Packages: Engineering, Engineering (R0)
