Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6870)

Abstract

Data mining methods are widely used across many disciplines to identify patterns, rules, or associations in large volumes of data. In the past, black-box methods such as neural nets and support vector machines were heavily used in technical domains, while methods with explanation capability were preferred in medical domains. Nowadays, data mining methods with explanation capability are also used in technical domains, now that more work has been done on the advantages and disadvantages of these methods. Decision tree induction, for example with C4.5, is the most preferred method since it works well on average regardless of the data set being used. It can easily learn a decision tree without heavy user interaction, whereas with neural nets a lot of time is spent on training the net. Cross-validation can be applied to decision tree induction methods so that the calculated error rate comes close to the true error rate. The error rate and the particular goodness measures described in this paper are quantitative measures that help in understanding the quality of the model. The data collection problem, with its associated noise problem, has to be considered. Specialized accuracy measures and proper visualization methods help to understand this problem. Since decision tree induction is a supervised method, the associated data labels constitute another problem. Re-labeling should be considered after the model has been learnt. This paper also discusses how to fit the learnt model to the expert's knowledge. The problem of comparing two decision trees according to their explanation power is discussed. Finally, we summarize our methodology for the interpretation of decision trees.
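The cross-validated error estimate mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method and not C4.5: as a stand-in it uses a one-level decision tree (a decision stump) over a single numeric feature, and the function names (`train_stump`, `predict`, `cross_validated_error`) and the synthetic data set are hypothetical, chosen only for illustration.

```python
import random


def train_stump(rows):
    """Fit a one-level decision tree on (x, label) pairs with labels 0/1.

    Tries every midpoint between consecutive distinct x values as a
    threshold and keeps the one with the fewest training errors.
    """
    values = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in candidates:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y
                for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, threshold, left_label = best
    return threshold, left_label


def predict(model, x):
    """Return the label on the side of the threshold that x falls on."""
    threshold, left_label = model
    return left_label if x <= threshold else 1 - left_label


def cross_validated_error(rows, k=5, seed=0):
    """Estimate the error rate by k-fold cross-validation.

    Each sample is used for testing exactly once, by a model that was
    trained without it.
    """
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    wrong = 0
    for i, test_fold in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train_stump(train)
        wrong += sum(predict(model, x) != y for x, y in test_fold)
    return wrong / len(rows)


# Hypothetical synthetic data: label is 1 when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
cv_error = cross_validated_error(data, k=5)
print(f"5-fold cross-validated error rate: {cv_error:.2f}")
```

Because every test sample is held out during training, the estimate is typically less optimistic than the error measured on the training data itself; how close it comes to the true error rate still depends on the data set and the number of folds.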


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Perner, P. (2011). How to Interpret Decision Trees?. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2011. Lecture Notes in Computer Science, vol 6870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23184-1_4

  • DOI: https://doi.org/10.1007/978-3-642-23184-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23183-4

  • Online ISBN: 978-3-642-23184-1

  • eBook Packages: Computer Science, Computer Science (R0)
