
The Need for Interpretability Biases

  • Conference paper
  • Published in: Advances in Intelligent Data Analysis XVII (IDA 2018)
  • Part of the book series: Lecture Notes in Computer Science, vol. 11191
Abstract

In his seminal paper, Mitchell defined bias as “any basis for choosing one generalization over another, other than strict consistency with the observed training instances”, such as the choice of the hypothesis language or any form of preference relation between its elements. The most commonly used form is a simplicity bias, which prefers simpler hypotheses over more complex ones, even when the latter fit the data better. Such a bias not only helps to avoid overfitting, but is also commonly considered to foster interpretability. In this talk, we question this assumption, in particular with respect to commonly used rule learning heuristics that aim at learning rules that are as simple as possible. We argue that, on the contrary, short rules are in many cases not desirable from the point of view of interpretability, and we present evidence from crowdsourcing experiments that supports this hypothesis. To understand interpretability, we must relate machine learning biases to cognitive biases, which lead humans to prefer certain explanations over others even when such a preference cannot be rationally justified. Only then can we develop suitable interpretability biases for machine learning.
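To make the simplicity bias concrete, the following minimal sketch (invented for illustration; the rules, attribute names, and counts are hypothetical and not taken from the paper) shows how a common rule-quality heuristic such as the Laplace estimate can assign identical scores to a short and a long rule with the same coverage, so that a typical tie-breaking preference for fewer conditions decides in favor of the shorter rule:

```python
# Illustrative sketch of a simplicity bias in rule learning.
# Rules and coverage counts are invented, not from the paper.

def laplace(pos, neg):
    """Laplace-corrected precision of a rule covering pos/neg examples."""
    return (pos + 1) / (pos + neg + 2)

# Two hypothetical rules with identical coverage statistics:
short_rule = {"conditions": ["odor=foul"],
              "pos": 120, "neg": 0}
long_rule = {"conditions": ["odor=foul", "gill-size=narrow",
                            "stalk-shape=tapering"],
             "pos": 120, "neg": 0}

# Both rules receive the same heuristic value ...
assert laplace(short_rule["pos"], short_rule["neg"]) == \
       laplace(long_rule["pos"], long_rule["neg"])

# ... so a learner with a simplicity bias breaks the tie by rule length:
preferred = min(short_rule, long_rule, key=lambda r: len(r["conditions"]))
print(preferred["conditions"])  # the shorter rule wins the tie
```

The talk's argument is precisely that this tie-breaking convention, while harmless for predictive accuracy, is not automatically the right choice for interpretability: the discarded conditions of the longer rule may carry information a human reader finds relevant.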

Much of the material in this paper is based on Fürnkranz et al. (2018).


Notes

  1. Entities should not be multiplied beyond necessity.

  2. The differences between the two views are irrelevant for our argumentation.

  3. https://archive.ics.uci.edu/ml/datasets.html.

  4. Since our experiments were based on subjective comparisons of pairs of rules, a more precise formulation would be, “comparatively more relevant than the most relevant condition in an alternative rule”.

References

  • Allahyari, H., Lavesson, N.: User-oriented assessment of classification model understandability. In: Kofod-Petersen, A., Heintz, F., Langseth, H. (eds.) Proceedings of the 11th Scandinavian Conference on Artificial Intelligence (SCAI-11), pp. 11–19 (2011)

  • Bensusan, H.: God doesn’t always shave with Occam’s Razor — learning when and how to prune. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 119–124. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026680

  • Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Occam’s razor. Inf. Process. Lett. 24, 377–380 (1987)

  • Cohen, W.W.: Fast effective rule induction. In: Prieditis, A., Russell, S. (eds.) Proceedings of the 12th International Conference on Machine Learning (ML-95), pp. 115–123. Morgan Kaufmann, Lake Tahoe (1995)

  • Domingos, P.: The role of Occam’s Razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)

  • Freitas, A.A.: Comprehensible classification models: a position paper. SIGKDD Explor. 15(1), 1–10 (2013)

  • Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning — towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)

  • Fürnkranz, J., Kliegr, T., Paulheim, H.: On cognitive preferences and the interpretability of rule-based models. arXiv preprint arXiv:1803.01316 (2018)

  • Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-59830-2

  • Goldstein, D.G., Gigerenzer, G.: The recognition heuristic: how ignorance makes us smart. In: Simple Heuristics That Make Us Smart, pp. 37–58. Oxford (1999)

  • Gordon, D.F., DesJardins, M.: Evaluation and selection of biases in machine learning. Mach. Learn. 20(1–2), 5–22 (1995)

  • Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)

  • Hahn, H.: Überflüssige Wesenheiten: Occams Rasiermesser. Veröffentlichungen des Vereines Ernst Mach, Wien (1930)

  • Hertwig, R., Benz, B., Krauss, S.: The conjunction fallacy and the many meanings of and. Cognition 108(3), 740–753 (2008)

  • Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., Baesens, B.: An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decis. Support Syst. 51(1), 141–154 (2011)

  • Kemeny, J.G.: The use of simplicity in induction. Philos. Rev. 62(3), 391–408 (1953)

  • Kliegr, T., Bahník, Š., Fürnkranz, J.: A review of possible effects of cognitive biases on interpretation of rule-based machine learning models. arXiv preprint arXiv:1804.02969 (2018)

  • Kodratoff, Y.: The comprehensibility manifesto. KDD Nuggets 94(9) (1994)

  • Kononenko, I.: Inductive and Bayesian learning in medical diagnosis. Appl. Artif. Intell. 7, 317–337 (1993)

  • Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. TCS. Springer, New York (2008). https://doi.org/10.1007/978-0-387-49820-1

  • Mehta, M., Rissanen, J., Agrawal, R.: MDL-based decision tree pruning. In: Fayyad, U., Uthurusamy, R. (eds.) Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, pp. 216–221. AAAI Press (1995)

  • Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–162 (1983)

  • Michie, D.: Machine learning in the next five years. In: Proceedings of the 3rd European Working Session on Learning (EWSL-88), pp. 107–122. Pitman (1988)

  • Mitchell, T.M.: The need for biases in learning generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick (1980)

  • Mitchell, T.M.: Version spaces: a candidate elimination approach to rule learning. In: Reddy, R. (ed.) Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI-77), pp. 305–310. William Kaufmann (1977)

  • Mitchell, T.M.: Machine Learning. McGraw Hill, New York (1997)

  • Muggleton, S.H., Schmid, U., Zeller, C., Tamaddoni-Nezhad, A., Besold, T.: Ultra-strong machine learning: comprehensibility of programs learned with ILP. Mach. Learn. 1–22 (2018)

  • Munroe, R.: Kolmogorov directions. xkcd, a webcomic of romance, sarcasm, math, and language. www.xkcd.com (2013)

  • Murphy, P.M., Pazzani, M.J.: Exploring the decision forest: an empirical investigation of Occam’s Razor in decision tree induction. J. Artif. Intell. Res. 1, 257–275 (1994)

  • Paulheim, H.: Generating possible interpretations for statistics from linked open data. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 560–574. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30284-8_44

  • Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from linked open data. In: Proceedings of the International Conference on Web Intelligence and Semantics (WIMS’12) (2012)

  • Piltaver, R., Luštrek, M., Gams, M., Martinčić-Ipšić, S.: What makes classification trees comprehensible? Expert Syst. Appl. 62, 333–346 (2016)

  • Pohl, R.: Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory, 2nd edn. Psychology Press, London (2017)

  • Post, H.: Simplicity in scientific theories. Br. J. Philos. Sci. 11(41), 32–41 (1960)

  • Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990)

  • Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

  • Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10, 153–178 (1993)

  • Stecher, J., Janssen, F., Fürnkranz, J.: Separating rule refinement and rule selection heuristics in inductive rule learning. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 114–129. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_8

  • Stecher, J., Janssen, F., Fürnkranz, J.: Shorter rules are better, aren’t they? In: Calders, T., Ceci, M., Malerba, D. (eds.) DS 2016. LNCS (LNAI), vol. 9956, pp. 279–294. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46307-0_18

  • Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data Knowl. Eng. 42(2), 189–222 (2002)

  • Tversky, A., Kahneman, D.: Belief in the law of small numbers. Psychol. Bull. 76(2), 105–110 (1971)

  • Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics and biases. Science 185(4157), 1124–1131 (1974)

  • Tversky, A., Kahneman, D.: Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol. Rev. 90(4), 293–315 (1983)

  • Valmarska, A., Lavrač, N., Fürnkranz, J., Robnik-Šikonja, M.: Refinement and selection heuristics in subgroup discovery and classification rule learning. Expert Syst. Appl. 81, 147–162 (2017)

  • Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)

  • Wallace, C.S., Boulton, D.M.: An information measure for classification. Comput. J. 11, 185–194 (1968)

  • Webb, G.I.: Further experimental evidence against the utility of Occam’s razor. J. Artif. Intell. Res. 4, 397–417 (1996)

  • Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Reidel, Dordrecht-Boston (1982)

  • Zaki, M.J., Hsiao, C.-J.: CHARM: an efficient algorithm for closed itemset mining. In: Grossman, R.L., Han, J., Kumar, V., Mannila, H., Motwani, R. (eds.) Proceedings of the 2nd SIAM International Conference on Data Mining (SDM-02), Arlington (2002)


Acknowledgements

We would like to thank Frederik Janssen and Julius Stecher for providing us with their code, Eyke Hüllermeier, Frank Jäkel, Niklas Lavesson, Nada Lavrač and Kai-Ming Ting for interesting discussions and pointers to related work, and Jilles Vreeken for pointing us to Munroe (2013). We are also grateful for the insightful comments of the reviewers of (Fürnkranz et al., 2018), which helped us considerably to focus our paper. TK was supported by grant IGA 33/2018 of the Faculty of Informatics and Statistics, University of Economics, Prague.

Author information

Correspondence to Johannes Fürnkranz.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Fürnkranz, J., Kliegr, T. (2018). The Need for Interpretability Biases. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science, vol. 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_2


  • DOI: https://doi.org/10.1007/978-3-030-01768-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01767-5

  • Online ISBN: 978-3-030-01768-2

  • eBook Packages: Computer Science (R0)
