Abstract
In his seminal paper, Mitchell defined bias as “any basis for choosing one generalization over another, other than strict consistency with the observed training instances”, such as the choice of the hypothesis language or any form of preference relation between its elements. The most commonly used form is a simplicity bias, which prefers simpler hypotheses over more complex ones, even in cases when the latter provide a better fit to the data. Such a bias not only helps to avoid overfitting, but is also commonly considered to foster interpretability. In this talk, we will question this assumption, in particular with respect to commonly used rule learning heuristics that aim at learning rules that are as simple as possible. We will, on the contrary, argue that in many cases short rules are not desirable from the point of view of interpretability, and present some evidence from crowdsourcing experiments that supports this hypothesis. To understand interpretability, we must relate machine learning biases to cognitive biases, which lead humans to prefer certain explanations over others, even in cases when such a preference cannot be rationally justified. Only then can we develop suitable interpretability biases for machine learning.
Much of the material in this paper is based on Fürnkranz et al. (2018).
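The simplicity bias questioned in the abstract typically enters rule learning through the quality heuristic used to compare candidate rules. As a purely hypothetical sketch (not code from the paper; the rule representation and the tie-breaking rule are our own assumptions), the common Laplace-corrected precision heuristic scores a rule only by its coverage, so two rules with identical coverage tie, and the shorter one is then preferred:

```python
def laplace(p, n):
    """Laplace-corrected precision of a rule covering p positive and n negative examples."""
    return (p + 1) / (p + n + 2)

def preferred(rule_a, rule_b):
    """Pick the rule with the higher heuristic value; break ties in favour
    of the rule with fewer conditions (the simplicity bias in question)."""
    score_a, score_b = laplace(*rule_a["cov"]), laplace(*rule_b["cov"])
    if score_a != score_b:
        return rule_a if score_a > score_b else rule_b
    return rule_a if rule_a["length"] <= rule_b["length"] else rule_b

# Two hypothetical rules with identical training-set coverage (20 pos, 2 neg):
short = {"length": 1, "cov": (20, 2)}   # one condition
long_ = {"length": 4, "cov": (20, 2)}   # four conditions, same coverage

assert preferred(short, long_) is short  # the simplicity bias selects the short rule
```

The point made in the talk is that nothing in such a heuristic reflects how understandable the selected rule is to a human reader; the preference for the shorter rule is a design choice, not a consequence of the data.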
Notes
1. Entities should not be multiplied beyond necessity.
2. The differences between the two views are irrelevant for our argumentation.
3.
4. Since our experiments were based on subjective comparisons of pairs of rules, a more precise formulation would be, “comparatively more relevant than the most relevant condition in an alternative rule”.
References
Allahyari, H., Lavesson, N.: User-oriented assessment of classification model understandability. In: Kofod-Petersen, A., Heintz, F., Langseth, H. (eds.) Proceedings of the 11th Scandinavian Conference on Artificial Intelligence (SCAI-11), pp. 11–19 (2011)
Bensusan, H.: God doesn’t always shave with Occam’s Razor — learning when and how to prune. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 119–124. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026680
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Occam’s razor. Inf. Process. Lett. 24, 377–380 (1987)
Cohen, W.W.: Fast effective rule induction. In: Prieditis, A., Russell, S. (eds.) Proceedings of the 12th International Conference on Machine Learning (ML-95), pp. 115–123. Morgan Kaufmann, Lake Tahoe (1995)
Domingos, P.: The role of Occam’s Razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)
Freitas, A.A.: Comprehensible classification models: a position paper. SIGKDD Explor. 15(1), 1–10 (2013)
Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)
Fürnkranz, J., Kliegr, T., Paulheim, H.: On cognitive preferences and the interpretability of rule-based models. arXiv preprint arXiv:1803.01316 (2018)
Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-59830-2
Goldstein, D.G., Gigerenzer, G.: The recognition heuristic: how ignorance makes us smart. Simple Heuristics That Make Us Smart, pp. 37–58. Oxford (1999)
Gordon, D.F., DesJardins, M.: Evaluation and selection of biases in machine learning. Mach. Learn. 20(1–2), 5–22 (1995)
Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Hahn, H.: Überflüssige Wesenheiten: Occams Rasiermesser. Veröffentlichungen des Vereines Ernst Mach, Wien (1930)
Hertwig, R., Benz, B., Krauss, S.: The conjunction fallacy and the many meanings of and. Cognition 108(3), 740–753 (2008)
Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., Baesens, B.: An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decis. Support Syst. 51(1), 141–154 (2011)
Kemeny, J.G.: The use of simplicity in induction. Philos. Rev. 62(3), 391–408 (1953)
Kliegr, T., Bahník, Š., Fürnkranz, J.: A review of possible effects of cognitive biases on interpretation of rule-based machine learning models. arXiv preprint arXiv:1804.02969 (2018)
Kodratoff, Y.: The comprehensibility manifesto. KDD Nuggets, 94(9) (1994)
Kononenko, I.: Inductive and Bayesian learning in medical diagnosis. Appl. Artif. Intell. 7, 317–337 (1993)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. TCS. Springer, New York (2008). https://doi.org/10.1007/978-0-387-49820-1
Mehta, M., Rissanen, J., Agrawal, R.: MDL-based decision tree pruning. In: Fayyad, U., Uthurusamy, R. (eds.) Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, pp. 216–221. AAAI Press (1995)
Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–162 (1983)
Michie, D.: Machine learning in the next five years. In: Proceedings of the 3rd European Working Session on Learning (EWSL-88), pp. 107–122. Pitman (1988)
Mitchell, T.M.: The need for biases in learning generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick (1980)
Mitchell, T.M.: Version spaces: a candidate elimination approach to rule learning. In: Reddy, R. (ed.) Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI-77), pp. 305–310. William Kaufmann (1977)
Mitchell, T.M.: Machine Learning. McGraw Hill, New York (1997)
Muggleton, S.H., Schmid, U., Zeller, C., Tamaddoni-Nezhad, A., Besold, T.: Ultra-strong machine learning: comprehensibility of programs learned with ILP. Mach. Learn. 1–22 (2018)
Munroe, R.: Kolmogorov directions. xkcd: a webcomic of romance, sarcasm, math, and language (2013). www.xkcd.com
Murphy, P.M., Pazzani, M.J.: Exploring the decision forest: an empirical investigation of Occam’s Razor in decision tree induction. J. Artif. Intell. Res. 1, 257–275 (1994)
Paulheim, H.: Generating possible interpretations for statistics from linked open data. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 560–574. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30284-8_44
Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from linked open data. In: Proceedings of the International Conference on Web Intelligence and Semantics (WIMS’12) (2012)
Piltaver, R., Luštrek, M., Gams, M., Martinčić-Ipšić, S.: What makes classification trees comprehensible? Expert Syst. Appl. 62, 333–346 (2016)
Pohl, R.: Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory, 2nd edn. Psychology Press, London (2017)
Post, H.: Simplicity in scientific theories. Br. J. Philos. Sci. 11(41), 32–41 (1960)
Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990)
Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10, 153–178 (1993)
Stecher, J., Janssen, F., Fürnkranz, J.: Separating rule refinement and rule selection heuristics in inductive rule learning. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 114–129. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_8
Stecher, J., Janssen, F., Fürnkranz, J.: Shorter rules are better, aren’t they? In: Calders, T., Ceci, M., Malerba, D. (eds.) DS 2016. LNCS (LNAI), vol. 9956, pp. 279–294. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46307-0_18
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data Knowl. Eng. 42(2), 189–222 (2002)
Tversky, A., Kahneman, D.: Belief in the law of small numbers. Psychol. Bull. 76(2), 105–110 (1971)
Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics and biases. Science 185(4157), 1124–1131 (1974)
Tversky, A., Kahneman, D.: Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol. Rev. 90(4), 293–315 (1983)
Valmarska, A., Lavrač, N., Fürnkranz, J., Robnik-Sikonja, M.: Refinement and selection heuristics in subgroup discovery and classification rule learning. Expert Syst. Appl. 81, 147–162 (2017)
Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
Wallace, C.S., Boulton, D.M.: An information measure for classification. Comput. J. 11, 185–194 (1968)
Webb, G.I.: Further experimental evidence against the utility of Occam’s razor. J. Artif. Intell. Res. 4, 397–417 (1996)
Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Reidel, Dordrecht-Boston (1982)
Zaki, M.J., Hsiao, C.-J.: CHARM: An efficient algorithm for closed itemset mining. In: Grossman, R.L., Han, J., Kumar, V., Mannila, H., Motwani, R. (eds.) Proceedings of the 2nd SIAM International Conference on Data Mining (SDM-02), Arlington (2002)
Acknowledgements
We would like to thank Frederik Janssen and Julius Stecher for providing us with their code; Eyke Hüllermeier, Frank Jäkel, Niklas Lavesson, Nada Lavrač and Kai-Ming Ting for interesting discussions and pointers to related work; and Jilles Vreeken for pointing us to Munroe (2013). We are also grateful for the insightful comments of the reviewers of Fürnkranz et al. (2018), which helped us considerably in focusing our paper. TK was supported by grant IGA 33/2018 of the Faculty of Informatics and Statistics, University of Economics, Prague.
© 2018 Springer Nature Switzerland AG
Cite this paper
Fürnkranz, J., Kliegr, T. (2018). The Need for Interpretability Biases. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science(), vol 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_2
Print ISBN: 978-3-030-01767-5
Online ISBN: 978-3-030-01768-2