Abstract
In his seminal paper, Mitchell defined bias as “any basis for choosing one generalization over another, other than strict consistency with the observed training instances”, such as the choice of the hypothesis language or any form of preference relation between its elements. The most commonly used form is a simplicity bias, which prefers simpler hypotheses over more complex ones, even in cases when the latter provide a better fit to the data. Such a bias not only helps to avoid overfitting, but is also commonly considered to foster interpretability. In this talk, we will question this assumption, in particular with respect to commonly used rule learning heuristics that aim at learning rules that are as simple as possible. We will, on the contrary, argue that in many cases short rules are not desirable from the point of view of interpretability, and present some evidence from crowdsourcing experiments that supports this hypothesis. To understand interpretability, we must relate machine learning biases to cognitive biases, which lead humans to prefer certain explanations over others, even in cases when such a preference cannot be rationally justified. Only then can we develop suitable interpretability biases for machine learning.
Much of the material in this paper is based on Fürnkranz et al. (2018).
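The simplicity bias questioned in the abstract typically enters rule learning through the quality heuristic used to compare candidate rules. As a purely hypothetical sketch (not code from the paper; the rule representation and the tie-breaking rule are our own assumptions), the common Laplace-corrected precision heuristic scores a rule only by its coverage, so two rules with identical coverage tie, and the shorter one is then preferred:

```python
def laplace(p, n):
    """Laplace-corrected precision of a rule covering p positive and n negative examples."""
    return (p + 1) / (p + n + 2)

def preferred(rule_a, rule_b):
    """Pick the rule with the higher heuristic value; break ties in favour
    of the rule with fewer conditions (the simplicity bias in question)."""
    score_a, score_b = laplace(*rule_a["cov"]), laplace(*rule_b["cov"])
    if score_a != score_b:
        return rule_a if score_a > score_b else rule_b
    return rule_a if rule_a["length"] <= rule_b["length"] else rule_b

# Two hypothetical rules with identical training-set coverage (20 pos, 2 neg):
short = {"length": 1, "cov": (20, 2)}   # one condition
long_ = {"length": 4, "cov": (20, 2)}   # four conditions, same coverage

assert preferred(short, long_) is short  # the simplicity bias selects the short rule
```

The point made in the talk is that nothing in such a heuristic reflects how understandable the selected rule is to a human reader; the preference for the shorter rule is a design choice, not a consequence of the data.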
Notes
1. Entities should not be multiplied beyond necessity.
2. The differences between the two views are irrelevant for our argumentation.
3.
4. Since our experiments were based on subjective comparisons of pairs of rules, a more precise formulation would be, “comparatively more relevant than the most relevant condition in an alternative rule”.
References
Allahyari, H., Lavesson, N.: User-oriented assessment of classification model understandability. In: Kofod-Petersen, A., Heintz, F., Langseth, H. (eds.) Proceedings of the 11th Scandinavian Conference on Artificial Intelligence (SCAI-11), pp. 11–19 (2011)
Bensusan, H.: God doesn’t always shave with Occam’s Razor — learning when and how to prune. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 119–124. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026680
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Occam’s razor. Inf. Process. Lett. 24, 377–380 (1987)
Cohen, W.W.: Fast effective rule induction. In: Prieditis, A., Russell, S. (eds.) Proceedings of the 12th International Conference on Machine Learning (ML-95), pp. 115–123. Morgan Kaufmann, Lake Tahoe (1995)
Domingos, P.: The role of Occam’s Razor in knowledge discovery. Data Min. Knowl. Discov. 3(4), 409–425 (1999)
Freitas, A.A.: Comprehensible classification models: a position paper. SIGKDD Explor. 15(1), 1–10 (2013)
Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)
Fürnkranz, J., Kliegr, T., Paulheim, H.: On cognitive preferences and the interpretability of rule-based models. arXiv preprint arXiv:1803.01316 (2018)
Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-59830-2
Goldstein, D.G., Gigerenzer, G.: The recognition heuristic: how ignorance makes us smart. Simple Heuristics That Make Us Smart, pp. 37–58. Oxford (1999)
Gordon, D.F., DesJardins, M.: Evaluation and selection of biases in machine learning. Mach. Learn. 20(1–2), 5–22 (1995)
Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Hahn, H.: Überflüssige Wesenheiten: Occams Rasiermesser. Veröffentlichungen des Vereines Ernst Mach, Wien (1930)
Hertwig, R., Benz, B., Krauss, S.: The conjunction fallacy and the many meanings of and. Cognition 108(3), 740–753 (2008)
Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., Baesens, B.: An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decis. Support Syst. 51(1), 141–154 (2011)
Kemeny, J.G.: The use of simplicity in induction. Philos. Rev. 62(3), 391–408 (1953)
Kliegr, T., Bahník, Š., Fürnkranz, J.: A review of possible effects of cognitive biases on interpretation of rule-based machine learning models. arXiv preprint arXiv:1804.02969 (2018)
Kodratoff, Y.: The comprehensibility manifesto. KDD Nuggets, 94(9) (1994)
Kononenko, I.: Inductive and Bayesian learning in medical diagnosis. Appl. Artif. Intell. 7, 317–337 (1993)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. TCS. Springer, New York (2008). https://doi.org/10.1007/978-0-387-49820-1
Mehta, M., Rissanen, J., Agrawal, R.: MDL-based decision tree pruning. In: Fayyad, U., Uthurusamy, R. (eds.) Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, pp. 216–221. AAAI Press (1995)
Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–162 (1983)
Michie, D.: Machine learning in the next five years. In: Proceedings of the 3rd European Working Session on Learning (EWSL-88), pp. 107–122. Pitman (1988)
Mitchell, T.M.: The need for biases in learning generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick (1980)
Mitchell, T.M.: Version spaces: a candidate elimination approach to rule learning. In: Reddy, R. (ed.) Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI-77), pp. 305–310. William Kaufmann (1977)
Mitchell, T.M.: Machine Learning. McGraw Hill, New York (1997)
Muggleton, S.H., Schmid, U., Zeller, C., Tamaddoni-Nezhad, A., Besold, T.: Ultra-strong machine learning: comprehensibility of programs learned with ILP. Mach. Learn. 1–22 (2018)
Munroe, R.: Kolmogorov directions. xkcd: a webcomic of romance, sarcasm, math, and language (2013). www.xkcd.com
Murphy, P.M., Pazzani, M.J.: Exploring the decision forest: an empirical investigation of Occam’s Razor in decision tree induction. J. Artif. Intell. Res. 1, 257–275 (1994)
Paulheim, H.: Generating possible interpretations for statistics from linked open data. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 560–574. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30284-8_44
Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from linked open data. In: Proceedings of the International Conference on Web Intelligence and Semantics (WIMS’12) (2012)
Piltaver, R., Luštrek, M., Gams, M., Martinčić-Ipšić, S.: What makes classification trees comprehensible? Expert Syst. Appl. 62, 333–346 (2016)
Pohl, R.: Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory, 2nd edn. Psychology Press, London (2017)
Post, H.: Simplicity in scientific theories. Br. J. Philos. Sci. 11(41), 32–41 (1960)
Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990)
Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10, 153–178 (1993)
Stecher, J., Janssen, F., Fürnkranz, J.: Separating rule refinement and rule selection heuristics in inductive rule learning. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 114–129. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_8
Stecher, J., Janssen, F., Fürnkranz, J.: Shorter rules are better, aren’t they? In: Calders, T., Ceci, M., Malerba, D. (eds.) DS 2016. LNCS (LNAI), vol. 9956, pp. 279–294. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46307-0_18
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data Knowl. Eng. 42(2), 189–222 (2002)
Tversky, A., Kahneman, D.: Belief in the law of small numbers. Psychol. Bull. 76(2), 105–110 (1971)
Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics and biases. Science 185(4157), 1124–1131 (1974)
Tversky, A., Kahneman, D.: Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol. Rev. 90(4), 293–315 (1983)
Valmarska, A., Lavrač, N., Fürnkranz, J., Robnik-Sikonja, M.: Refinement and selection heuristics in subgroup discovery and classification rule learning. Expert Syst. Appl. 81, 147–162 (2017)
Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
Wallace, C.S., Boulton, D.M.: An information measure for classification. Comput. J. 11, 185–194 (1968)
Webb, G.I.: Further experimental evidence against the utility of Occam’s razor. J. Artif. Intell. Res. 4, 397–417 (1996)
Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Reidel, Dordrecht-Boston (1982)
Zaki, M.J., Hsiao, C.-J.: CHARM: An efficient algorithm for closed itemset mining. In: Grossman, R.L., Han, J., Kumar, V., Mannila, H., Motwani, R. (eds.) Proceedings of the 2nd SIAM International Conference on Data Mining (SDM-02), Arlington (2002)
Acknowledgements
We would like to thank Frederik Janssen and Julius Stecher for providing us with their code; Eyke Hüllermeier, Frank Jäkel, Niklas Lavesson, Nada Lavrač and Kai-Ming Ting for interesting discussions and pointers to related work; and Jilles Vreeken for pointing us to Munroe (2013). We are also grateful for the insightful comments of the reviewers of Fürnkranz et al. (2018), which helped us considerably in focusing our paper. TK was supported by grant IGA 33/2018 of the Faculty of Informatics and Statistics, University of Economics, Prague.
© 2018 Springer Nature Switzerland AG
Cite this paper
Fürnkranz, J., Kliegr, T. (2018). The Need for Interpretability Biases. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science(), vol 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_2
Print ISBN: 978-3-030-01767-5
Online ISBN: 978-3-030-01768-2