Authors:
Alexandros Doumanoglou 1,2; Dimitrios Zarpalas 1 and Kurt Driessens 2
Affiliations:
1 Information Technologies Institute, Centre for Research and Technology Hellas, 1st Km Charilaou - Thermi Road, Thessaloniki, Greece; 2 Department of Advanced Computing Sciences, Maastricht University, 6200 MD Maastricht, The Netherlands
Keyword(s):
Concept Basis, Interpretable Basis, Unsupervised Learning, Explainable AI, Interpretability, Computer Vision, Deep Learning.
Abstract:
Previous research has shown that, to a large extent, deep feature representations of image patches that belong
to the same semantic concept lie along the same direction in an image classifier’s feature space. Conventional
approaches compute these directions using annotated data, forming an interpretable feature space basis (also
referred to as a concept basis). Unsupervised Interpretable Basis Extraction (UIBE) was recently proposed as a
novel method that can suggest an interpretable basis without annotations. In this work, we show that the
addition of a classification loss term to the unsupervised basis search can lead to basis suggestions that align
even more closely with interpretable concepts. This loss term encourages the basis vectors to point towards directions
that maximally influence the classifier’s predictions, exploiting concept knowledge encoded by the network.
We evaluate our work by deriving a concept basis for three popular convolutional networks, trained on three
different datasets. Experiments show that our contributions enhance the interpretability of the learned bases,
according to the interpretability metrics, by up to +45.8% relative improvement. As an additional practical
contribution, we report hyper-parameters, found by hyper-parameter search on controlled benchmarks, that can
serve as a starting point for applications of the proposed method in real-world scenarios that lack annotations.
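To make the role of the classification loss term concrete, the following is a minimal sketch of one way such a term could be formulated; it assumes a frozen linear classification head and a learnable matrix Q whose rows are candidate concept directions, and it is an illustration of the general idea (rewarding directions along which the classifier is maximally sensitive), not the paper's exact formulation.

```python
import torch

# Illustrative sketch (assumed names and shapes, not the paper's exact loss):
# encourage each candidate concept direction to be one along which the frozen
# linear classification head responds strongly.
D, C = 512, 100                                   # feature and class dimensions
head = torch.nn.Linear(D, C)                      # frozen classifier head
for p in head.parameters():
    p.requires_grad_(False)

Q = torch.nn.Parameter(torch.eye(D))              # learnable basis, rows = directions

def classification_influence_loss(Q, head):
    # Unit-normalise each direction, then measure how strongly the head's
    # logits change when features move along it (directional-derivative norm).
    directions = torch.nn.functional.normalize(Q, dim=1)   # (D, D)
    sensitivity = head.weight @ directions.T                # (C, D)
    influence = sensitivity.norm(dim=0)                     # one value per direction
    return -influence.mean()                                # minimising maximises influence

# In practice, a term of this kind would be added, with a weighting
# hyper-parameter, to the unsupervised interpretability objective over Q.
loss = classification_influence_loss(Q, head)
loss.backward()
```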