Conventional symbolic rule extraction from multi layer perceptrons with discrete and continuous activation functions

Bologna, Guido

doi:10.13097/archive-ouverte/unige:105875

Over the last decade, multi-layer perceptrons (MLPs) have been widely used in classification tasks. Nevertheless, the difficulty of explaining their results is a major obstacle to their acceptance in critical application domains such as medical diagnosis. In this context, how can we trust a black box without any form of explanation capability? To redress this situation, the internal representation of a multi-layer perceptron should be transformed into symbolic rules. Such a network is a neural expert system. In the field of symbolic rule extraction from neural networks, Andrews et al. proposed a taxonomy to explain and compare the characteristics of the existing techniques. After having studied what we consider the main contributions of the domain, we propose a new approach that extracts symbolic rules by precisely locating the discriminant frontiers between two classes. Basically, our mathematical analysis shows that a frontier is built according to an equation with one linear term and one logarithmic term. When the logarithmic term is constant, the frontier is a hyper-plane. However, since a combination of hyper-planes gives polyhedra, it does not match the symbolic rule representation, which corresponds to hyper-rectangles. So, the idea is to introduce an MLP architecture that builds axis-parallel hyper-planes. The Interpretable Multi Layer Perceptron (IMLP) is a special multi-layer perceptron architecture that splits the input space into hyper-rectangles. In this model the key idea is to use threshold activation functions in the first hidden layer. Rule extraction is carried out by solving a Boolean minimization problem. In practice, rules with 100% fidelity are extracted in polynomial time. To our knowledge, no other rule extraction technique reaches such performance in every classification problem. In addition, input variables do not need to be quantized, and rules can also be inserted to perform rule refinement.
Finally, in spite of its reduced "power of expression" with respect to the standard multi-layer perceptron, IMLP is also a universal approximator. The key ideas introduced in IMLP have been applied to other architectures denoted as OMLP, HOOMLP, DIMLP, and MTB. Briefly, OMLP (Oblique Multi Layer Perceptron) is a model from which we extract rules having linear combinations of antecedents. At the level of its internal representation, it is the most similar to the standard multi-layer perceptron. By creating the HOOMLP (High Order OMLP) model we introduce the notion of paraboloidal, ellipsoidal, and spherical rule extraction, that is, the creation of rules splitting the input space into hyper-paraboloids, hyper-ellipsoids, and hyper-spheres. DIMLP (Discretized IMLP) is a generalization of IMLP with a more compact internal representation. Finally, MTB (Modular Transparent Boxes) is a model in which several interpretable sub-models are combined. The remarkable characteristic of modular transparent boxes is that symbolic rules are extracted not only at the level of each single model, but also at the level of the global combination. IMLP has been tested on 1L applications of the public domain and two special real-world applications. The conclusion is that, from a predictive accuracy point of view, IMLP performs better than MLP and C4.5 (one of the main references in rule extraction from datasets) in half of the classification problems. Concerning symbolic rules, IMLP tends to generate less comprehensible rules than C4.5. However, applying an approximate covering in the Boolean minimization step of the IMLP rule extraction algorithm has given more understandable rules at the price of slightly worse accuracy.
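To make the hyper-rectangle idea in the abstract concrete, here is a minimal sketch, not the thesis's algorithm. It assumes a toy first hidden layer in which each unit sees a single input and applies a step function, so every unit defines one axis-parallel hyper-plane. The names `UNITS`, `upper_layers`, `feasible`, `box`, and `extract_rules` are hypothetical, the thresholds are made up, and the rest of the network is replaced by a hand-written stand-in function. Rules are found by brute-force enumeration of the hidden codes rather than by the Boolean minimization described above.

```python
from itertools import product

# Hypothetical threshold layer: (input_index, threshold).
# A unit outputs 1 when x[input_index] > threshold, else 0.
UNITS = [(0, 0.3), (0, 0.7), (1, 0.5)]

def hidden_code(x):
    """Binary activation pattern of the threshold layer for input x."""
    return tuple(int(x[i] > t) for i, t in UNITS)

def upper_layers(code):
    """Stand-in for the remaining layers: any function of the binary code.
    Here: class 1 when 0.3 < x0 <= 0.7 and x1 > 0.5 (an assumption)."""
    a, b, c = code
    return int(a == 1 and b == 0 and c == 1)

def network(x):
    """Toy network: threshold layer followed by the stand-in upper layers."""
    return upper_layers(hidden_code(x))

def feasible(code):
    """Reject codes that no input can produce, e.g. x0 <= 0.3 and x0 > 0.7."""
    for (i, t1), h1 in zip(UNITS, code):
        for (j, t2), h2 in zip(UNITS, code):
            if i == j and t1 < t2 and h1 == 0 and h2 == 1:
                return False
    return True

def box(code, n_inputs=2):
    """Hyper-rectangle for a code: one (lower, upper] interval per input."""
    lo = [float("-inf")] * n_inputs
    hi = [float("inf")] * n_inputs
    for (i, t), h in zip(UNITS, code):
        if h:
            lo[i] = max(lo[i], t)   # unit fired: x[i] > t
        else:
            hi[i] = min(hi[i], t)   # unit silent: x[i] <= t
    return list(zip(lo, hi))

def extract_rules(target=1):
    """Brute force: one rule per feasible hidden code mapped to `target`."""
    codes = product((0, 1), repeat=len(UNITS))
    return [box(c) for c in codes if feasible(c) and upper_layers(c) == target]

def predict_by_rules(x, rules):
    """Class 1 if x falls inside any extracted hyper-rectangle."""
    return int(any(all(lo < v <= hi for v, (lo, hi) in zip(x, r)) for r in rules))

rules = extract_rules()
```

Because every point inside one hyper-rectangle produces the same hidden code, the extracted rules agree with this toy network on every input, which is the sense of "100% fidelity" used in the abstract. The enumeration is exponential in the number of threshold units and is only meant to illustrate the geometry.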

Archive ouverte UNIGE
