Extracting rules from multilayer perceptrons in classification problems: A clustering-based approach
Introduction
Neural networks have been successfully applied to data mining problems in several domains. In particular, multilayer perceptrons (MPs) may achieve high classification accuracy, but the knowledge acquired by such networks is usually incomprehensible to humans [13]. This can be a major obstacle in data mining applications, in which human-interpretable patterns describing the data, such as symbolic rules or other knowledge structures, are important [37]. Therefore, many methods have been developed to alleviate the lack of explanation of neural network (NN) models.
Neural networks (NNs) learn by adjusting their connection weights, which reflect the statistical properties of the data [17]. Thus, the knowledge acquired by an NN is encoded in its connection weights, which in turn are associated with both its architecture and activation functions [2]. In this context, knowledge acquisition from NNs usually relies on algorithms based on the values of either connection weights or hidden unit activations. Algorithms designed for this task are generally called algorithms for rule extraction from neural networks. Rule extraction from NNs is a computationally hard problem [23], and heuristics have been developed to overcome its combinatorial complexity [69]. In our work, a clustering genetic algorithm (CGA) is employed for rule extraction from MPs. The proposed method is based on the hidden unit activation values and consists of two main steps. First, the CGA is employed to find clusters of hidden unit activation values. Then, these clusters are translated into logical rules.
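The two steps above can be illustrated with a minimal sketch. It is not the authors' procedure: the cluster assignment is supplied by hand as a stand-in for the CGA output, and the translation into rules (one interval per hidden unit, with the majority class of the cluster as the consequent) is only one plausible reading of "translated into logical rules". The names `clusters_to_rules`, `format_rule`, and `h1`, `h2` are hypothetical.

```python
from collections import Counter


def clusters_to_rules(activations, labels, assignment):
    """Turn clusters of hidden-unit activations into interval rules.

    activations: one list of hidden-unit activation values per instance
    labels:      class label of each instance
    assignment:  cluster id of each instance (assumed to come from some
                 clustering step; here it is given by hand)
    """
    n_units = len(activations[0])
    rules = []
    for cluster_id in sorted(set(assignment)):
        members = [i for i, c in enumerate(assignment) if c == cluster_id]
        # Assumption: a cluster is described by the range each hidden unit
        # covers over the cluster's members.
        bounds = [
            (min(activations[i][u] for i in members),
             max(activations[i][u] for i in members))
            for u in range(n_units)
        ]
        # Assumption: the rule predicts the most frequent class in the cluster.
        majority_class = Counter(labels[i] for i in members).most_common(1)[0][0]
        rules.append((bounds, majority_class))
    return rules


def format_rule(bounds, predicted_class):
    """Render one rule in If...Then propositional form."""
    conditions = " AND ".join(
        f"{low:.2f} <= h{u + 1} <= {high:.2f}"
        for u, (low, high) in enumerate(bounds)
    )
    return f"If {conditions} Then class = {predicted_class}"


# Toy data: four instances, two hidden units, two classes (all made up).
activations = [[0.05, 0.90], [0.10, 0.95], [0.85, 0.20], [0.90, 0.10]]
labels = ["A", "A", "B", "B"]
assignment = [0, 0, 1, 1]

rules = clusters_to_rules(activations, labels, assignment)
for bounds, predicted_class in rules:
    print(format_rule(bounds, predicted_class))
```

The rules here are stated over hidden unit activations only; expressing them in terms of the input attributes would require the hidden unit activation expressions of the trained MP, which this sketch does not model.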
Andrews et al. [2] suggested a classification scheme for rule extraction algorithms. The proposed scheme is based on four aspects: (i) form and quality of the extracted rules; (ii) necessity of specific neural network training algorithms; (iii) complexity of the rule extraction algorithm; (iv) translucency of the neural network. According to this scheme, our method provides If…Then propositional rules and does not require any specific MP training algorithm. In addition, it can be applied to classification problems involving discrete and continuous attributes. The complexity of the rule extraction algorithm is determined by the employed CGA. As far as the translucency of the NN is concerned, there are three approaches: decompositional, pedagogical and eclectic. Decompositional approaches involve rule extraction at the level of hidden and output units, which are mapped into a binary form. Pedagogical approaches try to map inputs directly into outputs, using machine-learning techniques. In our work, hidden unit activation expressions are employed to obtain classification rules by means of a CGA. Thus, our approach can be classified as eclectic, because it combines elements of both the decompositional and pedagogical approaches.
The remainder of the paper is organized as follows. Section 2 situates the proposed method in the context of related work. Section 3 describes the CGA, which is applied to extract rules from MPs trained in classification problems. In Section 4, we present empirical results on four data mining benchmark datasets (Iris Plants, Wisconsin Breast Cancer, Australian Credit Approval and Pima Indians Diabetes) as well as on a real-world meteorological dataset. Finally, Section 5 concludes our work.
Section snippets
Related work
Several methods for rule extraction from NNs have been proposed in the literature, showing the increasing importance of this issue in several domains. Under this perspective, this section provides a brief description of several rule extraction methods. To do so, we follow a chronological order, considering the original work of each author. Then, we present our proposed method, comparing it with similar ones described in the literature.
In 1988, Gallant [19] proposed the first approach to
Clustering Genetic Algorithm (CGA)
Clustering is a task in which one seeks to identify a finite set of categories (clusters) to describe a given data set, both maximizing homogeneity within each cluster and heterogeneity among different clusters. In other words, instances that belong to the same cluster should be more similar to each other than instances that belong to different clusters. Thus, it is necessary to devise means of evaluating the similarities among instances. This problem is usually tackled indirectly, i.e.
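The homogeneity and heterogeneity criterion described in this section can be made concrete with a small sketch. It compares the mean distance between instances in the same cluster with the mean distance between instances in different clusters. The Euclidean distance and the function name `mean_pairwise_distances` are assumptions chosen for illustration; the source does not state which similarity measure or objective the CGA uses.

```python
import math
from itertools import combinations


def mean_pairwise_distances(points, assignment):
    """Return (mean within-cluster distance, mean between-cluster distance).

    A partition is better, in the sense described above, when the first
    value is small (homogeneous clusters) and the second is large
    (heterogeneous clusters).
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (assumed)
        if assignment[i] == assignment[j]:
            within.append(d)
        else:
            between.append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Toy data: two visibly separated groups of two points each (made up).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_partition = [0, 0, 1, 1]  # groups nearby points together
bad_partition = [0, 1, 0, 1]   # groups distant points together

w_good, b_good = mean_pairwise_distances(points, good_partition)
w_bad, b_bad = mean_pairwise_distances(points, bad_partition)
print(f"good: within={w_good:.2f}, between={b_good:.2f}")
print(f"bad:  within={w_bad:.2f}, between={b_bad:.2f}")
```

For the good partition the within-cluster mean is smaller than the between-cluster mean, and for the bad partition the relation is reversed, which is the property a clustering objective of this kind is meant to reward.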
Experimental evaluation
The proposed method was evaluated by means of experiments on five datasets. The first case study is a pedagogical example that shows how our method works; for it, we used the Iris Plants dataset, a well-known data mining benchmark. Next, we describe experiments performed on three further data mining benchmarks (Wisconsin Breast Cancer, Australian Credit Approval, and Pima Indians Diabetes) and on a real-world meteorological dataset. The benchmark
Conclusions
Neural networks usually provide high classification accuracy. However, the knowledge acquired by such models is generally incomprehensible to humans. This is a major obstacle in data mining applications, in which understandable patterns (such as classification rules) are very important. Therefore, many algorithms for rule extraction from neural networks have been developed. This paper described a method that employs a CGA to extract rules from MPs trained in classification
Acknowledgments
We are grateful to the Brazilian Research Agencies CNPq, FAPESP, and FAPERJ for their financial support. We would also like to thank Dr. Ricardo J. G. B. Campello and Dr. Leandro N. de Castro for their valuable suggestions on making Section 3 more readable.
References (70)
- et al., Critique of techniques for extracting rules from trained artificial neural networks, Knowledge Based Sys. (1995)
- et al., Dynamic on-line clustering and state extraction: an approach to symbolic learning, Neural Networks (1998)
- et al., A comparison between two neural network rule extraction techniques for the diagnosis of hepatobiliary disorders, Artif. Intell. Med. (2000)
- et al., Symbolic approximation of feedforward neural networks
- et al., Neurolinear: from neural networks to oblique decision rules, Neurocomputing (1997)
- et al., Template-based algorithm for connectionist rule extraction
- et al., An Overview of Combinatorial Data Analysis
- R. Baron, Knowledge extraction from neural networks: A survey, in: Report no. 94-17, Laboratoire de l’Informatique du...
- et al., Are artificial neural networks black boxes?, IEEE Trans. Neural Networks (1997)
- M.W. Craven, J.W. Shavlik, Extracting tree-structured representations of trained networks, Advances in Neural...
- Extraction of logical rules from training data using backpropagation networks, Neural Process. Lett.
- Hybrid neural-global minimization method of logical rule extraction, J. Adv. Comput. Intell.
- A new methodology of extraction, optimization and application of crisp and fuzzy logical rules, IEEE Trans. Neural Networks
- Cluster Analysis
- Genetic Algorithms and Grouping Problems
- Rule generation from neural networks, IEEE Trans. Syst. Man Cybern.
- Neural Networks in Computer Intelligence
- Knowledge-based connectionism for revising domain theories, IEEE Trans. Syst. Man Cybern.
- Connectionist expert systems, Commun. ACM
- Neural Network Learning and Expert Systems
- Symbolic knowledge extraction from trained neural networks: a sound approach, Artif. Intell.
- Genetic Algorithms in Search, Optimization and Machine Learning
- Neural Networks: A Comprehensive Foundation
- A clustering genetic algorithm for extracting rules from supervised neural network models in data mining tasks, Int. J. Comput. Syst. Signals (IJCSS)
- A genetic algorithm for cluster analysis, Intell. Data Anal.
Eduardo Raul Hruschka received his B.Sc. degree in Civil Engineering from Federal University of Paraná, Brazil, in 1995, and his M.Sc. and Ph.D. degrees in Computational Systems from Federal University of Rio de Janeiro, Brazil, in 1998 and 2001, respectively. He is currently an assistant professor at Catholic University of Santos (UniSantos), Brazil. His main research interest is data mining, with particular emphasis on evolutionary algorithms, artificial neural networks, clustering algorithms, feature selection, and missing value imputation.
Nelson Francisco Favilla Ebecken is Professor of Computational Systems at COPPE/UFRJ, the Engineering Graduate Center of Federal University of Rio de Janeiro. His research focuses on basic methodologies for modeling and extracting knowledge from data and their application across different disciplines. He develops and integrates ideas and computational tools from statistics and information theory with artificial intelligence paradigms.