Elsevier

Neurocomputing

Volume 70, Issues 1–3, December 2006, Pages 384-397

Extracting rules from multilayer perceptrons in classification problems: A clustering-based approach

https://doi.org/10.1016/j.neucom.2005.12.127

Abstract

Multilayer perceptrons adjust their internal parameters to perform vector mappings from the input space to the output space. Although they may achieve high classification accuracy, the knowledge acquired by such neural networks is usually incomprehensible to humans. This is a major obstacle in data mining applications, in which understandable patterns (such as classification rules) are very important. Therefore, many algorithms for rule extraction from neural networks have been developed. This work presents a method to extract rules from multilayer perceptrons trained on classification problems. The rule extraction algorithm consists of two steps. First, a clustering genetic algorithm is applied to find clusters of hidden unit activation values. Then, classification rules describing these clusters in terms of the inputs are generated. The proposed approach is experimentally evaluated on four datasets that are benchmarks for data mining applications and on a real-world meteorological dataset, leading to interesting results.

Introduction

Neural networks (NNs) have been successfully applied to solve data mining problems in several domains. In particular, multilayer perceptrons (MPs) may achieve high classification accuracy, but the knowledge acquired by such networks is usually incomprehensible to humans [13]. This can be a major obstacle in data mining applications, in which human-interpretable patterns describing the data, such as symbolic rules or other forms of knowledge structure, are important [37]. Therefore, many methods have been developed to alleviate the lack of explanation of NN models.

NNs learn by adjusting their connection weights, which reflect the statistical properties of the data [17]. Thus, the knowledge acquired by an NN is encoded in its connection weights, which in turn are associated with both its architecture and its activation functions [2]. In this context, acquiring knowledge from NNs usually requires algorithms based on the values of either connection weights or hidden unit activations. Algorithms designed to perform this task are generally called algorithms for rule extraction from neural networks. Rule extraction from NNs is a computationally hard problem [23], and heuristics have been developed to overcome its combinatorial complexity [69]. In our work, a clustering genetic algorithm (CGA) is employed for rule extraction from MPs. The proposed method is based on the hidden unit activation values and consists of two main steps. First, the CGA is employed to find clusters of hidden unit activation values. Then, these clusters are translated into logical rules.
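The two steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the network weights (`W_HIDDEN`, `B_HIDDEN`) and the toy data are made up, plain k-means with a fixed number of clusters stands in for the CGA, and each cluster is turned into a rule by taking the interval of input values its members cover, which is one simple way of describing a cluster in terms of the inputs. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, hand-set weights for a 2-input, 2-hidden-unit MP (not from the paper).
W_HIDDEN = [[4.0, 0.0], [0.0, 4.0]]  # one row of input weights per hidden unit
B_HIDDEN = [-2.0, -2.0]              # one bias per hidden unit


def hidden_activations(x):
    """Sigmoid activations of the hidden layer for one input vector."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]


def kmeans(points, k, iters=20):
    """Plain k-means with a fixed k >= 2; a stand-in for the CGA, not the CGA itself."""
    # Deterministic initialisation: k evenly spaced points from the list.
    centroids = [list(points[i * (len(points) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def extract_rules(inputs, classes, cluster_labels):
    """Describe each cluster by the input intervals it covers and its majority class."""
    rules = []
    for c in sorted(set(cluster_labels)):
        idx = [i for i, lab in enumerate(cluster_labels) if lab == c]
        bounds = [
            (min(inputs[i][j] for i in idx), max(inputs[i][j] for i in idx))
            for j in range(len(inputs[0]))
        ]
        majority = Counter(classes[i] for i in idx).most_common(1)[0][0]
        rules.append((bounds, majority))
    return rules


def format_rule(bounds, label):
    """Render one (bounds, class) pair as an If...Then propositional rule."""
    conditions = " and ".join(
        f"{lo} <= x{j + 1} <= {hi}" for j, (lo, hi) in enumerate(bounds)
    )
    return f"If {conditions} Then class = {label}"


# Toy data (made up): two inputs per instance and the class the MP assigns to it.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.8], [0.85, 0.75]]
y = ["A", "A", "A", "B", "B", "B"]

activations = [hidden_activations(x) for x in X]  # run the (assumed) trained MP
clusters = kmeans(activations, k=2)               # step 1: cluster hidden activations
rules = extract_rules(X, y, clusters)             # step 2: describe clusters via inputs
for bounds, label in rules:
    print(format_rule(bounds, label))
```

Unlike this sketch, which fixes the number of clusters in advance and uses a greedy local search, a genetic algorithm searches over candidate partitions, so the k-means step should be read only as a placeholder for where the CGA would sit in the pipeline.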

Andrews et al. [2] suggested a classification scheme for rule extraction algorithms. The scheme is based on four aspects: (i) form and quality of the extracted rules; (ii) necessity of specific neural network training algorithms; (iii) complexity of the rule extraction algorithm; (iv) translucency of the neural network. According to this scheme, our method provides If…Then propositional rules and does not require any specific MP training algorithm. In addition, it can be applied to classification problems involving discrete and continuous attributes. The complexity of the rule extraction algorithm is determined by the employed CGA. As far as the translucency of the NN is concerned, there are three approaches: decompositional, pedagogical and eclectic. Decompositional approaches involve rule extraction at the level of hidden and output units, which are mapped into a binary form. Pedagogical approaches try to map inputs directly into outputs, using machine-learning techniques. In our work, hidden unit activation expressions are employed to get classification rules by means of a CGA. Thus, our approach can be classified as eclectic, because it combines elements of both decompositional and pedagogical approaches.

The remainder of the paper is organized as follows. Section 2 situates the proposed method in the context of related work. Section 3 describes the CGA, which is applied to extract rules from MPs trained on classification problems. In Section 4, we present empirical results on four datasets that are benchmarks for data mining (Iris Plants, Wisconsin Breast Cancer, Australian Credit Approval and Pima Indians Diabetes) as well as on a real-world meteorological dataset. Finally, Section 5 concludes our work.

Related work

Several methods for rule extraction from NNs have been proposed in the literature, reflecting the growing importance of this issue in several domains. This section briefly describes several of these methods in chronological order, based on the original work of each author. Then, we present our proposed method and compare it with similar ones described in the literature.

In 1988, Gallant [19] proposed the first approach to

Clustering Genetic Algorithm (CGA)

Clustering is a task in which one seeks to identify a finite set of categories (clusters) to describe a given data set, both maximizing homogeneity within each cluster and heterogeneity among different clusters. In other words, instances that belong to the same cluster should be more similar to each other than instances that belong to different clusters. Thus, it is necessary to devise means of evaluating the similarities among instances. This problem is usually tackled indirectly, i.e.
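The twin goals stated above, homogeneity within clusters and heterogeneity among them, can be scored numerically. A minimal sketch follows, assuming the average silhouette width as the measure and Euclidean distance as the dissimilarity. The excerpt does not say which measure the CGA optimizes, so both choices are assumptions made for illustration, and the function name `silhouette` is hypothetical.

```python
import math


def silhouette(points, labels):
    """Average silhouette width of a partition with at least two clusters.

    For each point, a is its mean distance to the other members of its own
    cluster and b is the smallest mean distance to the members of any other
    cluster; the point's score is (b - a) / max(a, b).
    """
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:  # singleton cluster: score set to 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up points forming two well-separated pairs.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette(points, [0, 0, 1, 1])  # partition that matches the pairs
bad = silhouette(points, [0, 1, 0, 1])   # partition that splits each pair
print(good > bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate instances that sit closer to another cluster than to their own, so a score of this kind could serve as the quantity a clustering search tries to maximize.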

Experimental evaluation

The proposed method was evaluated experimentally on five datasets. The first case study is a pedagogical example that shows how our method works. For it, we used the Iris Plants dataset, a well-known data mining benchmark. Next, we describe experiments performed on three other data mining benchmarks (Wisconsin Breast Cancer, Australian Credit Approval, and Pima Indians Diabetes) and on a real-world meteorological dataset. The benchmark

Conclusions

Neural networks usually provide high classification accuracy. However, the knowledge acquired by such models is generally incomprehensible to humans. This is a major obstacle in data mining applications, in which understandable patterns (such as classification rules) are very important. Therefore, many algorithms for rule extraction from neural networks have been developed. This paper described a method that employs a CGA to extract rules from MPs trained in classification

Acknowledgments

We are grateful to the Brazilian Research Agencies CNPq, FAPESP, and FAPERJ for their financial support. We would also like to thank Dr. Ricardo J. G. B. Campello and Dr. Leandro N. de Castro for their valuable suggestions on making Section 3 more readable.

References (70)

  • M.W. Craven, J.W. Shavlik, Using sampling and queries to extract rules from trained neural networks, in: Proceedings of...
  • W. Duch et al., Extraction of logical rules from training data using backpropagation networks, Neural Process. Lett. (1998)
  • W. Duch, R. Adamczak, K. Grabczewski, M. Ishikawa, H. Ueda, Extraction of crisp logical rules using constrained...
  • W. Duch, R. Adamczak, K. Grabczewski, Optimization of Logical Rules Derived by Neural Procedures, in: Proceedings of...
  • W. Duch et al., Hybrid neural-global minimization method of logical rule extraction, J. Adv. Comput. Intell. (1999)
  • W. Duch et al., A new methodology of extraction, optimization and application of crisp and fuzzy logical rules, IEEE Trans. Neural Networks (2000)
  • B.S. Everitt et al., Cluster Analysis (2001)
  • E. Falkenauer, Genetic Algorithms and Grouping Problems (1998)
  • L. Fu, Rule generation from neural networks, IEEE Trans. Syst. Man Cybern. (1994)
  • L. Fu, Neural Networks in Computer Intelligence (1994)
  • L. Fu, Knowledge-based connectionism for revising domain theories, IEEE Trans. Syst. Man Cybern. (1993)
  • S.I. Gallant, Connectionist expert systems, Commun. ACM (1988)
  • S.I. Gallant, Neural Network Learning and Expert Systems (1994)
  • A.S.D. Garcez et al., Symbolic knowledge extraction from trained neural networks: a sound approach, Artif. Intell. (2001)
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • M. Golea, On the complexity of rule-extraction from neural networks and network-querying, in: Proceedings of the Rule...
  • S.S. Haykin, Neural Networks: A Comprehensive Foundation (1998)
  • C. Hermann, A. Their, Backpropagation for neural DNF- and CNF-networks, Technical Report, FG Intellektik, TH Darmstadt,...
  • E.R. Hruschka, N.F.F. Ebecken, A clustering genetic algorithm for extracting rules from multilayer perceptrons trained...
  • E.R. Hruschka et al., A clustering genetic algorithm for extracting rules from supervised neural network models in data mining tasks, Int. J. Comput. Syst. Signals (IJCSS) (2000)
  • E.R. Hruschka, N.F.F. Ebecken, Applying a clustering genetic algorithm for extracting rules from a supervised neural...
  • E.R. Hruschka, N.F.F. Ebecken, Rule extraction from neural networks: modified RX algorithm, in: Proceedings of the IEEE...
  • E.R. Hruschka, N.F.F. Ebecken, Using a clustering genetic algorithm for rule extraction from artificial neural...
  • E.R. Hruschka et al., A genetic algorithm for cluster analysis, Intell. Data Anal. (2003)
  • E.R. Hruschka, N.F.F. Ebecken, Rules from supervised neural network in data mining tasks. In: J.M. Abbe, J.I. da Silva...

Eduardo Raul Hruschka received his B.Sc. degree in Civil Engineering from Federal University of Paraná, Brazil, in 1995, and his M.Sc. and Ph.D. degrees in Computational Systems from Federal University of Rio de Janeiro, Brazil, in 1998 and 2001, respectively. He is currently assistant professor at Catholic University of Santos (UniSantos), Brazil. His main research interest is data mining, with particular emphasis on evolutionary algorithms, artificial neural networks, clustering algorithms, feature selection, and missing values imputation.

Nelson Francisco Favilla Ebecken is Professor of Computational Systems at COPPE/UFRJ, the Engineering Graduated Center of Federal University of Rio de Janeiro. His research focuses on basic methodologies for modeling and extracting knowledge from data and their application across different disciplines. He develops and integrates ideas and computational tools from statistics and information theory with artificial intelligence paradigms.
