A neuro-fuzzy network to generate human-understandable knowledge from data

Action editor: Paolo Frasconi
https://doi.org/10.1016/S1389-0417(01)00055-9

Abstract

Neuro-fuzzy networks have been successfully applied to extract knowledge from data in the form of fuzzy rules. However, one drawback of the neuro-fuzzy approach is that the fuzzy rules induced by the learning process are not necessarily understandable. The lack of readability is essentially due to the high dimensionality of the parameter space, which leads to excessive flexibility in the modification of parameters during learning. In this paper, to obtain readable knowledge from data, we propose a new neuro-fuzzy model and a learning algorithm that works in a parameter space with reduced dimensionality. The dimensionality of the new parameter space is necessary and sufficient to generate human-understandable fuzzy rules, in the sense formally defined by a set of properties. The learning procedure is based on a gradient descent technique, and the proposed model is general enough to be applied to other neuro-fuzzy architectures. Simulation studies on a benchmark and a real-life problem are carried out to illustrate the idea of the paper.

Introduction

Neuro-fuzzy models have been developed with the aim of integrating the learning capability of neural networks with the representational power of fuzzy inference systems, thus producing learning machines capable of acquiring knowledge from data and representing it in the form of fuzzy rules (Jang & Sun, 1995; Jang, 1993; Nauck et al., 1997; Brown & Harris, 1994; Zurada & Lozowski, 1996). However, the interpretability of the fuzzy knowledge acquired by a neuro-fuzzy system may be heavily compromised by the learning phase of the network if no special attention is paid during data-based rule generation and adaptation.

The requirement of interpretability is particularly felt when neuro-fuzzy systems are applied to real-world problems (Nauck, 1995; Halgamuge & Glesner, 1994) such as decision support in medicine, finance, commerce and other applications. In such application areas, the knowledge about the behavior of the decision system should be transparent and physically sound, so as to meet the cognitive capacity of human beings and to mimic the way they perform high-level decision processes. As a consequence, the lack of interpretability often makes neuro-fuzzy models less useful than classical fuzzy inference systems (Pedrycz & Gomide, 1998; Cios et al., 1998; Ross, 1997), where the knowledge base is built manually and learning techniques are not adopted.

Since interpretability itself is a fuzzy and subjective concept, it is hard to find an explicit and exhaustive list of properties that, when violated, cause the fuzzy rule base to lose its readability. Some important aspects pertaining to the interpretability of fuzzy rules have been discussed in (Lofti et al., 1996; Jin et al., 1998; Jin et al., 2000), while a comprehensive set of properties that fuzzy sets should satisfy to preserve interpretability is postulated in (Pedrycz & Gomide, 1998; de Oliveira, 1999). However, to date, there is no well-established definition of the interpretability of a fuzzy rule base. Furthermore, even with a clear definition of readability, preserving readability during rule extraction and adaptation requires either reducing the degrees of freedom of the neuro-fuzzy model or using a constrained learning method that penalizes all solutions that are not readable (Bersini & Bontempi, 1997). Hence, the development of learning methods to induce understandable fuzzy rules from data is an important research issue.

Several approaches have been proposed to obtain interpretable knowledge by neuro-fuzzy learning (Jin et al., 1998; Nauck et al., 1996; Nauck & Kruse, 1997; Lozowski & Zurada, 2000; Marin-Blazquez et al., 2000; Setnes et al., 1998a; Chow et al., 1999a; Chow et al., 1999b).

In (de Oliveira, 1999), the learning process is constrained to respect some properties that make fuzzy rules human-understandable. This constraint is realized by means of regularization theory: the cost function to be minimized during training is composed of the usual Mean Squared Error (MSE) plus a penalty function, which is the mathematical counterpart of the properties that the fuzzy rules have to satisfy. In (Jin et al., 1998; Jin et al., 1999), the authors propose completeness and consistency indices for a fuzzy rule base, which are used as a means of regularization by incorporating them into the cost function of an evolutionary algorithm that generates an interpretable fuzzy rule base. The rule base is converted into an RBF network and refined through a regularization algorithm called Adaptive Weight Sharing, which guarantees interpretability and compactness of the final rules. Overall, this approach turns out to be flexible and gives promising results in handling high-dimensional problems. However, the approaches based on regularization have the drawback of introducing additional hyper-parameters (the regularizing parameters), for which no efficient method exists to determine the optimal values other than trial and error. Some mathematical techniques have been proposed, as in (Bengio, 2000; Craven & Wahba, 1979), but they are computationally intensive.
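The structure of such a regularized cost can be sketched as follows. This is only an illustration under stated assumptions, not the formulation used in any of the cited works: the penalty `separation_penalty`, its `min_gap` parameter, and all function names are hypothetical stand-ins for "a penalty that is zero when an interpretability property holds", and `lam` plays the role of the regularizing hyper-parameter that has to be tuned by hand.

```python
def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def separation_penalty(centers, min_gap):
    """Hypothetical penalty: grows when adjacent fuzzy-set centers are
    closer than `min_gap` (or out of order); zero when well separated."""
    return sum(
        max(0.0, min_gap - (right - left)) ** 2
        for left, right in zip(centers, centers[1:])
    )


def regularized_cost(predictions, targets, centers, lam, min_gap=1.0):
    """Cost = MSE + lam * penalty; `lam` must be chosen by the user."""
    return mse(predictions, targets) + lam * separation_penalty(centers, min_gap)
```

With `lam = 0` the cost reduces to the plain MSE, while larger values of `lam` trade accuracy for the penalized property. The need to pick `lam` is exactly the extra hyper-parameter mentioned above.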

In (Chow et al., 1999a, Chow et al., 1999b), the authors propose a set of transformations to project the parameter space of a neuro-fuzzy network into a subspace where a number of properties (more stringent than those adopted in (de Oliveira, 1999)) are satisfied. This projection is applied at each iteration of the learning algorithm, resulting in a high computational cost.
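The general pattern of "gradient step, then projection at every iteration" can be sketched as below. The repair step used here (clipping the centers to a range and re-sorting them) is a hypothetical simplification chosen for illustration; it is not the set of transformations defined by Chow et al., and all names are assumptions.

```python
def project(centers, lower, upper):
    """Hypothetical repair step: clip centers to [lower, upper] and restore
    their left-to-right order so linguistic labels keep their meaning."""
    clipped = [min(max(c, lower), upper) for c in centers]
    return sorted(clipped)


def projected_gradient_step(centers, grads, learning_rate, lower, upper):
    """One iteration: unconstrained gradient step followed by the repair step."""
    stepped = [c - learning_rate * g for c, g in zip(centers, grads)]
    return project(stepped, lower, upper)
```

Because `project` runs after every update, its cost is paid at each iteration, which is the source of the computational overhead noted above.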

Other works discuss the interpretability of fuzzy systems from the viewpoint of membership functions. In (Setnes et al., 1998a; Setnes et al., 1998b), similar fuzzy membership functions are merged so that the resulting fuzzy partitions are interpretable, while in (Lofti et al., 1996) a constraint is imposed on the location of membership functions during learning. In (Nauck et al., 1996; Nauck & Kruse, 1997), the authors propose NEFCLASS, an approach that creates fuzzy systems from data by applying a heuristic data-driven learning algorithm that constrains the modifications of fuzzy set parameters to take the semantic properties of the underlying fuzzy system into account. However, a good interpretation of the learning result cannot always be guaranteed, especially for high-dimensional problems. Hence, in (Nauck & Kruse, 1999) the NEFCLASS algorithm is extended with interactive strategies for pruning rules and variables so as to improve readability. This approach provides good results, but it leads to a long interactive process that cannot automatically extract rules from data and instead requires the user to supervise and interpret the learning procedure in all its stages.

This paper proposes an approach to automatically extract fuzzy rules by learning from data, with the main objective of obtaining a human-readable fuzzy knowledge base. A new neuro-fuzzy model and its learning algorithm are developed that work in a parameter space with reduced dimensionality with respect to the space of all the free parameters of the model. The dimensionality of the new parameter space is necessary and sufficient to generate human-understandable fuzzy rules, in the sense formally defined by a set of properties. Once the new parameter space is defined, the learning algorithm performs simple gradient descent with no additional constraint on the parameter modifications. The proposed model is general enough to implement different types of fuzzy rules, since its structure depends only on the form of the rule antecedents and does not depend on the form of the rule consequents. In this work, the proposed model has been defined to implement a zero-order Takagi–Sugeno (TS) fuzzy model (Sugeno & Kang, 1988; Takagi & Sugeno, 1985). However, our model can be easily adapted to embody other neuro-fuzzy architectures, such as ANFIS (Jang, 1993), the Lin and Lee network (Lin & Lee, 1991), Neuro-Fuzzy Classifiers (Castellano & Fanelli, 2000a; Castellano et al., 2000), and Multistage Fuzzy Neural Networks (Chung & Duan, 2000; Wang, 1999).
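For readers unfamiliar with the zero-order TS model, its inference step can be sketched as follows. This is a generic textbook-style sketch, not the architecture proposed in the paper: the Gaussian membership shape, the product used to combine antecedent degrees, and all names are assumptions made for illustration.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x (assumed membership shape)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def ts_zero_order(x, rules):
    """Zero-order Takagi-Sugeno inference.

    `rules` is a list of (antecedent, consequent) pairs, where `antecedent`
    is a list of (center, width) pairs, one per input, and `consequent`
    is a constant. The output is the firing-strength-weighted average of
    the rule constants.
    """
    strengths = [
        math.prod(gaussian(xi, c, w) for xi, (c, w) in zip(x, antecedent))
        for antecedent, _ in rules
    ]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(s * q for s, (_, q) in zip(strengths, rules)) / total
```

In this sketch the antecedent parameters are the `(center, width)` pairs and the consequent parameters are the constants. Only the former determine the fuzzy sets a human has to read, which is why the structure of an interpretable model can be tied to the antecedents alone.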

The paper is organized as follows. Section 2 gives a set of formal properties of a Fuzzy Knowledge Base (FKB) that must be satisfied to ensure readability. Section 3 focuses on the dimensionality of the parameter space of a readable FKB. Section 4 describes the proposed neuro-fuzzy architecture and its learning algorithm for the extraction of an FKB. Section 5 reports some experimental results, which support the theoretical framework, and Section 6 ends the paper with some concluding remarks.

Section snippets

Interpretable fuzzy knowledge base

In this section, we first describe the Fuzzy Knowledge Base (FKB) and the fuzzy partition of the input space that we adopt. Then, we formalize the properties that must be satisfied in order to ensure interpretability.

Parameter space of interpretable FKB

A Fuzzy Knowledge Base is characterized by several free parameters that define the position and the width of each fuzzy set. The set of all values that these parameters can assume, called the parameter space, is usually high-dimensional. Neuro-fuzzy approaches typically modify the fuzzy set parameters through a learning process in order to adapt the fuzzy rules to the available data. If the learning process is not constrained, fuzzy sets that do not respect the properties defined in Section 2.3 may be

The neuro-fuzzy model and its learning algorithm

In this section, we propose a new neuro-fuzzy model that preserves, throughout learning, all the properties that formalize an ‘understandable’ FKB. Specifically, we develop a new neuro-fuzzy network architecture that is able to provide a fuzzy rule base composed of fuzzy sets in Ω*. To achieve this, the proposed architecture uses T as the parameter space of the antecedent part of the fuzzy rules (the parameter space of the consequent part depends on the particular FIS model, as explained in

Simulation results

To demonstrate our approach for extracting a human-understandable fuzzy knowledge base from data, simulations have been carried out on a well-known identification problem of a non-linear system (Narendra & Parthasarathy, 1990) and on a real-world example from medicine (Wolberg & Mangasarian, 1990). The results are compared with those of other methods whenever possible.

Conclusions

Comprehensibility of the knowledge extracted from data is a very attractive feature for a neuro-fuzzy approach, since it establishes a bridge between the so-called symbolic reasoning paradigm, which provides explicit knowledge representation, and the sub-symbolic paradigm, where systems like neural networks automatically discover knowledge from data. However, a fuzzy knowledge base that is both precise and interpretable can hardly be found by a completely automatic learning process. Our work aims

References (48)

  • J.C. Bezdek. Pattern recognition with fuzzy objective function algorithms. (1981)
  • M. Brown et al. Neurofuzzy adaptive modeling and control. (1994)
  • G. Castellano et al. Simplifying a neuro-fuzzy model. Neural Processing Letters (1996)
  • G. Castellano et al. Fuzzy classifiers acquired from data.
  • Castellano, G., Fanelli, A.M., & Mencar, C. (2000). A new empirical risk functional for a neuro-fuzzy classifier....
  • M.-Y. Chow et al. Heuristic constraints enforcement for training of and knowledge extraction from a fuzzy/neural architecture, Part I: Foundation. IEEE Trans. on Fuzzy Systems (1999)
  • M.-Y. Chow et al. Heuristic constraints enforcement for training of and knowledge extraction from a fuzzy/neural architecture, Part II: Implementation and application. IEEE Trans. on Fuzzy Systems (1999)
  • Chung, F.-L., & Duan, J.-C. (2000). On multistage fuzzy neural network modeling. IEEE Trans. on Fuzzy Systems,...
  • K.J. Cios et al. Data mining. Methods for knowledge discovery. (1998)
  • P. Craven et al. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik (1979)
  • D. Dubois et al. Fuzzy sets and systems: theory and applications. (1980)
  • W. Duch et al. A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks (2001)
  • Y.P. Huang et al. Simplifying fuzzy modeling by both gray relational analysis and data transformation methods. Fuzzy Sets and Systems (1999)
  • Jang, J.-S.R., & Sun, C.-T. (1995). Neuro-fuzzy modeling and control. Proc. of the IEEE,...