Exploration of rule-based knowledge bases: A knowledge engineer’s support
Introduction
Data exploration helps us understand the investigated reality faster and more thoroughly. The data to be explored here are domain knowledge bases with a rule representation, i.e. if... then chains, which make it possible to conveniently describe a knowledge domain as relations connecting premises with the conclusions that arise from observing those premises. Such rules can be generated automatically from data with one of the available rule induction methods or can be supplied directly by an expert (or experts). The main advantages of this knowledge representation are its easy interpretation and the ability to record knowledge as if... then chains. In addition, such a representation places no restrictions on the type of data, as it works well with both numeric and categorical data. For medicine, economics or any other subject, rule representation provides a domain expert with an accessible means of presenting the gathered knowledge. A knowledge engineer just needs to save this knowledge using available IT tools. As a result, both simple and complex decision support systems proliferate. Rule-based knowledge representation requires a careful selection of analysis methods so that the new knowledge explored from it is useful both for a knowledge engineer and for an end user of a decision support system. A knowledge engineer is concerned with the easy management of rules (searching within the rules, discovering frequent and rare rules or their conditions), whereas for an end user the priority is the efficiency of the inference in which they are involved when the system is used. In each of these cases, when there are too many rules in a knowledge base, searching them effectively may be impaired. What is more, it then becomes almost impossible to find any relationships or similarities between the rules, or to find rare rules.
There is a promising prospect of having a tool which could describe a knowledge base by reporting the number of groups of rules with similar premises, the size of such groups and their representatives, as well as the number of rules whose premises are unrelated to any other rule or group of rules. Such rules are classified as so-called rare rules and, in consequence, are not clustered with others. Within the field of data mining, this issue is related to the outlier detection approach [19]. Knowledge of rare rules allows for a wider exploration of the field in a previously unknown area. Effective exploration of rule-based knowledge bases can be carried out through the creation of clusters of if... then rules and their representatives using hierarchical methods. This is a new and unique approach to managing domain knowledge bases, as it facilitates the creation of cohesive and well-described clusters and the detection of rare rules (those dissimilar to other rules) while concurrently providing a visualization of a knowledge base. This issue has recently grown in importance. In almost every aspect of everyday life, we need tools that allow us to manage huge datasets swiftly and efficiently; this applies not only to information search but, primarily, to generalizing and visualizing data in order to organize it. The development of computerization has been accompanied by the development of decision support systems based on domain knowledge in almost every area, from industry to economics and medicine. Over time, the knowledge bases set up within such systems have come to contain more and more rules (while MYCIN contained a few hundred rules, modern systems can contain a few hundred thousand of them). A domain expert or knowledge engineer is simply unable to manage such a massive dataset efficiently.
Hence the necessity to create tools that help to explore modern domain knowledge bases, which are often dispersed and contain data with a complex structure.
Clustering is one of the methods which allow huge datasets to be managed effectively. Depending on the context and the clustering method, the results may vary substantially. Among the available clustering techniques, non-hierarchical (partitional) and hierarchical methods can be used. The subject of clustering is another important factor. There are numerous available papers which discuss the clustering and managing of huge datasets (text documents, images and numerical data). The subject of research in this paper is the representation of specific data such as rule-based knowledge bases.
Even though numerous papers present rule representation as decision tables and association rules and provide methods and tools for their effective management (especially when such sets contain a large number of rules), so far we have not found any papers presenting exploration tools which can deal with large sets of production rules. This has become our main motivation for research on exploration methods and tools for rule-based knowledge bases. Having analyzed numerous clustering methods (partitional, density-based and hierarchical), we decided to focus our efforts on the hierarchical ones as we found them most promising. As a result, we propose a modification of the classic clustering algorithm. The proposed technique clusters rules on the grounds of the similarity of their premises and then labels each cluster with a representative. The innovation here involves clustering rule premises only using the maximum similarity criterion (not the distance criterion used by most existing methods and tools), various similarity measures and intra-cluster methods and, in particular, looking for an optimal number of clusters. Besides providing knowledge of the structure of the generated clusters (i.e. the number of clusters and the composition of each cluster), the proposed approach adds a descriptive representation of clusters to their visualization. It allows representatives to be designated for the created clusters using the generalization approach, the specification approach or an approach which combines both.
This would undoubtedly provide massive support for a domain expert or knowledge engineer, who can improve their knowledge of the domain described in a knowledge base by learning, for example, the number of rule clusters with similar conditions, the size of those clusters and the number of rare rules which could not be clustered. The cluster representatives, in turn, would make it possible for a domain expert or knowledge engineer to find desired rules in order to update them or to explore a previously unexplored part of the domain.
The authors claim that clustering rules with similar premises and generating representatives of these clusters makes it possible to optimize the search for rules to be activated in the inference process. An excessive number of rules in relation to given facts becomes a bottleneck in data-driven inference, and this phenomenon is investigated in this paper. At the same time, a generated cluster of rules with its representative supports a knowledge engineer in managing the knowledge recorded in a knowledge base. We believe that methods and tools for managing knowledge bases improve the effectiveness of every decision support system. Since so much depends on the quality of clusters, we wanted to know the influence of clustering parameters on the final clustering results. For this purpose, in the experiments we analyze the impact of clustering parameters and cluster representation methods on the effectiveness of the investigated knowledge bases. They have been explored with the use of four cluster representative designation methods, four inter-cluster methods and nine intra-cluster similarity measures. It turns out that each of the aforementioned factors substantially influences the size of the resulting clusters, the number of rare rules and the frequency of overgeneral or overspecific representatives of rule clusters.
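The clustering idea outlined above can be pictured with a minimal sketch. It assumes that each premise is encoded as a set of (attribute, value) pairs, that similarity is the Jaccard coefficient, that clusters are compared by their most similar pair of rules (single linkage) and that merging stops below a hypothetical min_similarity threshold. None of these choices is taken from the paper, which evaluates several similarity measures and clustering methods. The representative function shows one possible reading of a generalizing representative (the conditions shared by all rules in a cluster), and rules left in singleton clusters are treated as rare.

```python
from itertools import combinations

# Assumptions for illustration only: premises are sets of (attribute, value)
# pairs, similarity is Jaccard, linkage is single, and min_similarity is a
# hypothetical stopping threshold.

def jaccard(a, b):
    """Share of (attribute, value) pairs that two premises have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_similarity(c1, c2, rules):
    """Single linkage: similarity of the most similar pair of rules across two clusters."""
    return max(jaccard(rules[i], rules[j]) for i in c1 for j in c2)

def cluster_rules(rules, min_similarity=0.3):
    """Repeatedly merge the most similar pair of clusters (maximum similarity criterion)."""
    clusters = [[i] for i in range(len(rules))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], rules)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def representative(cluster, rules):
    """Assumed generalizing representative: conditions shared by every rule in the cluster."""
    return frozenset.intersection(*(rules[i] for i in cluster))

rules = [
    frozenset({("fever", "high"), ("cough", "yes")}),
    frozenset({("fever", "high"), ("cough", "yes"), ("age", "adult")}),
    frozenset({("fever", "high"), ("headache", "yes")}),
    frozenset({("income", "low"), ("debt", "high")}),
]
clusters = cluster_rules(rules)
rare = [c[0] for c in clusters if len(c) == 1]  # rules that joined no cluster
```

On this toy rule set the three medical rules end up in one cluster described by the shared condition fever = high, while the unrelated financial rule stays unclustered and is reported as rare.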
The structure of the paper is as follows: In Section 2, the notions of knowledge base and inference processes as the engine of every decision support system are presented. Additionally, Section 2 presents the format of a knowledge base as proposed in this paper. Section 3 contains the description of the proposed approach to rule clustering together with the suggested clustering algorithm and the methods of designating cluster representatives (presented as pseudocode). The most important inter-cluster methods and intra-cluster similarity measures are also presented in brief in this paper. In Section 4, rare rules detection methods are described along with the algorithm used for this purpose. This section also contains a description of the CluVis tool used for clustering, designation of representatives and visualization of the resulting structure of clusters of rules. The results of the experiments are presented in Section 5. This paper concludes with a summary of the results and a description of research to be conducted in the near future.
Rules have been a central form of knowledge representation since the earliest development of intelligent systems. In many past and current rule-based systems, domain experts manually organize rules by labeling them and assigning them to different groups based on their semantics. For example, the Cyc project has an enormous number of rules, often referred to as a sea of assertions, in its comprehensive knowledge base of common sense knowledge for general-purpose reasoning [14]. To organize such a knowledge base, domain experts divide these rules into smaller parts. Although this manual categorization of rules is semantically precise, it is a laborious task. When the data size is huge, the speed of such manual organization cannot be ensured. There is a need for developing automated methods to manage rule bases when the number of rules becomes large and relationships among rules are too complicated for developers and domain experts to comprehend [8]. Several efforts have been made to tackle the problem of summarizing and pruning the huge number of rules. A distinction should be made between rule-related work involving clustering and research on managing the rules using other techniques. By other techniques we mean methods of generalization (shortening and joining) of rules or their filtering. These methods require choosing a quality measure which controls the process of shortening or joining rules, along with settings such as the maximum acceptable decrease of the rule quality measure after shortening or joining, whether the set of joined rules should be joined again later, and whether to create a ranking for special rules before they are joined. This ranking determines the order of the joining process. Only decision rules which show sufficient similarity to the selected basic decision rule are taken into account [23]. In [15] the authors propose to shorten rules by removing elementary conditions using heuristic strategies (for example hill climbing) or exhaustive searching.
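As an illustration of shortening by hill climbing, the sketch below greedily removes one elementary condition at a time, always keeping the removal that preserves the highest quality, and stops as soon as quality would fall below a fixed threshold. The quality measure (rule precision on a small data table), the threshold value and all function names are assumptions made for this example and are not taken from [15].

```python
# Assumptions for illustration only: a rule is a list of (attribute, value)
# conditions plus a decision, and its quality is its precision on `data`.

def rule_quality(conditions, decision, data):
    """Fraction of rows covered by the conditions that carry the given decision."""
    covered = [row for row in data if all(row.get(a) == v for a, v in conditions)]
    if not covered:
        return 0.0
    hits = sum(1 for row in covered if row["decision"] == decision)
    return hits / len(covered)

def shorten_rule(conditions, decision, data, threshold=0.8):
    """Greedy hill climbing: drop one condition per step while quality stays >= threshold."""
    current = list(conditions)
    while len(current) > 1:
        # Every candidate is the current rule with exactly one condition removed.
        candidates = [[c for c in current if c != dropped] for dropped in current]
        best = max(candidates, key=lambda conds: rule_quality(conds, decision, data))
        if rule_quality(best, decision, data) < threshold:
            break  # any further removal would push quality below the threshold
        current = best
    return current

data = [
    {"fever": "high", "cough": "yes", "age": "adult", "decision": "flu"},
    {"fever": "high", "cough": "yes", "age": "child", "decision": "flu"},
    {"fever": "high", "cough": "no", "age": "adult", "decision": "cold"},
    {"fever": "low", "cough": "yes", "age": "adult", "decision": "cold"},
]
rule = [("fever", "high"), ("cough", "yes"), ("age", "adult")]
shortened = shorten_rule(rule, "flu", data)
```

Here the condition on age can be removed without any loss of precision, whereas removing either of the two remaining conditions would lower precision to 2/3, so the shortened rule keeps fever = high and cough = yes.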
Rules are shortened until the quality of the shortened rule drops below a certain fixed threshold. Research into the use of clustering for rule-based representations of domain knowledge has produced many interesting approaches. However, all of them were related to only two types of rules: fuzzy and association rules. These two types of rules are very specific and not always usable in practical applications (fuzzy rules require continuous input data, while association rules expect only nominal data). There are many cases in which production rules are needed. The advantage of representing knowledge with production rules lies in the fact that any type of attribute may be used in rules, and such rules are usually short and easy to interpret for any kind of system user (end user, knowledge engineer or domain expert). There are many research results related to the clustering of fuzzy rules. The construction of a rule base from fuzzy clusters provides an initial approximation for the data which can be used as a basis for further improvements. An interesting approach is proposed in [22]. The authors propose a rule clustering algorithm which allows the automatic organization of the sets of fuzzy rules of one monolithic fuzzy system into a hierarchical structure with various sub-models. They believe that the readability of fuzzy models is related to their organizational structure and the corresponding rule base; thus, they use clustering to build the structure of the system. The objective of the fuzzy clustering partition is to separate a set of fuzzy rules into a given number of clusters according to a similarity criterion, finding the optimal centers of clusters and the partition matrix.
An approach based on fuzzy clustering, the Fuzzy Clustering of Fuzzy Rules Algorithm (FCFRA), which allows the automatic organization of the sets of fuzzy rules of one fuzzy system in a hierarchical prioritized structure, is presented in [21]. Fuzzy clustering seems to be a very appealing method for learning fuzzy rules since there is a close and canonical connection between fuzzy clusters and fuzzy rules. The idea of deriving fuzzy classification rules from data can be formulated as follows: the training data set is divided into homogeneous groups and a fuzzy rule is associated with each group. The proposed FCFRA algorithm has been successfully applied to the modeling of a nonlinear small scale Pilot Plant Reactor. In paper [30], the authors propose to use the D-AFC(c)-algorithm as a direct possibilistic clustering algorithm, based on the construction of an allotment among an a priori given number c of partially separate fuzzy clusters.
The second approach to clustering rules concerns the clustering of association rules. When mining association rules we may find hundreds or thousands of rules corresponding to specific attribute values. In [9] the authors propose a method for grouping and summarizing large sets of association rules according to the items contained in each rule. Hierarchical clustering is used to partition the initial rule set into thematically coherent subsets. This enables the summarization of the rule set by adequately choosing a representative rule for each subset, and helps in the interactive exploration of the rule model by the user. In [20] the authors propose a method to analyze links between binary attributes in a large sparse data set. Initially the variables are clustered to obtain homogeneous clusters of attributes. Association rules are then mined in each cluster. The study shows that the combined use of association rules and classification methods is more relevant. In fact, this approach brings about a substantial decrease in the number of rules produced.
To the best of our knowledge, it is difficult to find research results on clustering production rules. Although rules of this type are very simple to build, they cause problems when it comes to creating coherent and well separated groups if different attributes are used in their conditional and decisional parts. As they usually create cause and effect chains, it is quite difficult to partition them properly.
The approach proposed in this paper is similar to the ones recently published within the domain of decision making which involves a large number of experts, especially when building a consensus [11]. Both similarity measures and clustering are used to detect the most influential experts. The closeness of experts’ preferences is computed using a similarity function. When there is a multitude of experts, they can be divided into subgroups in such a way that experts placed in the same cluster are more similar to each other than to the ones assigned to different cluster(s). When using the agglomerative hierarchical clustering algorithm it is possible to find structurally equivalent experts and then, applying the centrality concept in determining a group (or network) leader, to drive advice in the feedback process. An interesting approach is presented in [32], where the authors propose a novel method to select the most influential rules in a fuzzy rule-based model. In such a model, especially when type-2 fuzzy rules are considered, using the back-propagation method will almost certainly suffer from rule redundancy. Therefore, it is necessary to select the most important fuzzy rules and remove the redundant ones from the generated rule base. According to the proposed idea, a rule significance index is assigned to each rule, then the rules are ranked and the influential rules are selected (based on the aforementioned index).
Knowledge base and rule representation
One of the most common and popular methods of domain knowledge representation is knowledge representation as rules i.e. if... then chains, for example: if a premise then a conclusion. Sometimes in the literature the notions of premise and conclusion are replaced with condition and decision [6]. When defining the way of inference with conditions that must be met to take a decision, domain experts convey their knowledge onto rules recorded in a knowledge base. The rules are activated when their
Clustering of rules
Too many rules in the knowledge base can negatively affect the effectiveness of rule management. One of the ways of managing the rules is to cluster them into groups and to describe the groups by their representatives. The notion of cluster analysis indicates that objects in the analyzed dimension are split into clusters which collect the objects most similar to one another and the resulting clusters are as different as possible [12]. It guarantees an optimum internal cohesion and external
Exploration of clusters
This section presents a description of the approach proposed by the authors. It clusters rules which show a similarity in their premise part and then creates representatives of the clusters. By exploration of rule-based knowledge bases, the authors mean both utilizing descriptive information generated from the analysis of the rule clusters structure obtained in the course of clustering with the AHC algorithm i.e. information on the biggest cluster and alternatively on smallest clusters of rare
Experiments
The goal of the experiments was the exploration of rule-based knowledge bases with the structure of clusters. In the course of the experiments, the authors have analyzed the influence of clustering parameters and representative designation methods on the effectiveness of the resulting rule clusters. The authors propose that rules should be grouped into clusters with common rule premises and representatives should be designated (using various methods) for these groups. The resulting structure is of
Conclusions
The subject of the analysis in this paper is cluster-structured rule-based knowledge bases. The authors propose clustering of similar rules as an exploration method for big knowledge bases with rule-based representation. A knowledge engineer has an insight into clusters of similar rules and their representatives, and this constitutes a ready-for-use tool that helps manage knowledge effectively and possibly improve it in the future. It has to be emphasized that the presented idea of
References (33)
- Hierarchical clustering for thematic browsing and summarization of large sets of association rules. Proceedings of the 2004 SIAM International Conference on Data Mining (2004)
- M. Lichman, UCI Machine Learning Repository,...
- et al. Rough sets in decision making. Encyclopedia of Complexity and Systems Science (2009)
- et al. How many clusters? An information-theoretic perspective. Neural Comput. (2004)
- The comparison between forward and backward chaining. Int. J. Mach. Learn. Comput. (2015)
- et al. A new version of rough set exploration system. RSCTC (2002)
- et al. Similarity measures for categorical data: a comparative evaluation. SIAM (2008)
- A comparison of the performance of clustering methods using spectral approach. Data Analysis Methods and Its Applications (2012)
- et al. ICA: an incremental clustering algorithm based on OPTICS. Wireless Pers. Commun. (2015)
- Rule induction from rough approximations
- Three discretization methods for rule induction. Int. J. Intell. Syst.
- Clustering rule bases using ontology-based similarity measures. Web Semantics
- A decision criterion for the optimal number of clusters in hierarchical clustering. J. Global Optim.
- Preference similarity network structural equivalence clustering based consensus group decision making model. Appl. Soft Comput.
- An introduction to the syntax and content of Cyc. Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering