Alternative rule induction methods based on incremental object using rough set theory

doi:10.1016/j.asoc.2012.08.042

Applied Soft Computing

Volume 13, Issue 1, January 2013, Pages 372-389

https://doi.org/10.1016/j.asoc.2012.08.042

Abstract

Rough set (RS) theory can be seen as a new mathematical approach to vagueness and is capable of discovering important facts hidden in data. However, the traditional rough set approach ignores the fact that the desired reducts are not necessarily unique, since several reducts may share the same value of the strength index. In addition, current RS algorithms can generate a set of classification rules efficiently, but they cannot generate rules incrementally when new objects are given, and many existing incremental approaches cannot handle large databases. Therefore, this study proposes an incremental rule-extraction algorithm to address these issues. With this algorithm, when a new object is added to an information system, the rule sets need not be recomputed from scratch, and complete, non-repetitive rules can be generated quickly. The case study results show that the algorithm handles newly added data incrementally and saves a large amount of computation time.

Highlights

► When new data are added, the proposed AREA can update rule sets by modifying only part of the original rule sets.
► With the AREA, rule sets need not be recomputed from scratch, so computation time is reduced.
► The AREA can select more than one reduct with the maximum SI, which solves the non-complete rules problem.
► Because the AREA may generate repetitive rules, the algorithm is designed to exclude them during the solution search procedure.

Introduction

Rough set theory, proposed by Pawlak in 1982 [1], can be seen as a new mathematical approach to vagueness [2]. The rough set method requires no additional information about the data; it can work with imprecise values or uncertain data, is capable of discovering important facts hidden in the data, and can express them in natural language [3]. In RS theory, knowledge is typically represented as a decision table, with rows containing objects and columns containing criteria or attributes. A decision table can be used to derive decision rules through an inductive process, and these rules can then be generalized for use in future decision support [4]. The usefulness and effectiveness of the RS approach have been demonstrated in data mining, knowledge discovery, pattern recognition, decision analysis, and other fields [5], [6].
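To make the decision-table idea concrete, the following minimal Python sketch builds a toy decision table and computes Pawlak's indiscernibility classes together with the lower and upper approximations of one decision class. The attribute names `price` and `brand`, their values, and the decisions are hypothetical. The sketch illustrates standard rough set notions only, not the REA, AREA or IAREA.

```python
from collections import defaultdict

# Hypothetical toy decision table: each row is (condition attribute values, decision).
# Attribute names and values are illustrative only.
CONDITIONS = ("price", "brand")
TABLE = [
    ({"price": "low", "brand": "A"}, "buy"),
    ({"price": "low", "brand": "A"}, "no"),
    ({"price": "high", "brand": "B"}, "no"),
    ({"price": "low", "brand": "B"}, "buy"),
]


def indiscernibility_classes(table, attrs):
    """Group object indices that share the same values on `attrs`."""
    classes = defaultdict(set)
    for i, (row, _) in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(table, attrs, decision):
    """Return (lower, upper) approximations of the objects labelled `decision`."""
    target = {i for i, (_, d) in enumerate(table) if d == decision}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attrs):
        if block <= target:   # block lies entirely inside the target concept
            lower |= block
        if block & target:    # block overlaps the target concept
            upper |= block
    return lower, upper


lower, upper = approximations(TABLE, CONDITIONS, "buy")
print(sorted(lower), sorted(upper))  # [3] [0, 1, 3]
```

Objects 0 and 1 share the same condition values but carry different decisions. They therefore fall into the boundary region (upper minus lower approximation), which is the kind of vagueness rough set theory is designed to capture.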

To date, the knowledge discovery literature [7], [8], [9] indicates that using RS to induce rules from attributes often generates too many rules without focus, and these rough set approaches cannot guarantee that the classification of a decision table is credible [10]. To solve this problem, Tseng [11] proposed the rule-extraction algorithm (REA), which discovers preference-based rules from the reducts that have the maximum strength index (SI) in the same case. However, the desired reducts are not necessarily unique, since several reducts may share the same SI value. An alternative rule can therefore be defined as a rule that holds the same preference as the original decision rule and may be more attractive to a decision-maker than the original one. Accordingly, Tseng et al. [10] proposed the alternative rule extraction algorithm (AREA) to solve the non-complete rules problem.
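The difference between keeping one maximal reduct and keeping all tied ones can be shown with a small Python sketch. The SI values below are made up, because the SI formula is defined in the REA/AREA papers and is not reproduced in this text. The helper names `pick_single_max` and `pick_all_max` are hypothetical.

```python
# Hypothetical strength-index (SI) scores for candidate reducts of one case.
# How SI is actually computed is defined in the REA/AREA papers and is NOT
# reproduced here; these numbers only illustrate how ties are handled.
reduct_si = {
    ("price",): 0.50,
    ("brand",): 0.75,
    ("price", "colour"): 0.75,
}


def pick_single_max(scores):
    """REA-style choice as described in the text: keep one reduct with maximal SI."""
    return max(scores, key=scores.get)  # returns the first maximal key encountered


def pick_all_max(scores):
    """AREA-style choice: keep every reduct whose SI ties the maximum."""
    best = max(scores.values())
    return [r for r, s in scores.items() if s == best]


print(pick_single_max(reduct_si))  # ('brand',)
print(pick_all_max(reduct_si))     # [('brand',), ('price', 'colour')]
```

Keeping only one maximal reduct silently discards `("price", "colour")`, so any rule derivable only from that reduct is lost. This is the non-complete rules problem that motivates alternative rules.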

Moreover, current rough set algorithms can generate a set of classification rules efficiently, but they cannot generate rules incrementally when new objects are given. The non-incremental approach becomes very costly or even intractable as the number of attributes grows. Alternatively, an incremental learning scheme can be applied. The essence of incremental learning is to let the learning process take place in a continuous and progressive manner rather than as a one-shot experience [12]. In practical applications, the number of records in a database often grows dynamically [13]. When a new object arrives, a non-incremental method must process the whole database again, which consumes a large amount of computation time and memory space [13].
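The contrast between one-shot and incremental processing can be illustrated with a minimal Python sketch. It maintains indiscernibility classes as objects arrive, so that each new object touches only the single class it belongs to instead of triggering a rebuild of the whole partition. The class name `IncrementalPartition` and the attribute names are hypothetical, and the sketch is not the IAREA procedure itself.

```python
from collections import defaultdict


class IncrementalPartition:
    """Maintain indiscernibility classes as objects arrive one at a time.

    Illustrative sketch only: adding an object updates a single equivalence
    class instead of recomputing the partition over the whole database.
    """

    def __init__(self, attrs):
        self.attrs = tuple(attrs)
        self.classes = defaultdict(list)  # attribute-value key -> object ids
        self.size = 0

    def add(self, row):
        key = tuple(row[a] for a in self.attrs)
        self.classes[key].append(self.size)  # only this one class changes
        self.size += 1
        return key


part = IncrementalPartition(["price", "brand"])
for row in [{"price": "low", "brand": "A"},
            {"price": "high", "brand": "B"},
            {"price": "low", "brand": "A"}]:
    part.add(row)
print(dict(part.classes))  # {('low', 'A'): [0, 2], ('high', 'B'): [1]}
```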

Most traditional incremental techniques in the literature [14], [15], [16], [17] cannot deal with the problems of large databases. Moreover, to deal with newly added data, traditional methods often re-run both the reduction algorithm and the rule-extraction algorithm [18]. Therefore, Fan et al. [18] proposed an incremental rule-extraction algorithm based on the REA to solve this problem. However, because the maximum SI is not necessarily unique, alternative rules that are as preferred as the original desired rules may exist, and the REA may therefore produce non-complete rules. This study thus proposes an incremental rule-extraction algorithm based on the AREA [10], named the incremental AREA (IAREA), to solve this problem. The proposed approach excludes the repetitive rules that may be generated by the original AREA and thereby avoids redundant rules.
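One simple way to exclude repetitive rules is shown in the Python sketch below. It treats two rules as repeats when they have the same set of attribute-value conditions (in any order) and the same decision, and it keeps the first occurrence of each. This equality criterion and the function name `deduplicate_rules` are assumptions made for illustration. How the IAREA detects repetition is not specified in this text.

```python
def deduplicate_rules(rules):
    """Drop repeated rules, keeping the first occurrence of each.

    A rule is a (conditions, decision) pair where `conditions` maps attribute
    names to values. Two rules count as repeats when they have the same
    condition set (order ignored) and the same decision; this test is an
    assumption made for illustration.
    """
    seen = set()
    unique = []
    for conditions, decision in rules:
        key = (frozenset(conditions.items()), decision)
        if key not in seen:
            seen.add(key)
            unique.append((conditions, decision))
    return unique


rules = [
    ({"price": "low", "brand": "B"}, "buy"),
    ({"brand": "B", "price": "low"}, "buy"),  # same rule, attributes listed in another order
    ({"brand": "A"}, "no"),
]
print(len(deduplicate_rules(rules)))  # 2
```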

In summary, to address the gaps in previous studies, the proposed IAREA generates concise and complete alternative decision rules that are as preferred as the original desired rules. In addition, the proposed incremental structure can address dynamic database problems in rough set-based rule induction: when the database is updated, the IAREA processes only the incremental data instead of recomputing the entire dataset. As a result, considerable computing time and memory space are saved. A customer relationship management (CRM) case study demonstrates the validity and efficiency of the proposed method. Because this subject has rarely been considered in previous literature, it opens a new avenue for CRM.

The study is organized as follows: Section 2 reviews the related literature, and Section 3 develops the proposed approach. Section 4 presents a CRM case study demonstrating the feasibility of the proposed approach. Finally, Section 5 concludes this research.

Literature review

This section surveys the literature on rough set-based rule induction and related incremental approaches.

Solution approach

In this study, an Object Incremental Methodology is proposed. The algorithm is based on the reduct generation procedure of Pawlak [25] and the alternative rule extraction algorithm of Tseng et al. [10]. The proposed approach updates rule sets by modifying only part of the original rule sets.
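The reduct generation step can be illustrated by the following Python sketch. It uses the standard rough set definition of a reduct: a minimal subset of condition attributes that preserves Pawlak's degree of dependency γ = |POS(D)| / |U| of the decision on the full condition set. It enumerates subsets by brute force, which is feasible only for tiny tables. The decision table, the attribute names `a`, `b`, `c`, `d`, and the helper names are hypothetical, and the exact procedure used in the paper is not shown in this text.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical decision table: condition attributes a, b, c and decision d.
ROWS = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
]
CONDITIONS = ("a", "b", "c")


def dependency(rows, attrs, decision="d"):
    """Pawlak's degree of dependency: |POS_attrs(decision)| / |U|."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[x] for x in attrs)].append(r[decision])
    # Objects in decision-consistent blocks form the positive region.
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows)


def reducts(rows, conditions):
    """Brute-force every minimal attribute subset that preserves the dependency degree."""
    full = dependency(rows, conditions)
    found = []
    for k in range(1, len(conditions) + 1):  # smaller subsets first
        for subset in combinations(conditions, k):
            # Dependency is monotone, so a subset with no smaller reduct inside is minimal.
            if dependency(rows, subset) == full and not any(set(r) <= set(subset) for r in found):
                found.append(subset)
    return found


print(reducts(ROWS, CONDITIONS))  # [('a', 'b'), ('a', 'c')]
```

Even this four-object table has two reducts, which is why an approach that keeps only one of them can miss equally preferred rules.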

Case study

HTF Inc., a cell phone vendor, develops a strategic plan to promote its products. To apply the proposed approach, three stages are implemented:

Stage 1.
Through an online consumer preference survey system, HTF acquires attributes of customers’ purchasing preferences for each product type.
Stage 2.
Apply the AREA to the collected data to induce concise decision rules for the CRM manager.
Stage 3.
According to the newly added customer's preference profile, the IAREA is applied to derive the updated decision

Discussion

Finally, three incremental objects are tested: (1) the new object in Table 7, whose addition leaves some instances uncovered by the original rules; (2) the new object in Table 10, whose addition causes a contradiction in the original rules; and (3) the new object in Table 13, whose addition causes no contradiction and cannot be dominated by the original AREA rules but can be dominated by the original reducts. Table 17 illustrates the percentage reduction in run time achieved by applying the proposed
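The kind of check that separates these cases can be sketched in Python by comparing a new object against an existing rule set. The sketch is a simplification: it distinguishes only "not covered", "contradiction" and "consistent" by direct rule matching. It does not model the paper's third case, which also involves dominance by the original reducts. The function name `classify_new_object`, the rules and the objects are hypothetical.

```python
def classify_new_object(rules, obj, decision):
    """Compare a new object with an existing rule set (simplified, hypothetical).

    rules: list of (conditions, decision) pairs; conditions map attribute -> value.
    Returns "not covered" if no rule's conditions match the object,
    "contradiction" if a matching rule predicts a different decision,
    and "consistent" otherwise.
    """
    matching = [d for cond, d in rules
                if all(obj.get(a) == v for a, v in cond.items())]
    if not matching:
        return "not covered"
    if any(d != decision for d in matching):
        return "contradiction"
    return "consistent"


rules = [({"a": 1, "b": 0}, "yes"), ({"a": 0, "c": 0}, "yes")]
print(classify_new_object(rules, {"a": 1, "b": 0, "c": 0}, "no"))   # contradiction
print(classify_new_object(rules, {"a": 1, "b": 1, "c": 1}, "no"))   # not covered
print(classify_new_object(rules, {"a": 0, "b": 1, "c": 0}, "yes"))  # consistent
```

Under this simplification, only the rules involved in a "not covered" or "contradiction" outcome would need revision, which is the intuition behind updating rule sets by modifying only part of them.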

Conclusion

In this study, the literature on traditional RS approaches and incremental techniques was reviewed, and its drawbacks were presented. An incremental rule-extraction algorithm based on the AREA was then proposed to overcome the drawback that, whenever new data are added to a database, the whole database must be recomputed. A case study was presented to demonstrate the feasibility of the proposed approach. This study aims to facilitate the