Rare-PEARs: A new multi objective evolutionary algorithm to mine rare and non-redundant quantitative association rules
Introduction
The problem of association rule mining was introduced in 1993 and 1994 [2], [1], and it has since been widely used by researchers as an analysis technique. An association rule is an expression of the form X ⇒ Y, where X and Y are sets of items or attribute-value pairs. The left-hand side (X) and the right-hand side (Y) are called the antecedent and the consequent, respectively. Each side contains a number of attributes, none of which may be null. Attributes of association rules fall into two domains: discrete and continuous [4]. Furthermore, association rules in the unsupervised domain can be classified along two axes: categorical versus quantitative association rules [10], [51], and frequent versus infrequent (rare) rules [50]. There is also another type, called class association rules, which have only one attribute in the consequent. Class association rules are used by associative classifiers (Fig. 1). Rules of this category are frequent and strong (strong rules are rules with a high confidence value) [49].
The Apriori algorithm [3] was an outstanding contribution to identifying association rules, but its drawbacks gradually became obvious. In theory, Apriori guarantees highly accurate results; nevertheless, its runtime becomes considerable on large datasets. To address this drawback, a number of improved algorithms have been proposed, of which FP-growth [7] and Partition [8] are two prominent examples. They significantly improve performance, but in some cases they cannot be applied [9]. Population-based algorithms are the most recent generation of association rule mining methods [52], [53]. However, they face two major challenges in finding quantitative association rules (QARs): determining appropriate ranges for quantitative attributes and finding interesting rules. We address these problems in Rare-PEARs.
Various criteria (support, confidence, lift, etc.) are used by different QAR algorithms. Most non-population-based algorithms share a common property: they use support, confidence, or both as the main criteria for determining rule quality [1], [2], [3], [14]. Ant colony [11] and PSO [12], [13] based algorithms are some examples of this type of algorithm. Generally, in these algorithms, the user specifies a minimum acceptable value for support, and rules are generated only if their support exceeds that threshold. However, support alone is not enough: it does not indicate how reliable a rule is, so confidence is often used to measure reliability. Other measures exist as well; lift [37] and the certainty factor (CF) [38] are two of the most widely used. They are described in the next section.
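As a rough illustration of the four measures named above, the following sketch computes them for a single rule over a toy transaction list. It uses the commonly cited textbook definitions of support, confidence, lift, and certainty factor, which may differ in detail from the formulations given later in the paper; the function names and the toy data are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_measures(antecedent: FrozenSet[str],
                  consequent: FrozenSet[str],
                  transactions: List[Transaction]) -> Dict[str, float]:
    """Support, confidence, lift and certainty factor of antecedent => consequent.

    Textbook definitions (an assumption, not taken from the paper):
      support    = supp(X u Y)
      confidence = supp(X u Y) / supp(X)
      lift       = confidence / supp(Y)
      CF         = (conf - supp(Y)) / (1 - supp(Y))  if conf > supp(Y)
                   (conf - supp(Y)) / supp(Y)        if conf < supp(Y)
                   0                                 otherwise
    """
    supp_x = support(antecedent, transactions)
    supp_y = support(consequent, transactions)
    supp_xy = support(antecedent | consequent, transactions)
    conf = supp_xy / supp_x if supp_x > 0 else 0.0
    lift = conf / supp_y if supp_y > 0 else 0.0
    if conf > supp_y and supp_y < 1:
        cf = (conf - supp_y) / (1 - supp_y)
    elif conf < supp_y and supp_y > 0:
        cf = (conf - supp_y) / supp_y
    else:
        cf = 0.0
    return {"support": supp_xy, "confidence": conf, "lift": lift, "cf": cf}


# Toy data, purely illustrative.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
measures = rule_measures(frozenset({"bread"}), frozenset({"milk"}), transactions)
print(measures)
```

In this toy case the rule has a support of 0.5 but a lift below 1 and a negative certainty factor, which is the kind of situation where support alone would overstate the value of a rule.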
As the number of criteria has grown, multi-objective evolutionary algorithms (MOEAs) have been introduced. MOEAs are attractive methods for mining QARs, but rapid convergence harms their efficiency. Some studies, such as [10], address this problem by restarting: the search starts again when the difference between two consecutive populations is less than α percent. Sufficient exploration requires choosing α carefully. If this value is high, the current generations may not get the chance to produce elite chromosomes. Otherwise, the search converges too fast. Finding the optimal interval for each attribute of a rule is another challenge in QAR mining. We address these two problems with our proposed method, Rare-PEARs. We split the QAR mining problem into N − 1 sub-problems (each sub-problem is assigned a sub-process that finds rules of a specific size) and propose a new definition of non-dominated rules, which we call Non-Dominated II (this definition is explained in the last part of this section). Rare-PEARs brings variation into the population by producing item sets of a different size for each sub-problem.
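The restart criterion attributed to [10] can be sketched as follows. The text does not say how the difference between two consecutive populations is measured, so this example assumes it is the percentage of individuals in the current population that did not appear in the previous one; the function names and the string identifiers for individuals are hypothetical.

```python
from typing import Hashable, Sequence


def population_change_percent(previous: Sequence[Hashable],
                              current: Sequence[Hashable]) -> float:
    """Percentage of individuals in `current` that were absent from `previous`.

    Assumed difference measure; the measure used in [10] is not given here.
    """
    if not current:
        return 0.0
    seen = set(previous)
    new_individuals = sum(1 for ind in current if ind not in seen)
    return 100.0 * new_individuals / len(current)


def should_restart(previous: Sequence[Hashable],
                   current: Sequence[Hashable],
                   alpha_percent: float) -> bool:
    """Restart when two consecutive populations differ by less than alpha percent."""
    return population_change_percent(previous, current) < alpha_percent


# Individuals are represented by opaque identifiers for illustration only.
prev_pop = ["r1", "r2", "r3", "r4"]
curr_pop = ["r1", "r2", "r3", "r5"]  # one of four individuals is new: 25 %

print(should_restart(prev_pop, curr_pop, alpha_percent=30.0))  # restart triggered
print(should_restart(prev_pop, curr_pop, alpha_percent=10.0))  # search continues
```

Under this assumed measure, a large α triggers restarts even while the population is still changing noticeably, which matches the concern that current generations may not get the chance to produce elite chromosomes.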
Association rules on continuous attributes are called quantitative association rules (QARs). QARs are represented as x ⇒ y, in which x and y are item sets and x ∩ y = ∅. Because QARs carry additional data (the intervals of the attributes), they are written as x[a, b] ⇒ y[c, d], where a, b, c, and d are real values. In Rare-PEARs, QARs are produced simultaneously by N − 1 sub-processes. The aim of each sub-process is to find rules of a specific size (sizes 2 to N across the sub-problems, respectively). This significantly reduces the runtime (as described in Sections 3 and 4). The first objective of Rare-PEARs is to find semi-optimal intervals for the attributes of rules with similar form. Rules with similar form have the same attributes in their antecedent and consequent parts but different interval values for those attributes. For example, B[0.1, 0.2] → A[0.2, 0.4]C[−0.4, −0.2], B[0.15, 0.2] → A[0.1, 0.25]C[−0.1, 0.1], and B[0.05, 0.2] → A[−0.2, 0.1]C[−0.3, −0.1] are three rules with similar form. Finally, the non-dominated rules are selected from the rules found in the first step. Section 3.1 describes Non-Dominated II in Rare-PEARs.
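The notions of interval-bearing rules and "similar form" can be made concrete with a small sketch. The class and function names below are hypothetical, the objective scores are invented for illustration, and the dominance test is the classic Pareto definition applied within each form. It is not the paper's Non-Dominated II definition, which is not reproduced in this section.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

Interval = Tuple[float, float]
Form = Tuple[FrozenSet[str], FrozenSet[str]]


@dataclass
class QuantRule:
    """A rule such as B[0.1, 0.2] -> A[0.2, 0.4] C[-0.4, -0.2]."""
    antecedent: Dict[str, Interval]
    consequent: Dict[str, Interval]

    def form(self) -> Form:
        """Attribute names on each side, ignoring the interval values."""
        return frozenset(self.antecedent), frozenset(self.consequent)

    def covers(self, record: Dict[str, float]) -> bool:
        """True if the record satisfies every interval of the rule."""
        items = {**self.antecedent, **self.consequent}
        return all(lo <= record[a] <= hi for a, (lo, hi) in items.items())


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Classic Pareto dominance for objectives that are all maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_per_form(rules: List[QuantRule],
                           scores: List[Tuple[float, ...]]) -> Dict[Form, List[QuantRule]]:
    """Group rules by form and keep the non-dominated ones inside each group."""
    groups: Dict[Form, List[Tuple[QuantRule, Tuple[float, ...]]]] = {}
    for rule, score in zip(rules, scores):
        groups.setdefault(rule.form(), []).append((rule, score))
    return {
        form: [r for r, s in members
               if not any(dominates(other, s) for _, other in members)]
        for form, members in groups.items()
    }


# The three similar-form rules from the text.
r1 = QuantRule({"B": (0.1, 0.2)}, {"A": (0.2, 0.4), "C": (-0.4, -0.2)})
r2 = QuantRule({"B": (0.15, 0.2)}, {"A": (0.1, 0.25), "C": (-0.1, 0.1)})
r3 = QuantRule({"B": (0.05, 0.2)}, {"A": (-0.2, 0.1), "C": (-0.3, -0.1)})

# Invented two-objective scores, for illustration only.
scores = [(0.8, 1.2), (0.7, 1.1), (0.6, 1.5)]
fronts = non_dominated_per_form([r1, r2, r3], scores)
print(len(fronts[r1.form()]))
```

With these invented scores, the second rule is dominated by the first, while the first and third survive because each is better on one objective.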
The goal of this research is to discover rare and/or interesting association rules (rules with different forms are explained further in Section 3.1). We developed our approach with three motivations:
- The ability to handle datasets with quantitative values. Previous studies often work on binary or discretized values [5], [13], [58], [59], [60].
- Producing the initial population of every sub-problem with the aid of evolutionary random permutation. Each sub-process starts with individuals of a specific length. This strategy gives individuals of different lengths an equal chance of survival. In contrast, previous studies [10], [15], [19], [21], [24], [31], [32] generate their initial population randomly (with rules of different lengths), so rules of a specific length may not appear in the initial population at all. Hence, rare or interesting rules of greater length may never be produced by the evolutionary operators. Consequently, few or no longer rules take part in the competition to find non-dominated rules.
- Determining a suitable range for each attribute, which has a direct effect on discovering interesting and/or rare rules. We accomplish this by introducing a new definition of non-dominated rules (in contrast to previous studies [10], [14], [15], [21], [24], [25], [32]). In this new definition, we apply the multi-objective concept to find the most interesting state of a rule.
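The second motivation, giving every rule length its own starting population, can be sketched as below. This is only a guess at the general idea: it assumes N is the number of attributes, draws each individual from a random permutation of the attribute names, and picks interval endpoints uniformly within the given bounds. The function names, the dictionary-based rule layout, and the antecedent/consequent split are assumptions, not the initialization procedure of Rare-PEARs itself.

```python
import random
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Rule = Dict[str, Dict[str, Interval]]


def init_subpopulation(attributes: Sequence[str],
                       bounds: Dict[str, Interval],
                       size: int,
                       n_individuals: int,
                       rng: random.Random) -> List[Rule]:
    """Create rules that each use exactly `size` attributes."""
    population: List[Rule] = []
    for _ in range(n_individuals):
        perm = list(attributes)
        rng.shuffle(perm)                 # random permutation of the attributes
        chosen = perm[:size]              # first `size` attributes form the rule
        split = rng.randint(1, size - 1)  # at least one attribute on each side
        rule: Rule = {"antecedent": {}, "consequent": {}}
        for i, attr in enumerate(chosen):
            lo, hi = bounds[attr]
            a, b = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
            side = "antecedent" if i < split else "consequent"
            rule[side][attr] = (a, b)
        population.append(rule)
    return population


def init_all_subproblems(attributes: Sequence[str],
                         bounds: Dict[str, Interval],
                         n_individuals: int,
                         seed: int = 0) -> Dict[int, List[Rule]]:
    """One population per rule size 2..N, i.e. N - 1 sub-problems."""
    rng = random.Random(seed)
    n = len(attributes)
    return {
        size: init_subpopulation(attributes, bounds, size, n_individuals, rng)
        for size in range(2, n + 1)
    }


attrs = ["A", "B", "C", "D"]
attr_bounds = {name: (-1.0, 1.0) for name in attrs}
populations = init_all_subproblems(attrs, attr_bounds, n_individuals=5, seed=1)
print(sorted(populations))  # rule sizes that received a population
```

Because each size gets its own population, longer rules are guaranteed to be present from the first generation instead of depending on chance.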
This paper is organized as follows. Section 2 surveys related work, including previous studies of association rule mining, with a focus on evolutionary algorithms; the end of that section examines the time complexity of association rule mining. Section 3 describes our methodology (Rare-PEARs) in detail, including chromosome representation, population initialization, and evolutionary operators. Section 4 presents a detailed comparison and a comprehensive analysis of the results. Finally, Section 5 concludes the paper.
Section snippets
Related work
In this section, we review a number of research papers in the association rule mining area and some of their real-world applications. First, we review two non-population-based studies of association rule mining. An information-theoretic approach was presented in [20]. Ke et al. [20] applied numerical discrimination to the discovery of quantitative association rules. In [20], the mutual information between the attributes of a quantitative database was analyzed first, and then normalization (mutual
Rare-PEARs: a new random permutation based evolutionary algorithm to mine quantitative association rule
This section explains Rare-PEARs. The final rules produced by Rare-PEARs have different forms, and their attributes have semi-optimal intervals. These rules strike a good balance between reliability, coverage, accuracy, and interestingness. Before describing Rare-PEARs, we introduce the three stages of this method.
- Generate the initial population of sub-processes. This is done by random permutation.
- Produce new rules and find semi-optimal intervals for their attributes during the execution
Experimental result
We carried out several experiments to analyze the quality of our proposal. This section is organized as follows:
- Quality criteria of QARs.
- Dataset description.
- Description of the compared algorithms.
- Comparison with mono-objective (four algorithms), multi-objective (four algorithms), and classical algorithms (two algorithms).
- Statistical test on the results of the algorithms.
- Time complexity and scalability of our approach.
Our results are the average of 10 runs on each dataset. Information of dataset has been presented
Conclusion
In this paper, we have proposed Rare-PEARs, a new multi-objective evolutionary algorithm that intelligently produces the initial population for each of the sub-problems. The initial population is generated by a random permutation algorithm. Each sub-process is responsible for finding rules of a fixed size. Nevertheless, evaluation operators of Rare-PEARs produce rules whose sizes differ from the size assigned to each sub-process. It leads to high diversity in the initial
References (61)
- et al. QAR-CIP-NSGA-II: a new multi-objective evolutionary algorithm to mine quantitative association rules. Inf. Sci. (2014)
- et al. Comput. Math. Appl. (2007)
- et al. Association rule mining using binary particle swarm optimization. Eng. Appl. Artif. Intell. (2013)
- et al. Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support. Expert Syst. Appl. (2009)
- et al. Multi-objective rule mining using genetic algorithms. Inf. Sci. (2004)
- et al. Modenar: multi-objective differential evolution algorithm for mining numeric association rules. Appl. Soft Comput. (2008)
- et al. Mining fuzzy association rules from questionnaire data. Knowl.-Based Syst. (2009)
- et al. A multi-objective genetic algorithm approach to rule mining for affective product design. Expert Syst. Appl. (2012)
- et al. Grammar-based multi-objective algorithms for mining association rules. Data Knowl. Eng. (2013)
- et al. Multi objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence. Expert Syst. Appl. (2011)
- Web usage mining with evolutionary extraction of temporal fuzzy association rules. Knowl.-Based Syst.
- Multi-objective PSO algorithm for mining numerical association rules without a priori discretization. Expert Syst. Appl.
- A novel evolutionary method to search interesting association rules by keywords. Expert Syst. Appl.
- A model of inexact reasoning in medicine. Math. Biosci.
- Discovering gene association networks by multi-objective evolutionary quantitative association rules. J. Comput. Syst. Sci.
- Fast algorithms for mining association rules.
- Mining association rules between sets of items in large databases. News Lett. ACM SIGMOD
- Pincer-search: an efficient algorithm for discovering the maximum frequent set. IEEE Trans. Knowl. Data Eng.
- Mining quantitative association rules in large relational tables. SIGMOD Rec.
- KEEL: A software tool to assess evolutionary algorithms to data mining problems. Soft Comput.
- Mining frequent patterns without candidate generation.
- An efficient algorithm for mining association rules in large databases.
- Data Mining: Concepts and Techniques.
- Application of particle swarm optimization to association rule mining. Appl. Soft Comput.
- Multi-objective genetic algorithm based approaches for mining optimized fuzzy association rules. Soft Comput.
- A multi-objective genetic-fuzzy mining algorithm.
- A statistical theory for quantitative association rules. J. Intell. Inf. Syst.
- Mining numerical association rules via multi-objective genetic algorithms. Inf. Sci.
- An information-theoretic approach to quantitative association rule mining. Knowl. Inf. Syst.