Rare-PEARs: A new multi objective evolutionary algorithm to mine rare and non-redundant quantitative association rules

doi:10.1016/j.knosys.2015.07.016

Knowledge-Based Systems

Volume 89, November 2015, Pages 366-384

https://doi.org/10.1016/j.knosys.2015.07.016

Abstract

Since finding quantitative association rules (QARs) is an NP-hard problem, evolutionary methods are suitable for discovering QARs. Nevertheless, most previous evolutionary methods for discovering association rules consider only frequent dependencies among items in datasets. They pay no specific attention to interestingness and non-redundancy, two critical objectives. In this paper, the proposed algorithm (Rare-PEARs) gives every rule, whatever its length and appearance (antecedent and consequent parts), a chance to be created. Therefore, rules that are interesting, rare, or both can be found. Some of these rules might be uninteresting (those that contain frequent item sets), and Rare-PEARs tries to avoid them. To accomplish this goal, our method decomposes the process of association rule mining into N − 1 sub-problems, where N is the number of attributes and each sub-problem is handled by an independent sub-process during Rare-PEARs execution. Each sub-process starts individually with a different initial population. It then explores the search space of its corresponding sub-problem to find rules with semi-optimal intervals for each of the attributes. This search relies on a new definition of the Non-Dominated concept, which Rare-PEARs uses to find semi-optimal intervals for attributes during the execution of each sub-process. Finally, Rare-PEARs collects QARs from the sub-processes and determines the ultimate Non-Dominated rules based on the interestingness and reliability measures. Rare-PEARs tries to maximize three objectives (interestingness, accuracy, and reliability) while providing broad coverage of the input dataset. We compared Rare-PEARs with ten algorithms (multi-objective, mono-objective, and classical association rule mining algorithms) over several real-world datasets. The results demonstrate the high efficiency of Rare-PEARs.
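The final step above keeps only Non-Dominated rules. The paper's own definition (Non-Dominated II) is given in Section 3.1, which is not reproduced on this page, so the following is only a minimal sketch of ordinary Pareto dominance, assuming every objective (for example interestingness, accuracy, and reliability) is to be maximized. The function names and the toy objective values are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Standard Pareto dominance for maximization: a is no worse than b on
    every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(
    candidates: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, Sequence[float]]]:
    """Keep the candidates whose objective vectors no other candidate dominates."""
    return [
        (label, obj)
        for label, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


# Toy objective vectors (interestingness, accuracy, reliability); values are made up.
rules = [
    ("r1", (0.30, 0.80, 0.60)),
    ("r2", (0.20, 0.70, 0.50)),  # dominated by r1 on all three objectives
    ("r3", (0.45, 0.60, 0.55)),  # trades accuracy for interestingness
]
front = [label for label, _ in non_dominated(rules)]
print(front)  # ['r1', 'r3']
```

Neither r1 nor r3 is better on every objective, so both survive; this is the trade-off behaviour a multi-objective selection is meant to preserve.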

Introduction

The problem of association rule mining was introduced in 1993 and 1994 [2], [1], and since then researchers have used it as an analysis technique. An association rule is an expression of the form X ⇒ Y, where X and Y are sets of items or attribute-value pairs. The left-hand side (X) and right-hand side (Y) of this expression are the antecedent and consequent, respectively. Each side contains one or more attributes and cannot be empty. Attributes of association rules belong to one of two domains: discrete or continuous [4]. Furthermore, association rules in the unsupervised domain can be classified in two ways: categorical versus quantitative association rules [10], [51], and frequent versus infrequent/rare association rules [50]. There is also another type of association rule, called the class association rule, which has only one attribute in the consequent. Class association rules are used by associative classifiers (Fig. 1). Rules of this category are frequent and strong, where strong rules are rules with a high confidence value [49].

The Apriori algorithm [3] was a landmark in identifying association rules, but its drawbacks gradually became apparent. In theory, Apriori guarantees accurate results; nevertheless, its runtime becomes significant on large datasets. To address this drawback, a number of improved algorithms have been proposed. FP-growth [7] and Partition [8] are two prominent examples. They significantly improve performance, but they cannot be applied in some cases [9]. Population-based algorithms are a more recent generation of association rule mining methods [52], [53]. However, they face two major challenges in finding QARs: determining appropriate ranges for quantitative attributes and finding interesting rules. Rare-PEARs is designed to address both problems.
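To make the runtime concern concrete, here is a minimal textbook-style sketch of Apriori's level-wise search over Boolean transactions (not quantitative intervals). It counts support by rescanning every transaction for every candidate, which is why runtime grows on large datasets. The function name and data layout are assumptions for illustration, not the implementation evaluated in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions) is >= min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # One full pass over the data per candidate: the cost that grows with dataset size.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: by anti-monotonicity, every (k-1)-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=0.6)
# All single items (0.8) and pairs (0.6) are frequent; {a, b, c} (0.4) is not.
```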

Different QAR algorithms use various criteria (support, confidence, lift, etc.). Most non-population-based algorithms share a common property: they use support, confidence, or both as the main criteria for determining the quality of rules [1], [2], [3], [14]. Ant colony [11] and PSO [12], [13] based algorithms are further examples of algorithms that rely on these criteria. Generally, in these algorithms, the user specifies a minimum acceptable support value, and rules are generated only if their support exceeds that threshold. However, support alone is not enough: it cannot show how reliable a rule is, so confidence is often used to measure reliability. Other measures exist as well; lift [37] and the certainty factor (CF) [38] are two of the most widely used. They are described in the next section.
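The section that defines lift and CF is not included on this page, so the following sketch uses the standard definitions: lift = confidence / sup(Y), and the certainty factor of Shortliffe and Buchanan in the form commonly adapted to association rules. The paper's exact formulas may differ. Representing records as dicts and intervals as closed `(lo, hi)` pairs is an assumption for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def covers(row: Dict[str, float], items: Dict[str, Interval]) -> bool:
    """True if every attribute in `items` lies inside its closed interval."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in items.items())


def rule_measures(
    rows: List[Dict[str, float]],
    antecedent: Dict[str, Interval],
    consequent: Dict[str, Interval],
) -> Dict[str, float]:
    """Support, confidence, lift, and certainty factor of antecedent => consequent."""
    n = len(rows)
    n_x = sum(covers(r, antecedent) for r in rows)
    n_y = sum(covers(r, consequent) for r in rows)
    n_xy = sum(covers(r, antecedent) and covers(r, consequent) for r in rows)
    sup_y = n_y / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / sup_y if sup_y else 0.0
    # Certainty factor: how much the rule raises (or lowers) belief in Y relative to sup(Y).
    if confidence > sup_y:
        cf = (confidence - sup_y) / (1 - sup_y)
    elif confidence < sup_y:
        cf = (confidence - sup_y) / sup_y
    else:
        cf = 0.0
    return {"support": n_xy / n, "confidence": confidence, "lift": lift, "cf": cf}


rows = [
    {"A": 0.1, "B": 0.5},
    {"A": 0.2, "B": 0.6},
    {"A": 0.3, "B": 0.1},
    {"A": 0.9, "B": 0.9},
]
m = rule_measures(rows, {"A": (0.0, 0.3)}, {"B": (0.4, 1.0)})
# support 0.5, confidence 2/3, lift 8/9, CF -1/9 (confidence is below sup(Y) = 0.75)
```

The example rule has reasonable support and confidence, yet lift < 1 and a negative CF show that A[0, 0.3] actually lowers the chance of B[0.4, 1.0], which is the kind of reliability information support alone cannot give.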

As the number of criteria grew, Multi-Objective Evolutionary Algorithms (MOEAs) were introduced. MOEAs are promising methods for mining QARs; however, premature convergence reduces their efficiency. Some studies, such as [10], address this problem by restarting: the search starts again when the difference between two consecutive populations is less than α percent. Sufficient exploration requires choosing α carefully. If α is too high, restarts happen so often that the current generations may not have the chance to produce elite chromosomes; if it is too low, the search converges too quickly. Finding the optimal interval for each attribute of a rule is another challenge in QARs. Our proposed method, Rare-PEARs, addresses both problems. We split the QAR mining problem into N − 1 sub-problems (each with a sub-process that finds rules of a specific size) and propose a new definition of Non-Dominated rules, which we call Non-Dominated II (explained in the last part of this section). Rare-PEARs brings variation into the population by producing item sets of a different size for each sub-problem.
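The restart rule described for [10] can be sketched as below. The text does not say how the difference between two populations is measured, so this sketch assumes it is the percentage of individuals in the new population that were absent from the previous one; the function names and the tuple encoding of individuals are hypothetical.

```python
from typing import Hashable, Sequence


def population_change_percent(previous: Sequence[Hashable], current: Sequence[Hashable]) -> float:
    """Percentage of individuals in `current` that were not in `previous` (assumed measure)."""
    seen = set(previous)
    new = sum(1 for ind in current if ind not in seen)
    return 100.0 * new / len(current)


def should_restart(previous: Sequence[Hashable], current: Sequence[Hashable], alpha: float) -> bool:
    """Restart when two consecutive populations differ by less than alpha percent."""
    return population_change_percent(previous, current) < alpha


# Hypothetical individuals encoded as (attribute, lower bound, upper bound) tuples.
prev_pop = [("A", 0.1, 0.3), ("B", 0.2, 0.5), ("C", 0.0, 0.4), ("D", 0.6, 0.9)]
next_pop = [("A", 0.1, 0.3), ("B", 0.2, 0.5), ("C", 0.0, 0.4), ("D", 0.5, 0.9)]
print(population_change_percent(prev_pop, next_pop))  # 25.0
print(should_restart(prev_pop, next_pop, alpha=30))   # True: too little change, restart
```

The same 25% change triggers a restart at α = 30 but not at α = 20, which is the sensitivity to α that the paragraph above describes.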

Association rules on continuous attributes are called quantitative association rules (QARs). A QAR is represented as x ⇒ y, in which x and y are item sets and x ∩ y = ∅. Because QARs also carry the intervals of their attributes, they are written as x[a, b] ⇒ y[c, d], where a, b, c, and d are real values. In Rare-PEARs, QARs are produced simultaneously by N − 1 sub-processes. Each sub-process aims to find rules of a specific size (sizes 2 to N for the successive sub-problems). This significantly reduces the runtime, as described in Sections 3 and 4. The first objective of Rare-PEARs is finding semi-optimal intervals for the attributes of rules with similar form. Rules with similar form have the same antecedent and consequent attributes but different interval values for those attributes. For example, B[0.1, 0.2] ⇒ A[0.2, 0.4]C[−0.4, −0.2], B[0.15, 0.2] ⇒ A[0.1, 0.25]C[−0.1, 0.1], and B[0.05, 0.2] ⇒ A[−0.2, 0.1]C[−0.3, −0.1] are three rules with similar form. Finally, Non-Dominated rules are selected from the rules found in the first step. Section 3.1 describes Non-Dominated II in Rare-PEARs.
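A minimal sketch of the x[a, b] ⇒ y[c, d] representation follows, assuming closed intervals stored per attribute name; the class and helper names are hypothetical and not the paper's chromosome encoding. It enforces the two structural constraints stated above (non-empty sides, x ∩ y = ∅), treats two rules as having similar form when the same attributes appear on each side, and lists the rule sizes handled by the N − 1 sub-processes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


@dataclass
class QAR:
    """A quantitative association rule x[a, b] => y[c, d] (attribute -> closed interval)."""
    antecedent: Dict[str, Interval]
    consequent: Dict[str, Interval]

    def __post_init__(self) -> None:
        if not self.antecedent or not self.consequent:
            raise ValueError("neither side of a rule may be empty")
        if set(self.antecedent) & set(self.consequent):
            raise ValueError("antecedent and consequent must be disjoint")

    @property
    def size(self) -> int:
        """Number of attributes in the rule, i.e. the sub-problem it belongs to."""
        return len(self.antecedent) + len(self.consequent)

    def form(self) -> Tuple[frozenset, frozenset]:
        """The rule's form: which attributes appear on each side, ignoring intervals."""
        return frozenset(self.antecedent), frozenset(self.consequent)


def group_by_form(rules: List[QAR]) -> Dict[Tuple[frozenset, frozenset], List[QAR]]:
    """Collect rules with similar form (same attributes on each side)."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule.form()].append(rule)
    return dict(groups)


def sub_process_sizes(n_attributes: int) -> List[int]:
    """Rule sizes handled by the N - 1 sub-processes: 2, 3, ..., N."""
    return list(range(2, n_attributes + 1))


# The three similar-form rules from the text.
examples = [
    QAR({"B": (0.1, 0.2)}, {"A": (0.2, 0.4), "C": (-0.4, -0.2)}),
    QAR({"B": (0.15, 0.2)}, {"A": (0.1, 0.25), "C": (-0.1, 0.1)}),
    QAR({"B": (0.05, 0.2)}, {"A": (-0.2, 0.1), "C": (-0.3, -0.1)}),
]
groups = group_by_form(examples)
print(len(groups), examples[0].size)  # 1 3
print(sub_process_sizes(4))           # [2, 3, 4]
```

Grouping by form first means that interval optimization only ever compares rules that differ in their intervals, which matches the two-step order described above.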

The goal of this research is to discover rare and/or interesting association rules (rules with different forms are explained further in Section 3.1). We developed our approach with three motivations:

•
Ability to handle datasets with quantitative values. Previous studies often work on binary or discretized values [5], [13], [58], [59], [60].
•
Producing the initial population for every sub-problem with the aid of evolutionary random permutation. Each sub-process starts with individuals of a specific length, a strategy that gives individuals of different lengths an equal chance of survival. In contrast, previous studies [10], [15], [19], [21], [24], [31], [32] generate their initial population randomly (rules of mixed lengths), so rules of a specific length may not appear in the initial population at all. Hence, rare or interesting rules of greater length may never be produced by the evolutionary operators, and few or no long rules participate in the competition for Non-Dominated rules.
•
Determining suitable ranges, which has a direct effect on discovering interesting and/or rare rules. We accomplish this by introducing a new definition of Non-Dominated rules (in contrast to previous studies [10], [14], [15], [21], [24], [25], [32]). In this new definition, we apply the multi-objective concept to find the most interesting state of a rule.
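The second motivation (one population per sub-problem, each holding rules of a single length) can be illustrated with the hypothetical sketch below. The paper's actual evolutionary random permutation is specified in Section 3 and is not reproduced here, so the details are assumptions: each individual takes the first k attributes of a random permutation, splits them at a random cut into a non-empty antecedent and consequent, and draws a random interval inside each attribute's observed range.

```python
import random
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Individual = Tuple[Dict[str, Interval], Dict[str, Interval]]


def init_subprocess_population(
    attr_ranges: Dict[str, Interval],
    rule_size: int,
    pop_size: int,
    rng: Optional[random.Random] = None,
) -> List[Individual]:
    """Hypothetical initializer for the sub-process that mines rules of `rule_size` attributes."""
    if rng is None:
        rng = random.Random()
    attrs = list(attr_ranges)
    if not 2 <= rule_size <= len(attrs):
        raise ValueError("rule_size must be between 2 and the number of attributes")
    population = []
    for _ in range(pop_size):
        perm = rng.sample(attrs, len(attrs))  # random permutation of all attributes
        chosen = perm[:rule_size]             # every individual has exactly rule_size attributes
        cut = rng.randint(1, rule_size - 1)   # both sides of the rule stay non-empty
        intervals = {}
        for a in chosen:
            lo, hi = attr_ranges[a]
            p, q = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
            intervals[a] = (p, q)
        antecedent = {a: intervals[a] for a in chosen[:cut]}
        consequent = {a: intervals[a] for a in chosen[cut:]}
        population.append((antecedent, consequent))
    return population


def init_all_subprocesses(
    attr_ranges: Dict[str, Interval], pop_size: int, seed: int = 0
) -> Dict[int, List[Individual]]:
    """One population per sub-process: rule sizes 2..N, i.e. N - 1 populations."""
    rng = random.Random(seed)
    n = len(attr_ranges)
    return {k: init_subprocess_population(attr_ranges, k, pop_size, rng) for k in range(2, n + 1)}


ranges = {"A": (0.0, 1.0), "B": (-1.0, 1.0), "C": (0.0, 10.0), "D": (5.0, 6.0)}
pops = init_all_subprocesses(ranges, pop_size=5, seed=1)
```

Because every length from 2 to N gets its own population, long rules are guaranteed to be present from the first generation instead of depending on chance.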

This paper is organized as follows. Section 2 surveys related work, including previous studies of association rule mining, with a focus on evolutionary algorithms; it ends with a study of the time complexity of association rule mining. Section 3 describes our methodology (Rare-PEARs) in detail, including chromosome representation, population initialization, and evolutionary operators. Section 4 presents a detailed comparison and a comprehensive analysis of the results. Finally, Section 5 concludes the paper.

Section snippets

Related work

In this section, we review many research papers in the association rule mining area and some of their real-world applications. First, we study two non-population-based approaches to association rule mining. An information-theory-based approach was presented in [20]: Ke et al. [20] applied numerical discrimination to discover quantitative association rules. In [20], the mutual information between the attributes of a quantitative database was analyzed first, and then normalization (mutual
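The first step attributed to Ke et al. [20], measuring mutual information between attributes, can be illustrated as follows. The snippet is cut off before the normalization step, so this sketch stops at raw mutual information. It assumes both attributes have already been discretized into bins, and it is not a reproduction of the procedure in [20].

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X; Y) in bits between two discrete (already binned) attributes."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        # Sum over observed (x, y) pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


# Bin labels for two attributes over four records.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (fully dependent)
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent)
```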

Rare-PEARs: a new random permutation based evolutionary algorithm to mine quantitative association rule

In this section, Rare-PEARs is explained. The final rules of Rare-PEARs have different appearances, and their attributes have semi-optimal intervals. These rules strike a good balance between reliability, coverage, accuracy, and interestingness. Before describing Rare-PEARs in detail, we introduce the three stages of this method.

•
Generate the initial population of each sub-process by random permutation.
•
Produce new rules and find semi-optimal intervals for their attributes during the execution

Experimental result

We carried out several experiments to analyze the quality of our proposal. This section is organized as follows:

•
Quality Criteria of QARs.
•
Dataset description.
•
Compared algorithms description.
•
Comparison with mono-objective (four), multi-objective (four), and classical (two) algorithms.
•
Statistical tests on the results of the algorithms.
•
Time complexity and scalability of our approach.

Our results are averages of 10 runs on each dataset. Information about the datasets has been presented

Conclusion

In this paper, we have proposed Rare-PEARs, a new multi-objective evolutionary algorithm that intelligently produces the initial population for each of the different sub-problems. The initial population is generated by a random permutation algorithm. Each sub-process is responsible for finding rules of a fixed size. Nevertheless, the evolutionary operators of Rare-PEARs can produce rules whose sizes differ from the specific size assigned to each sub-process. This leads to high diversity in the initial

References (61)

D. Martin et al.
QAR-CIP-NSGA-II: a new multi-objective evolutionary algorithm to mine quantitative association rules
Inf. Sci.
(2014)
R.J. Kuo et al.
Comput. Math. Appl.
(2007)
K.N.V.D. Sarath et al.
Association rule mining using binary particle swarm optimization
Eng. Appl. Artif. Intell.
(2013)
X. Yan et al.
Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support
Expert Syst. Appl.
(2009)
A. Ghosh et al.
Multi-objective rule mining using genetic algorithms
Inf. Sci.
(2004)
B. Alatas et al.
MODENAR: multi-objective differential evolution algorithm for mining numeric association rules
Appl. Soft Comput.
(2008)
Y.-L. Chen et al.
Mining fuzzy association rules from questionnaire data
Knowl.-Based Syst.
(2009)
K.Y. Fung et al.
A multi-objective genetic algorithm approach to rule mining for affective product design
Expert Syst. Appl.
(2012)
J.M. Luna et al.
Grammar-based multi-objective algorithms for mining association rules
Data Knowl. Eng.
(2013)
H. Qodmanan et al.
Multi objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence
Expert Syst. Appl.
(2011)
S.G. Matthews et al.
Web usage mining with evolutionary extraction of temporal fuzzy association rules
Knowl.-Based Syst.
(2013)
V. Beiranvand et al.
Multi-objective PSO algorithm for mining numerical association rules without a priori discretization
Expert Syst. Appl.
(2014)
G. Yang et al.
A novel evolutionary method to search interesting association rules by keywords
Expert Syst. Appl.
(2011)
E. Shortliffe et al.
A model of inexact reasoning in medicine
Math. Biosci.
(1975)
M. Martínez-Ballesteros et al.
Discovering gene association networks by multi-objective evolutionary quantitative association rules
J. Comput. Syst. Sci.
(2014)
R. Agrawal et al.
Fast algorithms for mining association rules
R. Agrawal et al.
Mining association rules between sets of items in large databases
News Lett. ACM SIGMOD
(1993)
D.-I. Lin et al.
Pincer-search: an efficient algorithm for discovering the maximum frequent set
IEEE Trans. Knowl. Data Eng.
(2002)
R. Srikant et al.
Mining quantitative association rules in large relational tables
SIGMOD Rec.
(1996)
M. Vannucci, V. Colla
Meaningful discretization of continuous features for association rules mining by means of a SOM,...
Alcala-Fdez et al.
KEEL: A software tool to assess evolutionary algorithms to data mining problems
Soft. Comput.
(2009)
J. Han et al.
Mining frequent patterns without candidate generation
A. Savasere et al.
An efficient algorithm for mining association rules in large databases
J. Han et al.
Data Mining: Concepts and Techniques
(2011)
R.J. Kuo et al.
Application of particle swarm optimization to association rule mining
Appl. Soft. Comput.
(2011)
M. Kaya
Multi-objective genetic algorithm based approaches for mining optimized fuzzy association rules
Soft. Comput.
(2006)
C. Chen et al.
A multi-objective genetic-fuzzy mining algorithm
Y. Aumann et al.
A statistical theory for quantitative association rules
J. Intell. Inf. Syst.
(2003)
B. Minaei-Bidgoli et al.
Mining numerical association rules via multi-objective genetic algorithms
Inf. Sci.
(2013)
Y. Ke et al.
An information-theoretic approach to quantitative association rule mining
Knowl. Inf. Syst.
(2008)

Cited by (24)

Differential evolution and sine cosine algorithm based novel hybrid multi-objective approaches for numerical association rule mining
2021, Information Sciences
In association rules mining from data that have numeric-valued attributes, automatically adjusting the attribute intervals at the time of the mining process without a preprocess is very critical for preventing data loss and attribute interactions. In this paper, differential evolution and sine cosine algorithm based novel hybrid multi-objective evolutionary optimization methods are proposed for rapidly and directly mining the reduced high-quality numerical association rules by simultaneously adjusting the relevant intervals of related attributes without finding the frequent itemsets. These algorithms perform a global search and find the high-quality rules set in only one execution by modeling the rule mining task as a multi-objective problem that simultaneously meets different conflicting metrics. The algorithms proposed in this paper ensure the discovered rules to have high confidence and support and to be comprehensible. The proposed methods automate the rule mining process by directly finding the minimum intervals for the attributes and eliminating the need for minimum confidence and minimum support determined beforehand for each data set. The performances of new algorithms proposed in this study were tested with those of the state-of-the-art algorithms. The results show superiority of the proposed methods on the data sets that contain fewer attributes and higher number of instances.
A survey of evolutionary computation for association rule mining
2020, Information Sciences
Citation Excerpt :
Crowding distance used by NSGA-II was replaced to sort the solutions for each Pareto-front. Almasi and Abadeh [17] decomposed ARM into different sub-problems in order to extract RARs from quantitative data. Each problem is solved using a different initial population.
Association Rule Mining (ARM) is a significant task for discovering frequent patterns in data mining. It has achieved great success in a plethora of applications such as market basket, computer networks, recommendation systems, and healthcare. In the past few years, evolutionary computation-based ARM has emerged as one of the most popular research areas for addressing the high computation time of traditional ARM. Although numerous papers have been published, there is no comprehensive analysis of existing evolutionary ARM methodologies. In this paper, we review emerging research of evolutionary computation for ARM. We discuss the applications on evolutionary computations for different types of ARM approaches including numerical rules, fuzzy rules, high-utility itemsets, class association rules, and rare association rules. Evolutionary ARM algorithms were classified into four main groups in terms of the evolutionary approach, including evolution-based, swarm intelligence-based, physics-inspired, and hybrid approaches. Furthermore, we discuss the remaining challenges of evolutionary ARM and discuss its applications and future topics.
CARs-Lands: An associative classifier for large-scale datasets
2020, Pattern Recognition
Citation Excerpt :
Note that replacement is performed randomly, if the supports are also the same. Since rule of CAR-lands are produced based on Rare-PEARs [42], at the beginning of this section, we briefly review it. Rare-PEARs [42] is an evolutionary association rule mining method which produces rare and reliable rules.
Associative classifiers are one of the most efficient classifiers for large datasets. However, they are unsuitable to be directly used in large-scale data problems. Associative classifiers discover frequent/rare rules or both in order to produce an efficient classifier. Discovery rules need to explore a large solution space in a well-organized manner; hence, learning of the associative classification methods of large datasets is not suitable on large-scale datasets because of memory and time-complexity constraints. The proposed method, CARs-Lands, presents an efficient distributed associative classifier. In CARs-Lands, first, a modified dataset is generated. This new dataset has sub-datasets that are completely appropriate to produce classification association rules (CARs) in a parallel manner. The produced dataset by CARs-Lands contains two types of instances: main instances and neighbor instances. Main instances can be either real instances of training dataset or meta-instances, which are not in the training dataset; each main instance has several neighbor instances from the training dataset, which together form a sub-dataset. These sub-datasets are used for parallel local association rule mining. In CARs-Lands, local association rules lead to more accurate prediction, because each test instance is classified by the association rules of their nearest neighbors in the training datasets. The proposed approach is evaluated in terms of accuracy on six real-world large-scale datasets against five recent and well-known methods. Experiment results show that the proposed classification method has high prediction accuracy and is highly competitive when compared to other classification methods.
Applying mutual information for discretization to support the discovery of rare-unusual association rule in cerebrovascular examination dataset
2019, Expert Systems with Applications
In knowledge discovery studies, association rules mining has been extensively studied to discover hidden knowledge and relationships among set of items in a transactional dataset. Most research on association rule mining focuses on discovering frequent patterns based on the most frequent items occurring in the dataset. However, the process of extracting rare rules has received less attention. In medical dataset studies, the discovery of rare association rules (RARs) is more challenging, because it could likely be used to obtain more potentially rare and unusual knowledge for physicians, beside frequent association rules. Hence, the aim of this paper is to discover non-frequent or rare-unusual association rules (RUARs) from a stroke medical dataset to provide potential meaningful knowledge to the user domain.
A discretization method needs to be performed as the data preprocessing step before generating rules. To the best of our knowledge, fewer studies have focused on the role of discretization results to support the extraction of a better amount and quality of RUARs, particularly for medical datasets. In addition, the extracted RUARs is expected to provide potential new unusual insights on stroke risk patterns. This paper applies mutual information measure to discretize a stroke examination dataset collected from a medical center in Taiwan. The interval merging method was proposed to simplify the discrete form and enrich the quality of generated rules. Towards the end, rare association rules, with relatively low support, were generated by employing the Apriori-Rare method accordingly. In addition, a filtering process was applied to the content of the rule itemsets to discover the expected set of RUARs for physicians. Furthermore, the extracted RUARs was analyzed based on the relative risk values toward the occurrence of stroke.
Results indicated that the mutual information discretization outperformed the traditional discretization methods in terms of how the discretization scheme can support the extraction of RUARs with a better quantity and quality measurements for further analysis purpose in medical point of view. Moreover, the proposed method had a relatively higher number of RUARs. The knowledge of unusual rule patterns from rare association rules might provide potential new and unusual insights for medical practitioners and increase the awareness of stroke examination results.
MRQAR: A generic MapReduce framework to discover quantitative association rules in big data problems
2018, Knowledge-Based Systems
Citation Excerpt :
This kind of MOEAs addresses a multiobjective problem as N subproblems optimized at the same time using an EA. A MOEA to discover rare and interesting QAR was presented in [35]. Other EA-based approaches, such as niching genetic algorithms, have been applied to discover QAR [36].
Many algorithms have emerged to address the discovery of quantitative association rules from datasets in the last years. However, this task is becoming a challenge because the processing power of most existing techniques is not enough to handle the large amount of data generated nowadays. These vast amounts of data are known as Big Data. A number of previous studies have been focused on mining boolean or nominal association rules from Big Data problems, nevertheless, the data in real-world applications usually consist of quantitative values and designing data mining algorithms able to extract quantitative association rules presents a challenge to workers in this research field. In spite of the fact that we can find classical methods to discover boolean or nominal association rules in the most well-known repositories of Big Data algorithms, such repositories do not provide methods to discover quantitative association rules. Indeed, no methodologies have been proposed in the literature without prior discretization in Big Data. Hence, this work proposes MRQAR, a new generic parallel framework to discover quantitative association rules in large amounts of data, designed following the MapReduce paradigm using Apache Spark. MRQAR performs an incremental learning able to run any sequential quantitative association rule algorithm in Big Data problems without needing to redesign such algorithms. As a case study, we have integrated the multiobjective evolutionary algorithm MOPNAR into MRQAR to validate the generic MapReduce framework proposed in this work. The results obtained in the experimental study performed on five Big Data problems prove the capability of MRQAR to obtain reduced set of high quality rules in reasonable time.
A novel Multiple Objective Symbiotic Organisms Search (MOSOS) for time-cost-labor utilization tradeoff problem
2016, Knowledge-Based Systems
Multiple work shifts are commonly utilized in construction projects to meet project requirements. Nevertheless, evening and night shifts raise the risk of adverse events and thus must be used to the minimum extent feasible. Tradeoff optimization among project duration (time), project cost, and the utilization of evening and night work shifts while maintaining with all job logic and resource availability constraints is necessary to enhance overall construction project success. In this study, a novel approach called “Multiple Objective Symbiotic Organisms Search” (MOSOS) to solve multiple work shifts problem is introduced. The MOSOS algorithm is new meta-heuristic based multi-objective optimization techniques inspired by the symbiotic interaction strategies that organisms use to survive in the ecosystem. A numerical case study of construction projects were studied and the performance of MOSOS is evaluated in comparison with other widely used algorithms which includes non-dominated sorting genetic algorithm II (NSGA-II), the multiple objective particle swarm optimization (MOPSO), the multiple objective differential evolution (MODE), and the multiple objective artificial bee colony (MOABC). The numerical results demonstrate MOSOS approach is a powerful search and optimization technique in finding optimization of work shift schedules that is it can assist project managers in selecting appropriate plan for project.
