Elsevier

Knowledge-Based Systems

Volume 89, November 2015, Pages 366-384
Rare-PEARs: A new multi objective evolutionary algorithm to mine rare and non-redundant quantitative association rules

https://doi.org/10.1016/j.knosys.2015.07.016

Abstract

Since finding quantitative association rules (QARs) is an NP-hard problem, evolutionary methods are well suited to discovering QARs. Nevertheless, most previous evolutionary methods for discovering association rules consider only frequent dependencies among items in datasets. They do not pay specific attention to interestingness and non-redundancy as two critical objectives. In this paper, the proposed algorithm (Rare-PEARs) gives every rule, whatever its length and appearance (the antecedent and consequent parts of the rule), a chance to be created. Therefore, rules that are interesting, rare, or both can be found. Some of the generated rules might be uninteresting (those that contain frequent item sets); Rare-PEARs tries to avoid them. To accomplish this goal, our method decomposes the process of association rule mining into N − 1 sub-problems (N is the number of attributes, and each sub-problem is handled by an independent sub-process during Rare-PEARs execution). Each sub-process starts individually with a different initial population. It then explores the search space of its corresponding sub-problem to find rules with semi-optimal intervals for each of the attributes. This is done using a new definition of the Non-Dominated concept, which Rare-PEARs applies to find semi-optimal intervals for attributes during the execution of each sub-process. Finally, Rare-PEARs collects the QARs from the sub-processes and determines the ultimate Non-Dominated rules based on the interestingness and reliability measures. Rare-PEARs tries to maximize three objectives (interestingness, accuracy and reliability) while providing broad coverage of the input dataset. We compared Rare-PEARs with ten algorithms (multi-objective, mono-objective and classical algorithms of association rule mining) over several real-world datasets. The results demonstrate the high efficiency of Rare-PEARs.

Introduction

The problem of association rule mining was presented in 1993 and 1994 [2], [1], and since then it has been used as an analysis technique by researchers. An association rule is an expression of the form X → Y. In this expression, X and Y are sets of items or attribute-value pairs. The left-hand side (X) and the right-hand side (Y) of this expression are the antecedent and the consequent, respectively. Each of them contains a number of attributes, and these attributes cannot be null. Attributes of association rules can be placed in two domains: the discrete and the continuous domain [4]. Furthermore, association rules in the unsupervised domain can be classified along two lines: categorical association rules versus quantitative association rules [10], [51], and frequent association rules versus infrequent/rare rules [50]. There is also another type of association rule, called the class association rule. These are rules with only one attribute in the consequent. Class association rules are used by associative classifiers (Fig. 1). Rules of this category are frequent and strong (strong rules are rules with a high confidence value) [49].

The Apriori algorithm [3] was an outstanding contribution to identifying association rules, but its drawbacks gradually became obvious. In theory, the Apriori algorithm can guarantee results with high accuracy; nevertheless, its runtime becomes considerable on large datasets. To fix this drawback, a number of improved algorithms have been proposed. FP-growth [7] and Partition [8] are two prominent examples. They significantly improve performance, but in some cases they cannot be applied [9]. Population-based algorithms are the recent generation of association rule mining methods [52], [53]. However, they face two major challenges in finding QARs: determining appropriate ranges for quantitative attributes and finding interesting rules. We try to solve these problems in Rare-PEARs.

Various criteria (support, confidence, lift, etc.) are used by the different QAR algorithms. Most of the non-population based algorithms have a common property: they use support, confidence or both as the main criteria to determine the quality of rules [1], [2], [3], [14]. Ant colony [11] and PSO [12], [13] based algorithms are some examples of this type of algorithm. Generally, in these algorithms, the user specifies a minimal acceptable value for support, and rules are generated if their support exceeds that threshold. However, support as the sole criterion is not enough. It cannot show the degree of reliability of a rule, and hence confidence is often used as a measure of reliability. There are other measures as well. Lift [37] and CF [38] (certainty factor) are two of the most widely used. They will be described in the next section.
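The four measures named above can be illustrated on a toy transaction set. The following is a minimal sketch using the standard textbook definitions of support, confidence, lift and certainty factor; the paper's own formulations appear in a section not reproduced here, so the function names (`support`, `rule_measures`) and the toy data are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_measures(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    transactions: List[Transaction],
) -> Dict[str, float]:
    """Standard measures for the rule antecedent -> consequent (illustrative)."""
    supp_x = support(antecedent, transactions)
    supp_y = support(consequent, transactions)
    supp_xy = support(antecedent | consequent, transactions)

    # confidence = supp(X u Y) / supp(X)
    conf = supp_xy / supp_x if supp_x > 0 else 0.0
    # lift = confidence / supp(Y); values above 1 suggest positive dependence
    lift = conf / supp_y if supp_y > 0 else 0.0

    # Certainty factor: relative change of belief in Y once X is known.
    if conf > supp_y:
        cf = (conf - supp_y) / (1.0 - supp_y)
    elif conf < supp_y:
        cf = (conf - supp_y) / supp_y
    else:
        cf = 0.0

    return {"support": supp_xy, "confidence": conf, "lift": lift, "cf": cf}


# Toy data (hypothetical, not from the paper).
toy = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"c"}),
]
measures = rule_measures(frozenset({"a"}), frozenset({"b"}), toy)
```

On this toy set the rule a → b holds in two of five transactions, so a support threshold alone says little about how reliable the rule is, which is why the other three measures are reported alongside it.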

As the number of criteria has grown, Multi-Objective Evolutionary Algorithms (MOEAs) have been introduced. MOEAs are attractive methods for mining QARs. However, rapid convergence damages the efficiency of MOEAs. Some studies, such as [10], address this problem by restarting: they start again when the difference between two consecutive populations is less than α percent. Sufficient exploration requires choosing α carefully. If this value is high, current generations may not have the chance to produce elite chromosomes. Otherwise, it will lead to fast convergence. Finding the optimal interval of each rule's attributes is another challenge in QARs. We address these two problems with our proposed method (Rare-PEARs). We split the QAR mining problem into N − 1 sub-problems (for each sub-problem, a sub-process is dedicated to finding rules of a specific size) and propose a new definition of Non-Dominated rules. We call this new definition Non-Dominated II (it is explained in the last part of this section). Rare-PEARs brings variation into the population by producing item sets of a different size for each sub-problem.
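For readers unfamiliar with the Non-Dominated concept that MOEAs rely on, the following minimal sketch filters a set of rules by classical Pareto dominance over objective vectors that are all to be maximized. It is not the Non-Dominated II definition proposed in the paper, which is described in Section 3.1; the function names and the example objective values are assumptions.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (classical Pareto front)."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(p)
    return front


# Hypothetical objective vectors, e.g. (interestingness, accuracy, reliability).
candidates = [
    (0.9, 0.2, 0.5),
    (0.5, 0.5, 0.5),
    (0.4, 0.4, 0.4),  # dominated by (0.5, 0.5, 0.5)
    (0.9, 0.2, 0.4),  # dominated by (0.9, 0.2, 0.5)
]
front = non_dominated(candidates)
```

The quadratic pairwise comparison is chosen for readability; it is adequate for small rule sets but would normally be replaced by a faster sorting scheme in a full MOEA.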

Association rules on continuous attributes are called quantitative association rules (QARs). QARs are represented as x → y, in which x and y are item sets and x ∩ y = ∅. QARs include additional data (the intervals of the attributes), so they are written as x[a, b] → y[c, d]. In this expression, a, b, c and d are real values, and again x ∩ y = ∅. In Rare-PEARs, QARs are produced simultaneously by N − 1 sub-processes. The aim of each sub-process is to find rules of a specific size (these sizes are 2 to N across the sub-problems, respectively). This significantly reduces the runtime (we describe this in Sections 3 and 4). Finding semi-optimal intervals for the attributes of rules with similar form is the first objective of Rare-PEARs. Note that rules with similar form are rules with the same antecedent and consequent parts but different interval values for the attributes. An example would be: B[0.1, 0.2] → A[0.2, 0.4]C[−0.4, −0.2], B[0.15, 0.2] → A[0.1, 0.25]C[−0.1, 0.1] and B[0.05, 0.2] → A[−0.2, 0.1]C[−0.3, −0.1]. These are three rules with similar form. Finally, the Non-Dominated rules are selected from among the rules found in the first step. In Section 3.1, we describe Non-Dominated II in Rare-PEARs.
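To make the interval notation concrete, the sketch below represents a QAR as two dictionaries mapping attribute names to closed intervals, evaluates its support and confidence on numeric records, and checks whether two rules have similar form. The two rules are taken from the example above; the records, the closed-interval convention and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
RulePart = Dict[str, Interval]   # attribute name -> [low, high]
Record = Dict[str, float]        # attribute name -> observed value


def covers(part: RulePart, record: Record) -> bool:
    """True if every attribute of `part` falls inside its interval."""
    return all(low <= record[attr] <= high for attr, (low, high) in part.items())


def qar_support_confidence(
    antecedent: RulePart, consequent: RulePart, records: List[Record]
) -> Tuple[float, float]:
    """Support and confidence of antecedent -> consequent on `records`."""
    n_ante = sum(1 for r in records if covers(antecedent, r))
    n_both = sum(
        1 for r in records if covers(antecedent, r) and covers(consequent, r)
    )
    supp = n_both / len(records) if records else 0.0
    conf = n_both / n_ante if n_ante else 0.0
    return supp, conf


def similar_form(rule_a: Tuple[RulePart, RulePart],
                 rule_b: Tuple[RulePart, RulePart]) -> bool:
    """Same attributes on each side; the intervals are free to differ."""
    return (set(rule_a[0]) == set(rule_b[0])
            and set(rule_a[1]) == set(rule_b[1]))


# B[0.1, 0.2] -> A[0.2, 0.4] C[-0.4, -0.2]
rule1 = ({"B": (0.1, 0.2)}, {"A": (0.2, 0.4), "C": (-0.4, -0.2)})
# B[0.15, 0.2] -> A[0.1, 0.25] C[-0.1, 0.1]
rule2 = ({"B": (0.15, 0.2)}, {"A": (0.1, 0.25), "C": (-0.1, 0.1)})

# Hypothetical records, not taken from any dataset in the paper.
records = [
    {"A": 0.30, "B": 0.15, "C": -0.30},  # satisfies rule1 fully
    {"A": 0.50, "B": 0.12, "C": -0.30},  # antecedent only
    {"A": 0.30, "B": 0.50, "C": -0.30},  # antecedent not satisfied
    {"A": 0.25, "B": 0.18, "C": -0.25},  # satisfies rule1 fully
]
supp1, conf1 = qar_support_confidence(rule1[0], rule1[1], records)
```

Because rules of similar form differ only in their interval bounds, searching for semi-optimal intervals amounts to moving those bounds and re-evaluating measures such as these.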

The goal of this research is to discover rare and/or interesting association rules (rules with different form are explained further in Section 3.1). We developed our approach with three motivations:

  • The ability to handle datasets with quantitative values. Previous studies often work on binary or discretized values [5], [13], [58], [59], [60].

  • Producing the initial population with the aid of evolutionary random permutation for all of the sub-problems. Each sub-process starts with individuals of a specific length. This strategy gives individuals of different lengths an equal chance of survival. In contrast, previous studies [10], [15], [19], [21], [24], [31], [32] generate their initial population randomly (rules with different lengths), so rules of a specific length may not appear in the initial population at all. Hence, rare or interesting rules of greater length may not be produced by the evolutionary operators. Consequently, few or no rules of greater length take part in the competition to find Non-Dominated rules.

  • Determining a suitable range, which has a direct effect on discovering interesting and/or rare rules. We accomplish this in our approach by introducing a new definition of Non-Dominated rules (contrary to previous studies [10], [14], [15], [21], [24], [25], [32]). In this new definition, we apply the multi-objective concept to find the most interesting state of a rule.
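The second motivation, one initial population per sub-problem with every individual of that sub-problem having the same length, can be sketched as follows. This is only an illustration under stated assumptions: the actual chromosome representation and permutation procedure are defined in Section 3 of the paper, intervals are omitted, and the function names, the uniform antecedent/consequent split and the population size are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# An individual here is just (antecedent attributes, consequent attributes).
Individual = Tuple[Tuple[str, ...], Tuple[str, ...]]


def initial_population(
    attributes: Sequence[str], size: int, pop_size: int, rng: random.Random
) -> List[Individual]:
    """Build `pop_size` rule skeletons that each use exactly `size` attributes."""
    population = []
    for _ in range(pop_size):
        perm = list(attributes)
        rng.shuffle(perm)                 # random permutation of all attributes
        chosen = perm[:size]              # first `size` attributes form the rule
        cut = rng.randint(1, size - 1)    # both sides must be non-empty
        population.append((tuple(chosen[:cut]), tuple(chosen[cut:])))
    return population


def all_subproblem_populations(
    attributes: Sequence[str], pop_size: int, seed: int = 0
) -> Dict[int, List[Individual]]:
    """One population per rule size 2..N, i.e. N - 1 sub-problems."""
    rng = random.Random(seed)
    n = len(attributes)
    return {
        size: initial_population(attributes, size, pop_size, rng)
        for size in range(2, n + 1)
    }


populations = all_subproblem_populations(["A", "B", "C", "D"], pop_size=5)
```

With four attributes this yields three populations (rule sizes 2, 3 and 4), so rules of every length are guaranteed to be present at the start, which is the property the motivation above is concerned with.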

This paper is arranged as follows. In Section 2, related work, including previous studies of association rule mining, is surveyed. This section focuses on evolutionary algorithms, and at its end the time complexity of association rule mining is studied. In Section 3, we describe our methodology (Rare-PEARs) in detail. This section covers chromosome representation, population initialization and evolutionary operators. Section 4 shows a detailed comparison and presents a comprehensive analysis of the results. Finally, Section 5 concludes the paper.

Related work

In this section, we study a number of research papers in the association rule mining area and some of their real-world applications. First, we study two non-population based approaches to association rule mining. An information theory based approach was presented in [20]. Ke et al. [20] applied the numerical discrimination for discovery of quantitative association rules. In [20], first mutual information was analyzed between the attributes of a quantitative database, and then normalization (mutual

Rare-PEARs: a new random permutation based evolutionary algorithm to mine quantitative association rule

In this section, Rare-PEARs is explained. Ultimate rules of Rare-PEARs have different appearances, and their attributes have semi-optimal intervals. Rare-PEARs rules have a good balance between reliability, coverage, accuracy, and interestingness. Before describing Rare-PEARs, we introduce the three stages of this method.

  • Generate the initial population of sub-processes. This is done by random permutation.

  • Produce new rules and find semi-optimal intervals for their attributes during the execution

Experimental result

We carried out several experiments to analyze the quality of our proposal. This section is organized as follows:

  • Quality Criteria of QARs.

  • Dataset description.

  • Compared algorithms description.

  • Comparison with the mono-objective (four algorithms), multi-objective (four algorithms) and classical algorithms (two algorithms).

  • Statistical test on results of algorithms.

  • Time complexity and scalability of our approach.

Our results are the average of 10 runs on each dataset. Information of dataset has been presented

Conclusion

In this paper, we have proposed Rare-PEARs. It is a new multi-objective evolutionary algorithm that intelligently produces the initial population in each of the different sub-problems. The initial population is generated by a random permutation algorithm. Each sub-process is responsible for finding rules of a fixed size. Nevertheless, the evaluation operators of Rare-PEARs produce rules whose sizes differ from the specific rule size of each sub-process. It leads to high diversity in the initial

References (61)

  • S.G. Matthews et al.

    Web usage mining with evolutionary extraction of temporal fuzzy association rules

    Knowl.-Based Syst.

    (2013)
  • V. Beiranvand et al.

    Multi-objective PSO algorithm for mining numerical association rules without a priori discretization

    Expert Syst. Appl.

    (2014)
  • G. Yang et al.

    A novel evolutionary method to search interesting association rules by keywords

    Expert Syst. Appl.

    (2011)
  • E. Shortliffe et al.

    A model of inexact reasoning in medicine

    Math. Biosci.

    (1975)
  • M. Martínez-Ballesteros et al.

    Discovering gene association networks by multi-objective evolutionary quantitative association rules

    J. Comput. Syst. Sci.

    (2014)
  • R. Agrawal et al.

    Fast algorithms for mining association rules

  • R. Agrawal et al.

    Mining association rules between sets of items in large databases

    News Lett. ACM SIGMOD

    (1993)
  • D.-I. Lin et al.

    Pincer-search: an efficient algorithm for discovering the maximum frequent set

    IEEE Trans. Knowl. Data Eng.

    (2002)
  • R. Srikant et al.

    Mining quantitative association rules in large relational tables

    SIGMOD Rec.

    (1996)
  • M. Vannucci, V. Colla, Meaningful discretization of continuous features for association rules mining by means of a SOM,...
  • J. Alcalá-Fdez et al.

    KEEL: A software tool to assess evolutionary algorithms to data mining problems

    Soft. Comput.

    (2009)
  • J. Han et al.

    Mining frequent patterns without candidate generation

  • A. Savasere et al.

    An efficient algorithm for mining association rules in large databases

  • J. Han et al.

    Data Mining: Concepts and Techniques

    (2011)
  • R.J. Kuo et al.

    Application of particle swarm optimization to association rule mining

    Appl. Soft. Comput.

    (2011)
  • Mehmet Kaya

    Multi-objective genetic algorithm based approaches for mining optimized fuzzy association rules

    Soft. Comput.

    (2006)
  • C. Chen et al.

    A multi-objective genetic-fuzzy mining algorithm

  • Y. Aumann et al.

    A statistical theory for quantitative association rules

    Journal of Intelligent Information Systems

    (2003)
  • B. Minaei-Bidgoli et al.

    Mining numerical association rules via multi-objective genetic algorithms

    Inf. Sci.

    (2013)
  • Y. Ke et al.

    An information-theoretic approach to quantitative association rule mining

    Knowl. Inf. Syst.

    (2008)
  • Cited by (24)

    • A survey of evolutionary computation for association rule mining

      2020, Information Sciences
      Citation Excerpt :

      Crowding distance used by NSGA-II was replaced to sort the solutions for each Pareto-front. Almasi and Abadeh [17] decomposed ARM into different sub-problems in order to extract RARs from quantitative data. Each problem is solved using a different initial population.

    • CARs-Lands: An associative classifier for large-scale datasets

      2020, Pattern Recognition
      Citation Excerpt :

      Note that replacement is performed randomly, if the supports are also the same. Since rule of CAR-lands are produced based on Rare-PEARs [42], at the beginning of this section, we briefly review it. Rare-PEARs [42] is an evolutionary association rule mining method which produces rare and reliable rules.

    • MRQAR: A generic MapReduce framework to discover quantitative association rules in big data problems

      2018, Knowledge-Based Systems
      Citation Excerpt :

      This kind of MOEAs addresses a multiobjective problem as N subproblems optimized at the same time using an EA. A MOEA to discover rare and interesting QAR was presented in [35]. Other EA-based approaches, such as niching genetic algorithms, have been applied to discover QAR [36].
