Elsevier

Expert Systems with Applications

Volume 40, Issue 17, 1 December 2013, Pages 6928-6937
Fuzzy association rule mining approaches for enhancing prediction performance

https://doi.org/10.1016/j.eswa.2013.06.025

Highlights

  • Two new prediction models are proposed and compared to demonstrate their merits.

  • First model is an integration of Fuzzy C-Means (FCM) and the Apriori algorithm.

  • Second model is an integration of FCM and the multiple support thresholds approach.

  • Models are evaluated using a road traffic data set and Abalone benchmark data set.

Abstract

This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used to predict a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets of different sizes. The experimental results show the merits and capability of the proposed knowledge discovery (KD) model in FAR-based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base to be utilized within the FIS for prediction evaluation. Experimental results show that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.

Introduction

A prediction model requires generating and evaluating a set of rules to predict a future value accurately. Association rule mining has been a popular approach in data mining (DM) research, increasingly attracting the attention of researchers (Jain et al., 2008, Toloo and Nalchigar, 2011, Toloo et al., 2009, Ho et al., 2012, Chiu et al., 2012). Association rule discovery, introduced by Agrawal, Imielinski, and Swami (1993), aims to extract the characteristics, hidden association patterns and correlations between the items (attributes) in a large database (Kannan and Bhaskaran, 2009, Kiran and Reddy, 2010). The Apriori algorithm developed by Agrawal and Srikant (1994) is a classic and popular algorithm for extracting strong association rules (knowledge) from the frequent itemsets of a transaction database using pre-defined threshold measures. These thresholds are minimum support (minsupp) and minimum confidence (minconf). Association rules are formally written in “IF–Then” form as X → Y, where X is called the antecedent and Y is called the consequent.

Let I = {i1, i2, …, in} be a set of distinct items (attributes). A collection of one or more items, i.e., any set of items, is called an itemset. Let D = {t1, t2, …, tm} be a set of transaction IDs (TIDs). Each TID in D is formed from a set of items in I. The support count is the occurrence (frequency) of X and Y together, support(X ∪ Y), and the support value is the fraction of transactions that contain both X and Y.

An itemset whose support is greater than or equal to a minsupp threshold is called a frequent itemset. The confidence value measures how often items in Y appear in transactions that contain X; it is the number of occurrences of X and Y together divided by the number of occurrences of X, i.e., support(X ∪ Y) / support(X). An association rule is an implication expression of the form X → Y, where X, Y ⊂ I and X ∩ Y = ∅. A strong association rule is one whose support and confidence are greater than the user-defined minsupp and minconf. The main task of association rule discovery is to find all strong rules.
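The definitions above can be illustrated with a minimal Python sketch. The transaction database, the item names and the function names are hypothetical and chosen only for illustration; the enumeration is brute force rather than the level-wise pruning that Apriori performs, and it uses "greater than or equal to" for both thresholds, in line with the frequent-itemset definition.

```python
from itertools import combinations

# Toy transaction database D (hypothetical data, not taken from the paper).
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def strong_rules(transactions, minsupp, minconf):
    """List rules X -> Y whose support and confidence reach the thresholds.

    Brute-force enumeration of all itemsets with two or more items;
    Apriori would prune this search instead of scanning every candidate.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            s = support(itemset, transactions)
            if s < minsupp:
                continue  # not a frequent itemset
            for r in range(1, k):
                for X in combinations(itemset, r):
                    Y = tuple(i for i in itemset if i not in X)
                    c = confidence(X, Y, transactions)
                    if c >= minconf:
                        rules.append((X, Y, s, c))
    return rules

rules = strong_rules(D, minsupp=0.5, minconf=0.8)
```

In this toy database, {bread, butter} appears in three of five transactions, so its support is 0.6; every transaction containing butter also contains bread, so the rule butter → bread has confidence 1.0 and is the only rule that passes minsupp = 0.5 and minconf = 0.8.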

One of the advantages of association rule discovery is that it extracts explicit rules that are of practical importance for the user/human expert to understand the application domain. This makes it possible to adjust (extend) the rules manually with further domain knowledge, which is difficult to achieve with other mining approaches (Gedikli & Jannach, 2010). Srikant and Agrawal (1996) introduced the problem of extracting association rules from quantitative attributes (a numeric data set) by partitioning these attributes into intervals. Some of the current association rule mining approaches for quantitative data neglect values near the interval boundaries of the partitions. The resulting sharp interval boundaries do not reflect the nature of human perception, as argued by Kuok et al. (1998) and Kaya and Alhajj (2003). Instead of using partition methods for the attributes, it is better to exploit fuzzy set theory, which offers a smooth transition between fuzzy sets. As a whole, the fuzzy approach is used for transforming quantitative data into fuzzy data. A variety of approaches have been developed in order to extract fuzzy association rules from quantitative data sets (Hong et al., 2004, Zhang et al., 2005, Huang et al., 2006, Lee et al., 2006, Lei and Ren-hou, 2007, Pach et al., 2008, Chen et al., 2008, Ashish and Vikramkumar, 2010, Palacios et al., 2010).
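The sharp boundary problem can be made concrete with a small Python sketch. The cut point of 40, the transition interval [30, 50] and the labels "low" and "high" are assumptions made for illustration only; they are not values used in the paper.

```python
def crisp_label(x, cut=40.0):
    """Crisp partition: a hard boundary at `cut` (hypothetical cut point)."""
    return "low" if x <= cut else "high"

def fuzzy_memberships(x, a=30.0, b=50.0):
    """Membership degrees in 'low' and 'high' with a linear transition on [a, b]."""
    if x <= a:
        high = 0.0
    elif x >= b:
        high = 1.0
    else:
        high = (x - a) / (b - a)
    return {"low": 1.0 - high, "high": high}

# Two nearly identical values on either side of the crisp boundary.
crisp_pair = (crisp_label(39.9), crisp_label(40.1))
fuzzy_pair = (fuzzy_memberships(39.9), fuzzy_memberships(40.1))
```

With the crisp partition, 39.9 and 40.1 fall into different intervals even though they are almost equal. With the fuzzy sets, both values belong to "low" and "high" to a degree of roughly 0.5, so a small change in the input produces only a small change in the memberships.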

This paper investigates the problem of association rule extraction from quantitative data using fuzzy clustering techniques. Fuzzy clustering is a suitable method to transform quantitative data into fuzzy data, taking advantage of the smooth transition among fuzzy sets that fuzzy set theory offers over the partition method. Fuzzy Association Rules (FARs) mining is adopted in this paper as a solution for extracting knowledge from the quantitative database.
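As one illustration of how fuzzy clustering turns numeric values into membership degrees, the following is a minimal sketch of the standard Fuzzy C-Means updates for one-dimensional data. It is not the authors' implementation: the function names, the sample values, the initial centers and the fuzzifier m = 2 are assumptions made for this example.

```python
def memberships(x, centers, m=2.0):
    """Membership degrees of scalar x in each cluster (they sum to 1)."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:
        # x coincides with at least one center: share full membership among them.
        hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    # Standard FCM membership: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]

def fcm_1d(xs, centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(iters):
        U = [memberships(x, centers, m) for x in xs]
        new_centers = []
        for j in range(len(centers)):
            # Center j is the mean of the data weighted by u_ij^m.
            w = [row[j] ** m for row in U]
            new_centers.append(sum(wi * x for wi, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned centers.
    return centers, [memberships(x, centers, m) for x in xs]

values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # hypothetical quantitative attribute
centers, U = fcm_1d(values, centers=[0.0, 6.0])
```

Each row of `U` gives the degree to which one value belongs to each cluster, which can then be read as membership in a fuzzy term such as "low" or "high". Fixed initial centers are used here to keep the sketch deterministic; random initialization is the more common choice.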

Association rule mining aims to discover the relationships (rules) among the data attributes (features), which depend on minsupp and minconf. Consequently, large numbers of rules are anticipated, particularly if minsupp is set to be very low. Practically, a single minsupp is a vital parameter that controls the number of extracted association rules (Hu & Chen, 2006). Toloo et al. (2009) proposed an integrated data envelopment analysis based method to identify the most efficient association rules by ranking them using multiple criteria. Conventional association rule mining approaches like Apriori (Agrawal & Srikant, 1994) and Frequent Pattern-Growth (FP-Growth) (Han, Pei, & Yin, 2000) are based on a single minsupp threshold. However, it was observed that using a single minsupp causes a dilemma called the “rare item problem” (Liu et al., 1999, Hu and Chen, 2006, Kiran and Reddy, 2010).

To solve this rare item problem, Liu et al. (1999) developed a multiple support model called the Multiple Support Apriori (MSapriori) algorithm. MSapriori is based on the idea of setting a Minimum Item Support (MIS) for each item in a database, i.e., employing multiple minsupp values for different items in the database, instead of using a single minsupp for the whole database. Hence, MSapriori is expressed as a generalization of the Apriori algorithm. Different MIS values can be assigned to different items to facilitate the generation of frequent itemsets of rare items and prevent the production of uninteresting frequent itemsets (Hu & Chen, 2006). More recently, an approach has been developed to improve MSapriori called Improved Multiple Support Apriori (IMSapriori) (Kiran and Reddy, 2010, Palacios et al., 2010).
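A minimal Python sketch of the multiple support idea follows. It assumes the usual formulation from Liu et al. (1999), in which an itemset is compared against the lowest MIS value among its items; the transactions, item names and MIS values are hypothetical, and the sketch covers only the frequency test, not the full MSapriori candidate generation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def is_frequent_ms(itemset, transactions, mis):
    """Multiple-support test: compare support with the lowest MIS among the items."""
    threshold = min(mis[i] for i in itemset)
    return support(itemset, transactions) >= threshold

# Hypothetical database: two common items and two rare items that co-occur.
D = (
    [{"bread", "milk"}] * 6
    + [{"bread"}] * 2
    + [{"caviar", "wine"}] * 2
)

# A single minsupp is the special case where every item has the same MIS.
single = {"bread": 0.5, "milk": 0.5, "caviar": 0.5, "wine": 0.5}
# Item-specific MIS values (assumed for illustration): lower for the rare items.
multiple = {"bread": 0.5, "milk": 0.5, "caviar": 0.1, "wine": 0.1}

rare = {"caviar", "wine"}
found_with_single = is_frequent_ms(rare, D, single)
found_with_multiple = is_frequent_ms(rare, D, multiple)
```

Here {caviar, wine} has support 0.2, so it is missed under a single minsupp of 0.5 but kept once the rare items are given an MIS of 0.1. Lowering a single minsupp to 0.1 would also keep it, at the cost of admitting many more itemsets built from the common items.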

This paper also proposes Fuzzy Association Rules (FARs) generated using fuzzy clustering on quantitative data by adopting the multiple support approach in order to deal with the limitations of using a single minsupp. In summary, this paper proposes and tests two prediction models through a set of experiments and then compares them with existing work to demonstrate their merits and capabilities. The first model, referred to as the FCM–Apriori model, is based on the integration of the Fuzzy C-Means (FCM) clustering algorithm and the Apriori approach for extracting FARs, applied to a road traffic domain for prediction of a future value. The second model, referred to as the FCM–MSapriori model, is based on FCM and a multiple support thresholds approach; it enhances the first model to capture rules related to rare itemsets, and is applied to the same road traffic domain and to a benchmark data set called Abalone.

The rest of this paper is organized as follows. The next section presents the algorithms and prediction models. Section 3 describes the case studies used to demonstrate the merits and capabilities of the models. Experimental results of analysis are presented in Section 4. Finally, the conclusions are drawn in Section 5 with the key contribution of the investigation.

Section snippets

The FCM–Apriori model

The proposed FCM–Apriori model extracts fuzzy rules for building a knowledge base (KB) from a database, and is based on Huang et al. (2006) and Lu, Xu, and Jiang (2003). The model utilizes the following two methods:

  • FCM is used as an automatic system to transform the quantitative data set into fuzzy sets (terms). FCM is a fuzzy clustering algorithm based on an objective function method, developed by Bezdek in 1981 adopting fuzzy set theory. In other words, it assigns a data object (observation)

Data sets for the case studies

Three data sets have been employed for the model performance analysis in Section 4. Two data sets are related to a road traffic problem and the third one is the Abalone benchmark data set.

The road traffic data has been generated using a traffic simulation model called the METANET macroscopic flow model (Messmer, 2007).

Each record consists of:

  • Traffic state, which is represented by:

    • traffic demand in road 1 (the number of vehicles that need to use road 1)

    • traffic demand in road 2

    • traffic

The FCM–Apriori model

For analysis and validation purposes, the small road traffic data set (Section 3.1) as well as the large road traffic data set (Section 3.2) are used. Traffic state prediction (including traffic flow (traffic density) and traffic demand) has long been regarded as a critical concern for intelligent road traffic systems. The FCM–Apriori model discussed in Section 2 is applied to road traffic control management; two case studies with the two different data set sizes are used for the road traffic

Conclusion

This paper has presented two enhanced prediction models using a Fuzzy association rule mining approach. The FCM–Apriori model is based on a single support value, which has been tested for two (small and large) data sets in a road traffic domain. It is noted from the results that the model has effectively minimized MAPE, which is sensitive to minsupp and minconf values. It is also noted that a large data set size offered lower MAPE compared to the small one. The model used FCM to determine

References (29)

  • Ashish, M. & Vikramkumar, P. (2010). FPrep: Fuzzy clustering driven efficient automated pre-processing for fuzzy...
  • Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. ISBN...
  • Chen, C. H. et al. (2008). Cluster-based evaluation in fuzzy-genetic data mining. IEEE Transactions on Fuzzy Systems.
  • Frank, A. & Asuncion, A. (2010). UCI machine learning repository [http://archive.ics.uci.edu/ml], University of...