Fuzzy association rule mining approaches for enhancing prediction performance
Introduction
A prediction model generates and evaluates a set of rules in order to predict a future value accurately. Association rule mining is a popular approach in data mining (DM) research and continues to attract the attention of researchers (Jain et al., 2008, Toloo and Nalchigar, 2011, Toloo et al., 2009, Ho et al., 2012, Chiu et al., 2012). Association rule discovery, introduced by Agrawal, Imielinski, and Swami (1993), aims to extract the characteristics, hidden association patterns and correlations between the items (attributes) in a large database (Kannan and Bhaskaran, 2009, Kiran and Reddy, 2010). The Apriori algorithm developed by Agrawal and Srikant (1994) is a classic and popular algorithm for extracting strong association rules (knowledge) from the frequent itemsets of a transaction database using pre-defined thresholds. These thresholds are minimum support (minsupp) and minimum confidence (minconf). Association rules are formally written in “IF–THEN” form as X → Y, where X is called the antecedent and Y is called the consequent.
Let I = {i1, i2, … , in} be a set of distinct items (attributes). Any collection of one or more items is called an itemset. Let D = {t1, t2, … , tm} be a set of transactions, each identified by a transaction ID (TID) and formed from a set of items in I. The support count is the number of transactions in which X and Y occur together, support (X ∪ Y), and the support value is the fraction of transactions that contain both X and Y.
An itemset whose support is greater than or equal to a minsupp threshold is called a frequent itemset. The confidence value measures how often the items in Y appear in transactions that contain X; it is the number of occurrences of X and Y together divided by the number of occurrences of X (i.e., confidence (X → Y) = support (X ∪ Y) / support (X)). An association rule is an implication expression of the form (X → Y), where X, Y ⊂ I and X ∩ Y = ∅. A strong association rule is one whose support and confidence are greater than the user-defined minsupp and minconf. The main task of association rule discovery is to find all strong rules.
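As an illustrative sketch of the definitions above, the following Python snippet computes support and confidence on a toy transaction database and checks whether a rule is strong. The item names, the `transactions` list and the helper names are hypothetical and not taken from the paper. The threshold test uses `>=`, as in the frequent itemset definition; a strict "greater than" reading would use `>` instead.

```python
# Hypothetical toy transaction database D; the item names are made up.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]


def support_count(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in db if items <= t)


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent, consequent, db):
    """count(X and Y together) / count(X) for the rule X -> Y."""
    count_x = support_count(antecedent, db)
    if count_x == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support_count(set(antecedent) | set(consequent), db) / count_x


def is_strong(antecedent, consequent, db, minsupp, minconf):
    """True if the rule X -> Y meets both thresholds (assumed: >=)."""
    rule_support = support(set(antecedent) | set(consequent), db)
    rule_confidence = confidence(antecedent, consequent, db)
    return rule_support >= minsupp and rule_confidence >= minconf


# Rule {bread} -> {milk}: 3 of 5 transactions contain both items,
# and 4 of 5 contain bread, so support is 3/5 and confidence is 3/4.
rule_supp = support({"bread", "milk"}, transactions)
rule_conf = confidence({"bread"}, {"milk"}, transactions)
```

With minsupp = 0.5 and minconf = 0.7 this rule counts as strong; raising minconf to 0.8 rejects it.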
One advantage of association rule discovery is that it extracts explicit rules that are of practical importance for the user or human expert in understanding the application domain. This makes it possible to adjust (extend) the rules manually with further domain knowledge, which is difficult to achieve with other mining approaches (Gedikli & Jannach, 2010). Srikant and Agrawal (1996) introduced the problem of extracting association rules from quantitative attributes (a numeric data set) by partitioning these attributes into intervals. Some current association rule mining approaches for quantitative data neglect the values at the interval boundaries of the partitions. The resulting sharp interval boundaries do not reflect the nature of human perception, as argued by Kuok et al. (1998) and Kaya and Alhajj (2003). Instead of partitioning the attributes, it is better to exploit fuzzy set theory, which provides a smooth transition between fuzzy sets. In the fuzzy approach, quantitative data are transformed into fuzzy data. A variety of approaches have been developed to extract fuzzy association rules from quantitative data sets (Hong et al., 2004, Zhang et al., 2005, Huang et al., 2006, Lee et al., 2006, Lei and Ren-hou, 2007, Pach et al., 2008, Chen et al., 2008, Ashish and Vikramkumar, 2010, Palacios et al., 2010).
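The sharp boundary problem can be made concrete with a small Python sketch. It compares a crisp partition with overlapping fuzzy sets for two nearly identical values. The attribute range, the cut points (30 and 60), the linguistic terms and the piecewise-linear membership shapes are all assumptions chosen for illustration, not values from the paper.

```python
LABELS = ["low", "medium", "high"]
BOUNDARIES = [30.0, 60.0]  # hypothetical crisp cut points


def crisp_label(x):
    """Assign x to exactly one interval of the crisp partition."""
    for upper, label in zip(BOUNDARIES, LABELS):
        if x < upper:
            return label
    return LABELS[-1]


def fuzzy_memberships(x):
    """Degrees of membership of x in three overlapping fuzzy sets.

    Assumed shapes: 'low' is a left shoulder (1 at 15, 0 at 45),
    'medium' is a triangle peaking at 45 (0 at 15 and at 75),
    'high' is a right shoulder (0 at 45, 1 at 75).
    """
    low = min(1.0, max(0.0, (45.0 - x) / 30.0))
    medium = max(0.0, 1.0 - abs(x - 45.0) / 30.0)
    high = min(1.0, max(0.0, (x - 45.0) / 30.0))
    return {"low": low, "medium": medium, "high": high}


# Two values on either side of the crisp boundary at 30.
for value in (29.9, 30.1):
    print(value, crisp_label(value), fuzzy_memberships(value))
```

The crisp partition places 29.9 and 30.1 in different intervals, whereas their fuzzy membership degrees differ by less than 0.01, which is the smooth transition the text refers to.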
This paper investigates the problem of association rules extraction from quantitative data using fuzzy clustering techniques. Fuzzy clustering is a suitable method to transform quantitative data into fuzzy data, taking the advantage of fuzzy set theory over the partition method concerning the smooth transition among fuzzy sets. Fuzzy Association Rules (FARs) mining is adopted in this paper as a solution for extracting knowledge from the quantitative database.
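To illustrate how fuzzy clustering can turn a quantitative attribute into fuzzy data, here is a minimal pure-Python sketch of the standard Fuzzy C-Means updates on one-dimensional data. It is not the paper's implementation: the sample values, the evenly spaced initial centres, the fuzzifier m = 2 and the function names are assumptions made for illustration.

```python
def memberships(x, centers, m=2.0):
    """Membership degree of value x in each cluster; the degrees sum to 1."""
    eps = 1e-12  # avoids division by zero when x coincides with a centre
    power = 2.0 / (m - 1.0)
    inverse = [(abs(x - c) + eps) ** -power for c in centers]
    total = sum(inverse)
    return [v / total for v in inverse]


def fcm_1d(values, n_clusters=3, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-Means on 1-D data; requires n_clusters >= 2.

    Returns (centers, u), where u[k][j] is the membership of values[k]
    in cluster j.
    """
    lo, hi = min(values), max(values)
    # Assumed deterministic initialisation: centres evenly spaced over the range.
    centers = [lo + (hi - lo) * j / (n_clusters - 1) for j in range(n_clusters)]
    for _ in range(max_iter):
        u = [memberships(x, centers, m) for x in values]
        new_centers = []
        for j in range(n_clusters):
            # Each centre is the mean of the data weighted by u^m.
            weights = [row[j] ** m for row in u]
            weighted_sum = sum(w * x for w, x in zip(weights, values))
            new_centers.append(weighted_sum / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(x, centers, m) for x in values]


# Hypothetical quantitative attribute with three visible groups.
data = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0, 100.0, 101.0, 102.0]
centers, u = fcm_1d(data, n_clusters=3)
```

Each row of `u` can then be read as the degrees to which one observation belongs to linguistic terms such as low, medium and high, which is the fuzzy form a rule mining step would consume.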
Association rule mining aims to discover the relationships (rules) among the data attributes (features), and the rules found depend on minsupp and minconf. Consequently, a large number of rules can be expected, particularly if minsupp is set very low. In practice, a single minsupp is a vital parameter that controls the number of association rules extracted (Hu & Chen, 2006). Toloo et al. (2009) proposed an integrated data envelopment analysis based method to identify the most efficient association rules by ranking them using multiple criteria. Conventional association rule mining approaches like Apriori (Agrawal & Srikant, 1994) and Frequent Pattern-Growth (FP-Growth) (Han, Pei, & Yin, 2000) are based on a single minsupp threshold. However, it was observed that using a single minsupp causes a dilemma called the “rare item problem” (Liu et al., 1999, Hu and Chen, 2006, Kiran and Reddy, 2010). If minsupp is set high, rules involving rare items are not found; if it is set low enough to capture them, the number of frequent itemsets grows sharply.
To solve this rare item problem, Liu et al. (1999) developed a multiple support model called the Multiple Support Apriori (MSapriori) algorithm. MSapriori is based on the idea of setting a Minimum Item Support (MIS) for each item in a database, i.e., employing multiple minsupp values for different items instead of a single minsupp for the whole database. Hence, MSapriori can be regarded as a generalization of the Apriori algorithm. Different MIS values can be assigned to different items to facilitate the generation of frequent itemsets containing rare items and to prevent the production of uninteresting frequent itemsets (Hu & Chen, 2006). More recently, an approach called Improved Multiple Support Apriori (IMSapriori) has been developed to improve MSapriori (Kiran and Reddy, 2010, Palacios et al., 2010).
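The difference between a single minsupp and per-item MIS values can be sketched in Python as follows. The sketch assumes the usual reading of the multiple support model, in which an itemset is frequent when its support reaches the lowest MIS among its items. The transactions, item names and MIS values are hypothetical, and the code only tests individual itemsets; it does not implement candidate generation as in MSapriori.

```python
# Hypothetical database: 'blender' and 'food_processor' are rare items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
    {"blender", "food_processor"},
    {"bread", "milk"},
    {"blender", "food_processor", "bread"},
    {"milk"},
    {"bread"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def is_frequent_single(itemset, db, minsupp):
    """Frequency test with one threshold for the whole database."""
    return support(itemset, db) >= minsupp


def is_frequent_mis(itemset, db, mis):
    """Frequency test with per-item thresholds.

    Assumption: the threshold of an itemset is the smallest MIS value
    among the items it contains.
    """
    threshold = min(mis[item] for item in itemset)
    return support(itemset, db) >= threshold


# Hypothetical MIS assignment: lower thresholds for the rare items.
mis = {"bread": 0.3, "milk": 0.3, "blender": 0.15, "food_processor": 0.15}

rare = {"blender", "food_processor"}   # appears in 2 of 10 transactions
common = {"bread", "milk"}             # appears in 4 of 10 transactions

single_keeps_rare = is_frequent_single(rare, transactions, minsupp=0.3)
mis_keeps_rare = is_frequent_mis(rare, transactions, mis)
mis_keeps_common = is_frequent_mis(common, transactions, mis)
```

With a single minsupp of 0.3 the rare itemset is discarded, while the per-item thresholds retain it without lowering the bar for the common items.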
This paper also proposes Fuzzy Association Rules (FARs) generated using Fuzzy clustering on quantitative data by adopting the multiple support approaches in order to deal with the limitations of using a single minsupp. In summary, this paper proposes and tests two prediction models through a set of experiments and then compares them with existing work to demonstrate their merits and capabilities. The first model, referred to as the FCM–Apriori model, is based on the integration of the Fuzzy C-Means (FCM) clustering algorithm and the Apriori approach for extracting FARs, applied to a road traffic domain for prediction of a future value. The second model, referred to as the FCM–MSapriori model, is based on FCM and a multiple support thresholds approach; it basically enhances the first model to capture the rare itemsets related rules, and is applied to the same road traffic domain and another benchmark data set called Abalone.
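Once records carry membership degrees instead of crisp items, support and confidence need a fuzzy counterpart. This excerpt does not state how the paper computes them, so the Python sketch below uses one common convention as an assumption: combine the membership degrees within a record with a t-norm (minimum or product) and average over all records. The attribute names, terms and degrees are hypothetical.

```python
from math import prod


def fuzzy_support(fuzzy_items, records, tnorm=min):
    """Average, over all records, of the combined membership in fuzzy_items.

    Each record maps (attribute, term) pairs to a degree in [0, 1];
    a missing pair counts as degree 0. tnorm combines the degrees within
    one record (assumed choices: min or math.prod).
    """
    total = 0.0
    for record in records:
        degrees = [record.get(item, 0.0) for item in fuzzy_items]
        total += tnorm(degrees)
    return total / len(records)


def fuzzy_confidence(antecedent, consequent, records, tnorm=min):
    """fuzzy_support(X and Y together) / fuzzy_support(X)."""
    supp_x = fuzzy_support(antecedent, records, tnorm)
    if supp_x == 0.0:
        return 0.0
    return fuzzy_support(antecedent + consequent, records, tnorm) / supp_x


# Hypothetical fuzzified records, e.g. as produced by a clustering step.
DEMAND_HIGH = ("demand", "high")
DENSITY_HIGH = ("density", "high")
records = [
    {DEMAND_HIGH: 0.8, DENSITY_HIGH: 0.6},
    {DEMAND_HIGH: 0.2, DENSITY_HIGH: 0.9},
    {DEMAND_HIGH: 1.0, DENSITY_HIGH: 1.0},
]

supp_min = fuzzy_support([DEMAND_HIGH, DENSITY_HIGH], records, tnorm=min)
supp_prod = fuzzy_support([DEMAND_HIGH, DENSITY_HIGH], records, tnorm=prod)
conf_min = fuzzy_confidence([DEMAND_HIGH], [DENSITY_HIGH], records, tnorm=min)
```

The choice of t-norm changes the numbers: the product penalizes partial memberships more strongly than the minimum, so the same rule can pass a threshold under one and fail under the other.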
The rest of this paper is organized as follows. The next section presents the algorithms and prediction models. Section 3 describes the case studies used to demonstrate the merits and capabilities of the models. Experimental results of analysis are presented in Section 4. Finally, the conclusions are drawn in Section 5 with the key contribution of the investigation.
Section snippets
The FCM–Apriori model
The proposed FCM–Apriori model extracts fuzzy rules for building a knowledge base (KB) from a database, and is based on Huang et al. (2006) and Lu, Xu, and Jiang (2003). The model utilizes the following two methods:
- FCM is used as an automatic system to transform the quantitative data set into fuzzy sets (terms). FCM is a fuzzy clustering algorithm based on an objective function method, developed by Bezdek in 1981 using fuzzy set theory. In other words, it assigns a data object (observation)
Data sets for the case studies
Three data sets have been employed for the model performance analysis in Section 4. Two data sets are related to a road traffic problem and the third is the Abalone benchmark data set.
The road traffic data has been generated using a traffic simulation model called the METANET macroscopic flow model (Messmer, 2007).
Each record consists of:
- Traffic state, which is represented by:
  - traffic demand in road 1 (the number of vehicles that need to use road 1)
  - traffic demand in road 2
  - traffic
The FCM–Apriori model
For analysis and validation purposes, the small road traffic data set (Section 3.1) as well as the large road traffic data set (Section 3.2) are used. Traffic state prediction (including traffic flow (traffic density) and traffic demand) has long been regarded as a critical concern for intelligent road traffic systems. The FCM–Apriori model discussed in Section 2 is applied to road traffic control management; two case studies with the two different data set sizes are used for the road traffic
Conclusion
This paper has presented two enhanced prediction models using a fuzzy association rule mining approach. The FCM–Apriori model is based on a single support value and has been tested on two (small and large) data sets in a road traffic domain. The results show that the model effectively minimized the mean absolute percentage error (MAPE), which is sensitive to the minsupp and minconf values. The large data set also yielded a lower MAPE than the small one. The model used FCM to determine
References (29)
- et al. (2012). Applying cluster-based fuzzy association rules mining framework into EC environment. Journal of Applied Soft Computing.
- et al. (2012). Using a fuzzy association rule mining approach to identify the financial data association. Expert Systems with Applications.
- et al. (2004). A fuzzy Apriori Tid mining algorithm with reduced computational time. Applied Soft Computing Journal.
- et al. (2006). Integrating fuzzy data mining and fuzzy artificial neural networks for discovering implicit knowledge. Knowledge-Based Systems.
- et al. Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism.
- et al. (2008). A new approach for evaluating agility in supply chains using fuzzy association rules mining. Engineering Applications of Artificial Intelligence.
- et al. (2008). Compact fuzzy association rule-based classifier. Expert Systems with Applications.
- et al. (2009). A new method for ranking discovered rules from data mining by DEA. Expert Systems with Applications.
- Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceeding of...
- Agrawal, R., Imielinski, T. & Swami, A. (1993). Mining association rules between sets of items in large databases. In...
- Cluster-based evaluation in fuzzy-genetic data mining. IEEE Transactions on Fuzzy Systems.