Fuzzy association rule mining approaches for enhancing prediction performance

doi:10.1016/j.eswa.2013.06.025

Expert Systems with Applications

Volume 40, Issue 17, 1 December 2013, Pages 6928-6937

Highlights

• Two new prediction models are proposed and compared to demonstrate their merits.
• First model is an integration of Fuzzy C-Means (FCM) and the Apriori algorithm.
• Second model is an integration of FCM and the multiple support thresholds approach.
• Models are evaluated using a road traffic data set and Abalone benchmark data set.

Abstract

This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of the fuzzy sets, and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used to predict a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets of different sizes. The experimental results show the merits and capability of the proposed model in FAR-based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base used within the FIS for prediction evaluation. Experimental results show that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.

Introduction

A prediction model requires generating and evaluating a set of rules to predict a future value accurately. Association rule mining is a popular approach in data mining (DM) research and has attracted increasing attention from researchers (Jain et al., 2008, Toloo and Nalchigar, 2011, Toloo et al., 2009, Ho et al., 2012, Chiu et al., 2012). Association rule discovery, introduced by Agrawal, Imielinski, and Swami (1993), aims to extract the characteristics, hidden association patterns, and correlations between the items (attributes) in a large database (Kannan and Bhaskaran, 2009, Kiran and Reddy, 2010). The Apriori algorithm developed by Agrawal and Srikant (1994) is a classic and popular algorithm for extracting strong association rules (knowledge) from the frequent itemsets of a transaction database using pre-defined threshold measures. These thresholds are minimum support (minsupp) and minimum confidence (minconf). Association rules are formally written in “IF–THEN” form as X → Y, where X is called the antecedent and Y is called the consequent.

Let I = {i₁, i₂, … , i_n} be a set of distinct items (attributes). Any collection of one or more items is called an itemset. Let D = {t₁, t₂, … , t_m} be a set of transactions, each identified by a transaction ID (TID) and formed from a set of items in I. The support count of a rule X → Y is the number of transactions in which X and Y occur together, and the support value, support (X ∪ Y), is the fraction of transactions in D that contain both X and Y.

An itemset whose support is greater than or equal to a minsupp threshold is called a frequent itemset. An association rule is an implication expression of the form (X → Y), where X, Y ⊂ I and X ∩ Y = ∅. The confidence value measures how often the items in Y appear in transactions that contain X; it is the number of occurrences of X and Y together divided by the number of occurrences of X (i.e., $\frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}$). A strong association rule is one whose support and confidence are no lower than the user-defined minsupp and minconf. The main task of association rule discovery is to find all strong rules.
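The definitions above can be illustrated with a minimal sketch. The toy transactions and the function names (`support`, `confidence`, `strong_rules`) are assumptions introduced here for illustration and are not taken from the paper; the rule enumeration is brute force, not the Apriori candidate-generation procedure.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """support(X ∪ Y) / support(X) for the rule X → Y."""
    base = support(antecedent, db)
    if base == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), db) / base

def strong_rules(db, minsupp, minconf):
    """Brute-force search for strong rules; suitable for toy data only."""
    items = sorted(set().union(*db))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            if support(itemset, db) < minsupp:
                continue  # not a frequent itemset
            # Split the frequent itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for x in combinations(itemset, k):
                    y = tuple(i for i in itemset if i not in x)
                    if confidence(x, y, db) >= minconf:
                        rules.append((x, y))
    return rules
```

Calling `strong_rules(transactions, 0.6, 0.9)` keeps only rules whose itemset appears in at least 60% of the transactions and whose confidence is at least 0.9. Lowering minconf admits more rules, which is the sensitivity to thresholds discussed in the text.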

One of the advantages of association rule discovery is that it extracts explicit rules that are of practical importance for the user or human expert in understanding the application domain. The rules can therefore be adjusted or extended manually with further domain knowledge, which is difficult to achieve with other mining approaches (Gedikli & Jannach, 2010). Srikant and Agrawal (1996) introduced the problem of extracting association rules from quantitative attributes (a numeric data set) by partitioning these attributes into intervals. Some current association rule mining approaches for quantitative data neglect values near the interval boundaries of the partitions. The resulting sharp interval boundaries do not reflect the nature of human perception, as argued by Kuok et al. (1998) and Kaya and Alhajj (2003). Instead of partitioning the attributes, it is better to exploit fuzzy set theory, which provides a smooth transition between fuzzy sets. In this setting, the fuzzy approach is used to transform quantitative data into fuzzy data. A variety of approaches have been developed to extract fuzzy association rules from quantitative data sets (Hong et al., 2004, Zhang et al., 2005, Huang et al., 2006, Lee et al., 2006, Lei and Ren-hou, 2007, Pach et al., 2008, Chen et al., 2008, Ashish and Vikramkumar, 2010, Palacios et al., 2010).
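The sharp boundary issue can be seen in a small sketch. Everything here is an assumption made for illustration: the attribute (a speed-like value), the cut points 40 and 80, the labels, and the use of triangular membership functions. The paper itself derives its fuzzy sets with FCM rather than hand-picked triangles.

```python
# Hypothetical crisp partition: cut points and labels are invented.
EDGES = [40.0, 80.0]
LABELS = ["low", "medium", "high"]

# Hypothetical triangular fuzzy sets as (left foot, peak, right foot).
FUZZY_SETS = {
    "low": (-20.0, 20.0, 60.0),
    "medium": (20.0, 60.0, 100.0),
    "high": (60.0, 100.0, 140.0),
}

def crisp_label(x):
    """Assign x to exactly one interval of the crisp partition."""
    for edge, label in zip(EDGES, LABELS):
        if x < edge:
            return label
    return LABELS[-1]

def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(x):
    """Membership degree of x in every fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in FUZZY_SETS.items()}
```

Under these assumptions, the values 39 and 41 receive different crisp labels ("low" and "medium") although they are almost identical, whereas `fuzzify` gives both values nearly equal degrees in "low" and "medium", which is the smooth transition the text refers to.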

This paper investigates the problem of extracting association rules from quantitative data using fuzzy clustering techniques. Fuzzy clustering is a suitable method for transforming quantitative data into fuzzy data because, unlike the partition method, it provides a smooth transition among fuzzy sets. Fuzzy Association Rules (FARs) mining is adopted in this paper as a solution for extracting knowledge from a quantitative database.
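Once the data are fuzzified, support and confidence can be computed from membership degrees instead of binary occurrences. The sketch below assumes one common convention, in which a record's contribution to an itemset is the minimum membership across its fuzzy items. This choice, the record layout, and the membership values are illustrative assumptions and may differ from the formulation used in the paper.

```python
# Hypothetical fuzzified records: each maps (attribute, fuzzy term) to a
# membership degree in [0, 1]. The values are invented for illustration.
records = [
    {("demand", "high"): 0.8, ("demand", "medium"): 0.2,
     ("density", "high"): 0.6, ("density", "medium"): 0.4},
    {("demand", "high"): 0.3, ("demand", "medium"): 0.7,
     ("density", "high"): 0.1, ("density", "medium"): 0.9},
]

def fuzzy_support(itemset, fuzzy_records):
    """Mean over records of the minimum membership across the itemset.

    Using min as the combination operator is an assumption; the product
    is another common choice.
    """
    total = 0.0
    for record in fuzzy_records:
        total += min(record.get(item, 0.0) for item in itemset)
    return total / len(fuzzy_records)

def fuzzy_confidence(antecedent, consequent, fuzzy_records):
    """Fuzzy support of X ∪ Y divided by the fuzzy support of X."""
    base = fuzzy_support(antecedent, fuzzy_records)
    if base == 0:
        return 0.0
    joint = fuzzy_support(list(antecedent) + list(consequent), fuzzy_records)
    return joint / base

# Candidate rule: IF demand is high THEN density is high.
rule_if = [("demand", "high")]
rule_then = [("density", "high")]
```

A rule such as `rule_if → rule_then` would then be kept as a FAR when `fuzzy_support(rule_if + rule_then, records)` and `fuzzy_confidence(rule_if, rule_then, records)` reach the chosen minsupp and minconf.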

Association rule mining aims to discover the relationships (rules) among the data attributes (features), and the rules found depend on minsupp and minconf. Consequently, large numbers of rules are anticipated, particularly if minsupp is set very low. In practice, the single minsupp is a vital parameter that controls the number of extracted association rules (Hu & Chen, 2006). Toloo et al. (2009) proposed an integrated data envelopment analysis based method to identify the most efficient association rules by ranking them using multiple criteria. Conventional association rule mining approaches like Apriori (Agrawal & Srikant, 1994) and Frequent Pattern-Growth (FP-Growth) (Han, Pei, & Yin, 2000) are based on a single minsupp threshold. However, using a single minsupp causes a dilemma called the “rare item problem” (Liu et al., 1999, Hu and Chen, 2006, Kiran and Reddy, 2010): a high minsupp misses rules that involve rare items, whereas a low minsupp produces an excessive number of frequent itemsets.

To solve this rare item problem, Liu et al. (1999) developed a multiple support model called the Multiple Support Apriori (MSapriori) algorithm. MSapriori is based on the idea of setting a Minimum Item Support (MIS) for each item in a database, i.e., employing multiple minsupp values for different items instead of a single minsupp for the whole database. Hence, MSapriori can be viewed as a generalization of the Apriori algorithm. Different MIS values can be assigned to different items to facilitate the generation of frequent itemsets containing rare items and to prevent the production of uninteresting frequent itemsets (Hu & Chen, 2006). More recently, an approach called Improved Multiple Support Apriori (IMSapriori) has been developed to improve MSapriori (Kiran and Reddy, 2010, Palacios et al., 2010).
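The effect of per-item thresholds can be sketched as follows. The sketch assumes the usual multiple-support convention that an itemset is frequent when its support reaches the lowest MIS value among its items; this rule, the toy transactions, and the MIS values are assumptions for illustration and are not stated in the text above.

```python
# Toy database with one rare item, "caviar" (hypothetical data).
transactions = (
    [{"bread", "milk"}] * 6
    + [{"bread"}] * 2
    + [{"milk"}]
    + [{"caviar", "bread"}]
)

# Hypothetical Minimum Item Support (MIS) values, one per item.
MIS = {"bread": 0.3, "milk": 0.3, "caviar": 0.05}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def is_frequent_single(itemset, db, minsupp):
    """Single-threshold test, as in Apriori."""
    return support(itemset, db) >= minsupp

def is_frequent_multiple(itemset, db, mis):
    """Multiple-threshold test: compare against the lowest MIS in the itemset."""
    threshold = min(mis[item] for item in itemset)
    return support(itemset, db) >= threshold

rare_pair = {"caviar", "bread"}
```

With a single minsupp of 0.3, `rare_pair` is rejected because it occurs in only one of ten transactions. With the per-item values in `MIS`, its threshold drops to 0.05 and it is retained, while an itemset made only of common items, such as `{"bread", "milk"}`, still has to reach 0.3.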

This paper also proposes generating Fuzzy Association Rules (FARs) from quantitative data by combining fuzzy clustering with the multiple support approach, in order to overcome the limitations of using a single minsupp. In summary, this paper proposes two prediction models, tests them through a set of experiments, and compares them with existing work to demonstrate their merits and capabilities. The first model, referred to as the FCM–Apriori model, integrates the Fuzzy C-Means (FCM) clustering algorithm and the Apriori approach for extracting FARs, and is applied to a road traffic domain to predict a future value. The second model, referred to as the FCM–MSapriori model, is based on FCM and a multiple support thresholds approach; it enhances the first model to capture rules related to rare itemsets, and is applied to the same road traffic domain and to another benchmark data set called Abalone.

The rest of this paper is organized as follows. The next section presents the algorithms and prediction models. Section 3 describes the case studies used to demonstrate the merits and capabilities of the models. Experimental results of analysis are presented in Section 4. Finally, the conclusions are drawn in Section 5 with the key contribution of the investigation.

Section snippets

The FCM–Apriori model

The proposed FCM–Apriori model extracts fuzzy rules for building a KB from a database, and is based on Huang et al. (2006) and Lu, Xu, and Jiang (2003). The model utilizes the following two methods:

• FCM is used as an automatic system to transform the quantitative data set into fuzzy sets (terms). FCM is a fuzzy clustering algorithm based on minimizing an objective function; it was developed by Bezdek in 1981 and adopts fuzzy set theory. In other words, it assigns a data object (observation)
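The role of FCM as a generator of fuzzy memberships can be sketched for a single numeric attribute. This is a minimal one-dimensional version of the standard FCM updates, assuming a fuzzifier m = 2 and fixed initial centers rather than random initialization; the function names and data are hypothetical, and the paper's own configuration is not shown in this excerpt.

```python
def membership_row(x, centers, m):
    """Membership of value x in each cluster (standard FCM update)."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with one or more centers: share full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]

def fcm_1d(values, centers, m=2.0, iterations=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = list(centers)
    for _ in range(iterations):
        rows = [membership_row(x, centers, m) for x in values]
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in rows]
            weighted_sum = sum(w * x for w, x in zip(weights, values))
            new_centers.append(weighted_sum / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    rows = [membership_row(x, centers, m) for x in values]
    return centers, rows

# Hypothetical attribute values forming two loose groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fcm_1d(values, centers=[0.0, 6.0])
```

Each row of `memberships` sums to one and gives the degree to which a value belongs to each cluster. In a pipeline like the one described, each cluster would play the role of a fuzzy term (for example "low" and "high") for that attribute.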

Data sets for the case studies

Three data sets have been employed for the model performance analysis in Section 4. Two data sets are related to a road traffic problem and the third is the Abalone benchmark data set.

The road traffic data has been generated using a traffic simulation model called the METANET macroscopic flow model (Messmer, 2007).

Each record consists of:

• Traffic state, which is represented by:
  ∘ traffic demand in road 1 (the number of vehicles that need to use road 1)
  ∘ traffic demand in road 2
  ∘ traffic

The FCM–Apriori model

For analysis and validation purposes, the small road traffic data set (Section 3.1) as well as the large road traffic data set (Section 3.2) are used. Traffic state prediction (including traffic flow (traffic density) and traffic demand) has long been regarded as a critical concern for intelligent road traffic systems. The FCM–Apriori model discussed in Section 2 is applied to road traffic control management; two case studies with the two different data set sizes are used for the road traffic

Conclusion

This paper has presented two enhanced prediction models using a fuzzy association rule mining approach. The FCM–Apriori model is based on a single support value and has been tested on two (small and large) data sets in a road traffic domain. The results show that the model effectively minimized MAPE, which is sensitive to the minsupp and minconf values. They also show that the large data set yielded a lower MAPE than the small one. The model used FCM to determine
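For reference, the error measure mentioned above can be computed as in the following sketch, assuming that MAPE denotes the usual mean absolute percentage error; the function name and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical observed and predicted values.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
error_percent = mape(observed, forecast)
```

A lower value means the predictions are closer to the observed values in relative terms, which is the sense in which the models are compared.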

References (29)

H.P. Chiu et al. (2012). Applying cluster-based fuzzy association rules mining framework into EC environment. Journal of Applied Soft Computing.
G.T.S. Ho et al. (2012). Using a fuzzy association rule mining approach to identify the financial data association. Expert Systems with Applications.
T.P. Hong et al. (2004). A fuzzy Apriori Tid mining algorithm with reduced computational time. Applied Soft Computing Journal.
M.J. Huang et al. (2006). Integrating fuzzy data mining and fuzzy artificial neural networks for discovering implicit knowledge. Knowledge-Based Systems.
Y-H. Hu et al. Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism.
V. Jain et al. (2008). A new approach for evaluating agility in supply chains using fuzzy association rules mining. Engineering Applications of Artificial Intelligence.
F.P. Pach et al. (2008). Compact fuzzy association rule-based classifier. Expert Systems with Applications.
M. Toloo et al. (2009). A new method for ranking discovered rules from data mining by DEA. Expert Systems with Applications.
Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceeding of...
Agrawal, R., Imielinski, T. & Swami, A. (1993). Mining association rules between sets of items in large databases. In...
Ashish, M. & Vikramkumar, P. (2010). FPrep: Fuzzy clustering driven efficient automated pre-processing for fuzzy...
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. ISBN...
C.H. Chen et al. (2008). Cluster-based evaluation in fuzzy-genetic data mining. IEEE Transactions on Fuzzy Systems.
Frank, A. & Asuncion, A. (2010). UCI machine learning repository [http://archive.ics.uci.edu/ml], University of...

