An efficient algorithm for unique class association rule mining

doi:10.1016/j.eswa.2020.113978

Expert Systems with Applications

Volume 164, February 2021, 113978

Highlights

• Unique patterns extraction of datasets.
• Efficient and complete search for Class-based association rules (CARs).
• Performance of extracting CARs based on the Subsumption and Nonsense hypotheses.
• Building rule-spaces and Ranking of datasets.

Abstract

Association rule mining is one of the main tools of knowledge discovery and machine learning. Association rules express the interrelations among items in a dataset. Class Association Rules (CARs) are a subset of association rules that are always mined from labeled datasets: a typical CAR associates an itemset with a class label. Mining CARs is vital for constructing pattern-based or rule-based classification models and has recently received increasing research interest. In this work, a complete and efficient, but not exhaustive, CAR mining algorithm (UniqAR) is introduced. UniqAR generates only 100% accurate CARs, called unique association rules. It relies on two rule search hypotheses, Subsumption and Nonsense, to find the unique itemsets from which the unique CARs are generated. Unlike alternative CAR mining algorithms, UniqAR does not mine rules based on itemset frequency or item selectivity, so it can generate both frequent and rare association rules. The mining process requires no user preferences for support, coverage, or the items participating in itemsets. The main contribution of this work to the state of the art in CARs is the description of unique itemsets and unique class association rules, together with an efficient mining process for them. Unlike other unique rule mining approaches in the literature, the proposed mining process depends on a complete but not exhaustive search that exploits rule inter-relations. UniqAR has been modeled with a computational analysis and an extended evaluation. It is shown that UniqAR can extract all unique itemsets for unique association mining without any user preferences, templates, or constraints. Moreover, the analysis accurately describes the effects of dataset criteria such as the number of attributes/features, feature values, cases, and class labels on the UniqAR unique itemset extraction process, which avoids a huge number of itemset/case comparisons. Results have shown that the proposed UniqAR algorithm is feasible and promising.

Introduction

Class Association Rules (CARs) represent an interesting subset of association rules. A CAR always implies a class label as the consequent of an itemset combination. CARs are vital in several applications and domains (Nguyen, Nguyen, Vo, & Pedrycz, 2016). In general, according to the literature, the objective of most CAR mining processes is to mine high-accuracy CARs that satisfy several frequency and accuracy constraints. Higher accuracy of the generated CARs ensures better application performance. Mining only the most accurate (100%) CARs, the unique CARs, is the main motivation of this work.

Datasets consist of many cases (instances, objects, or records), each of which can be seen as a combination of feature values (items) forming an itemset. These itemsets are mostly examined individually and in combination in order to generate expressive, interesting patterns, as in frequent and rare pattern mining and in CARs (Han & Micheline Kamber, 2012). Itemsets that contain unique patterns, which always occur with only one specific class label, are unique itemsets. Unique itemsets are the raw material for generating unique CARs, which play an important role in several further analytic tasks (e.g. classification, clustering, data profiling, and semantic annotation). Moreover, the availability of such rules can measure how difficult or challenging a specific dataset is for supervised mining processes such as classification: the more unique itemsets, and hence unique CARs, are available, the easier the training process is expected to be. Since any unique association rule always leads to one specific class label, its accuracy is always 100%.

Mining unique itemsets in unlabeled datasets, where a unique itemset occurs only once, has received significant research interest (Papenbrock & Naumann, 2017). In labeled datasets, CARs are mostly generated based on frequency (minimum support) and on itemset participation or selectivity constraints. Unique CARs represent unique itemsets that always occur with a specific class label. A higher itemset support also ensures better confidence and coverage of the generated association rule. However, even with high confidence and coverage, rule accuracy cannot be guaranteed when mining class-labeled association rules. Generating all possible subsets of each object in a dataset and then comparing these subsets to all other objects in the dataset ensures that all unique itemsets, and hence all unique association rules, are extracted, but with exponential complexity. The efficient generation of unique itemsets (and hence unique CARs) is always considered to be an NP-hard problem. Therefore, approximation (Nguyen et al., 2016), heuristic, and stochastic (Wei, Leck, & Link, 2018) search methods represent an alternative, although full search methods for unique pattern mining have also been proposed.
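The exhaustive baseline described above (enumerate every subset of every case, then compare it against all cases) can be sketched as follows. This is only an illustrative sketch and not the UniqAR algorithm: the toy dataset, the `(feature, value)` item encoding, and the function names `is_unique` and `exhaustive_unique_itemsets` are assumptions made for the example.

```python
from itertools import combinations

# Toy labeled dataset, made up for illustration: each case is (items, class label),
# where items is a set of (feature, value) pairs.
CASES = [
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "no")}, "play"),
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "no")}, "stay"),
]

def is_unique(itemset, cases):
    """True if every case that contains the itemset carries the same class label."""
    labels = {label for items, label in cases if itemset <= items}
    return len(labels) == 1

def exhaustive_unique_itemsets(cases):
    """Enumerate all non-empty subsets of every case and keep the unique ones."""
    found = {}  # unique itemset -> class label
    for items, label in cases:
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                itemset = frozenset(combo)
                if itemset not in found and is_unique(itemset, cases):
                    found[itemset] = label
    return found

unique = exhaustive_unique_itemsets(CASES)
print(len(unique))  # 16 unique itemsets out of 20 distinct itemsets in this toy data
```

Each case with n items contributes up to 2^n - 1 subsets, and every subset is checked against all cases, which is where the exponential cost of this baseline comes from.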

This work introduces a novel efficient algorithm for Unique Class Association Rule (itemset) Mining (UniqAR) in labeled datasets. UniqAR avoids an exhaustive search by exploiting the inter-relations among itemsets and rules during mining. Such a mining process is an elementary step in generating unique association rules. Rules have several inter-relations: they may contradict, complement, overlap, and/or subsume each other (Vo, Le, Coenen, & Hong, 2014).

Rule Subsumption and Nonsense itemset filtration are item and rule inter-relationship properties that greatly reduce the search computations for unique itemset patterns. Subsumption, from a rule representation point of view, means that shorter rules, in terms of the number of items, can represent or subsume longer itemsets. There is no need to use any subsumed itemset in generating longer itemsets. The Subsumption property leads to minimal rule representations from a semantics perspective (Borgida & Patel-Schneider, 1994). In other words, unique rules with fewer conditions represent simpler and more interpretable implications. Therefore, because it considers the Subsumption criterion, the proposed search is oriented toward shorter, or minimal, itemsets. This leads to finding a minimal subset of the possible unique rules inside a dataset. This subset can efficiently represent and describe the whole unique rule domain of a dataset from the perspective of CARs.
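As a minimal sketch of the Subsumption test described above, assuming itemsets are stored as frozensets of `(feature, value)` pairs: a candidate is subsumed when an already known, shorter unique itemset is a proper subset of it. The function name `is_subsumed` and the example items are hypothetical.

```python
def is_subsumed(candidate, known_unique_itemsets):
    """Return True if a shorter known unique itemset is contained in the candidate."""
    return any(known < candidate for known in known_unique_itemsets)

# Hypothetical items, written as (feature, value) pairs.
known = [frozenset({("windy", "yes")})]
longer = frozenset({("windy", "yes"), ("outlook", "rainy")})
other = frozenset({("windy", "no"), ("outlook", "sunny")})

print(is_subsumed(longer, known))  # True: the shorter rule already covers it
print(is_subsumed(other, known))   # False: no known unique itemset is contained in it
```

A subsumed candidate can be skipped without a uniqueness test, because every case that contains it also contains the shorter unique itemset and therefore carries the same class label.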

The Nonsense property refers to the ability of a shorter itemset to filter or indicate (be contained in) a number of cases in a dataset. If a longer itemset (one that contains the shorter itemset plus additional items) is observed in the same number of cases as the shorter itemset, then the longer one is a nonsense itemset, and there is no need to use any nonsense itemset in generating longer unique itemsets. One of the main contributions of this work is introducing the Nonsense property. This property keeps the proposed search focused on finding a minimal unique rule subset of a specific dataset. Since the mined rules are presented as implications from combinations of items (conditions) to a class label, adding more conditions should have a specification effect: it should decrease the number of cases in which a rule can be observed. If this specification effect is not achieved, the last added condition has a nonsense effect. Moreover, the conditions (items) in the unique rule before adding the nonsense condition are sufficient to specify the same cases. The proposed novel algorithm uses these properties to present a complete but not exhaustive search for the unique patterns based on the rule properties.
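The Nonsense test can be sketched in the same spirit. The sketch assumes that "observed in the same number of cases" means equal support counts, which, because the longer itemset contains the shorter one, implies that both select exactly the same cases. The toy dataset and the names `support_count` and `is_nonsense` are made up for illustration.

```python
# Toy labeled dataset, made up for illustration: (set of (feature, value) items, class label).
CASES = [
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "no")}, "play"),
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "no")}, "stay"),
]

def support_count(itemset, cases):
    """Number of cases that contain every item of the itemset."""
    return sum(1 for items, _ in cases if itemset <= items)

def is_nonsense(longer, shorter, cases):
    """True if extending `shorter` to `longer` does not narrow the covered cases."""
    if not shorter < longer:
        raise ValueError("shorter must be a proper subset of longer")
    return support_count(longer, cases) == support_count(shorter, cases)

sunny = frozenset({("outlook", "sunny")})
sunny_high = sunny | {("humidity", "high")}
sunny_calm = sunny | {("windy", "no")}

print(is_nonsense(sunny_high, sunny, CASES))  # True: humidity=high adds no specification
print(is_nonsense(sunny_calm, sunny, CASES))  # False: windy=no narrows 2 cases down to 1
```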

This paper is organized as follows. Section 2 briefly reviews the related work and positions the contribution of this work within the state of the art in CARs. Section 3 introduces the terminology and concepts used frequently in this work. The two main hypotheses of Subsumption and Nonsense are presented and discussed in Section 4. UniqAR is fully described in Section 5. A computational analysis of UniqAR is developed in Section 6. The UniqAR criteria are provided and discussed in Section 7. Section 8 presents an extensive evaluation and discussion of UniqAR using 12 different datasets. Finally, conclusions are drawn in Section 9.

Section snippets

Related work

Association rule mining can be categorized based on the itemsets as a main input in rule induction. Finding association rules based on the interesting (e.g. frequent and rare) itemsets in unlabeled datasets is one of the classical unsupervised machine learning approaches in data mining. The well-known algorithms like Apriori (Agrawal et al., 1994), FP-growth (Han, Pei, Yin, & Mao, 2004), and ECLAT (Zaki, Parthasarathy, Ogihara, & Li, 1997) and their derivatives have introduced efficient

Dataset, itemset and unique class association rules

The terminology and concepts used frequently in this work are introduced here. A dataset (DS) is a collection of related discrete cases of data that may be accessed individually or in combination; it contains cases (objects or records) with different attributes. These attributes represent features. Each feature has a set of mutual feature values which form the different itemsets in the different cases. For instance in the Weather dataset¹

Subsumption and nonsense hypotheses

This work builds UniqAR, a minimal unique CAR mining algorithm, on two key hypotheses. The following subsections introduce the proposed Subsumption and Nonsense hypotheses.

The proposed efficient algorithm for unique class association rule mining (UniqAR)

The proposed algorithm, UniqAR, introduces the idea of skipping, or pruning, both subsumed and nonsense itemsets in order to build minimal unique itemsets for class association. In a sequential bottom-up approach, starting from single feature values (items) and building the possible combinations of feature values (itemsets), UniqAR tests the generated itemsets for uniqueness.

So, UniqAR itemset generation is a sequential forward generation process. The itemsets are generated with respect to the length in
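The bottom-up, level-wise search with Subsumption and Nonsense pruning described in this section can be illustrated with the sketch below. It is a simplified reading of the idea under stated assumptions, not the authors' implementation: the toy dataset, the `(feature, value)` item encoding, the helper `cover`, and the function name `mine_minimal_unique_itemsets` are all hypothetical, and covers are recomputed naively rather than cached.

```python
# Toy labeled dataset, made up for illustration: (set of (feature, value) items, class label).
CASES = [
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "no")}, "play"),
    ({("outlook", "sunny"), ("humidity", "high"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "yes")}, "stay"),
    ({("outlook", "rainy"), ("humidity", "normal"), ("windy", "no")}, "stay"),
]

def cover(itemset, cases):
    """Indices of the cases that contain every item of the itemset."""
    return frozenset(i for i, (items, _) in enumerate(cases) if itemset <= items)

def mine_minimal_unique_itemsets(cases):
    """Level-wise search that skips subsumed and nonsense candidates."""
    all_items = sorted({item for items, _ in cases for item in items})
    rules = {}                 # minimal unique itemset -> class label
    survivors = [frozenset()]  # non-unique itemsets that are still worth extending
    while survivors:
        # Build the next level by adding one item of an unused feature.
        candidates = set()
        for base in survivors:
            used = {feature for feature, _ in base}
            for item in all_items:
                if item[0] not in used:
                    candidates.add(base | {item})
        survivors = []
        for cand in candidates:
            # Subsumption: a shorter unique itemset already represents this one.
            if any(known < cand for known in rules):
                continue
            covered = cover(cand, cases)
            if not covered:
                continue  # the combination never occurs in the dataset
            # Nonsense: dropping some item leaves the covered cases unchanged.
            if len(cand) > 1 and any(cover(cand - {item}, cases) == covered for item in cand):
                continue
            labels = {cases[i][1] for i in covered}
            if len(labels) == 1:
                rules[cand] = labels.pop()  # unique itemset: emit rule, stop extending
            else:
                survivors.append(cand)      # not unique yet: extend at the next level
    return rules

rules = mine_minimal_unique_itemsets(CASES)
for itemset, label in sorted(rules.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), "->", label)
```

On this toy data the sketch returns five minimal unique itemsets: three single items and two pairs. The pair of `outlook=sunny` and `humidity=high` is dropped as nonsense because it covers the same two cases as `outlook=sunny` alone.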

Computational analysis

The upper bound on the number of itemset combinations of a given object/case ($x$) with ($n$) features/items, excluding the empty set, is ($2^{n} - 1$). UniqAR compares each of these itemset combinations to all of the other objects/cases ($M - 1$), so the maximum number of subsets/combinations (itemsets) of all objects/cases, $C_{X}$, is: $C_{X} = M \times (2^{n} - 1)$

Where X represents the set of all cases in a dataset ( $x \in X$ ). $C_{X}$ can be decomposed according to the level (L) of the itemset combinations as follows: $C_{X} = M$
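As a small numeric check of the upper bound $C_{X} = M \times (2^{n} - 1)$, the sketch below evaluates it for illustrative values of $n$ and $M$. The per-level split is an assumption: it only uses the standard binomial identity $2^{n} - 1 = \sum_{L=1}^{n} \binom{n}{L}$ and is not necessarily the exact form of the paper's level-wise decomposition. The function names are hypothetical.

```python
from math import comb

def itemset_upper_bound(n_features, n_cases):
    """Upper bound C_X = M * (2^n - 1) on the itemsets generated over all cases."""
    return n_cases * (2 ** n_features - 1)

def itemsets_per_level(n_features, n_cases):
    """Assumed split of the bound by itemset length L, using binomial coefficients."""
    return [n_cases * comb(n_features, level) for level in range(1, n_features + 1)]

# Illustrative values only: n = 4 features, M = 14 cases.
print(itemset_upper_bound(4, 14))  # 210
print(itemsets_per_level(4, 14))   # [56, 84, 56, 14]
```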

Criteria of the proposed algorithm

This section highlights the criteria of the proposed algorithm: complexity, efficiency, Rule Space, Dataset Ranking, and relations to Outliers.

Evaluation

In this work, UniqAR has been applied to several datasets. Most of these datasets are published online and are frequently used in the literature. Other datasets were collected from different sources.

All of the used datasets, their sources, the applied preprocessing (e.g. discretization), and detailed descriptions are available online². Table 2 gives brief information on these datasets. As it shows a variety of

Conclusions

This work adds a new contribution to the state of the art in Class Association Rule induction: an efficient algorithm for mining minimal unique CARs. The proposed algorithm, UniqAR, can discover all of the unique CARs in reasonable time based on the two hypotheses of Subsumption and Nonsense. A formal description of the proposed algorithm has been introduced. Results of an extensive evaluation have shown that: There is always plenty of unique itemsets in different datasets in

CRediT authorship contribution statement

Mahmoud Nasr: Conceptualization, Investigation, Software. Mohamed Hamdy: Conceptualization, Software, Visualization, Writing - original draft. Doaa Hegazy: Conceptualization, Validation, Writing - review & editing. Khaled Bahnasy: Conceptualization, Validation, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (34)

G. Chen et al. A new approach to classification based on association rule mining. Decision Support Systems (2006)
D.R. Morrison et al. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization (2016)
D. Nguyen et al. Efficient mining of class association rules with the itemset constraint. Know.-Based Systems (2016)
D. Nguyen et al. Efficient strategies for parallel mining class association rules. Expert Systems with Applications (2014)
D. Nguyen et al. Ccar: An efficient method for mining class association rules with itemset constraints. Engineering Applications of Artificial Intelligence (2015)
S. Parkinson et al. Auditing file system permissions using association rule mining. Expert Systems with Applications (2016)
B. Vo et al. Mining erasable itemsets with subset and superset itemset constraints. Expert Systems with Applications (2017)
Abedjan, Z., & Naumann, F. (2011). Advancing the discovery of unique column combinations. In Proceedings of the 20th...
Z. Abedjan et al. Detecting unique column combinations on dynamic data
Agrawal, R., Srikant, R. et al. (1994). Fast algorithms for mining association rules. In Proc. 20th int. conf. very...
P.S. Bala et al. Q-genesis: Question generation system based on semantic relationships
J.L. Balcázar et al. Evaluation of association rule quality measures through feature extraction
S. Baset et al. Object-oriented modeling with ontologies around: A survey of existing approaches. International Journal of Software Engineering and Knowledge Engineering (2018)
Bay, V., & Bac, L. (2008). A novel classification algorithm based on association rules mining. In Pacific Rim Know....
A. Borgida et al. A semantics and complete algorithm for subsumption in the classic description logic. Journal of Artificial Intelligence Research (1994)
H. Cheng et al. Approximate frequent itemset mining in the presence of random noise
J. Han et al. Data mining concepts and techniques (2012)

