Privacy-preserving in association rule mining using an improved discrete binary artificial bee colony
Introduction
Association rule mining aims to extract correlations between items in transaction data (Agrawal, Imielinski, & Swami, 1993). Despite its benefits for businesses and organizations, association rule mining poses a major threat to privacy when data is shared (Amiri, 2007), because a data recipient can obtain sensitive information by applying data mining techniques. For instance, when data is outsourced to the cloud, a data owner may be unwilling to disclose private information to others during a cooperative data mining process (Wu et al., 2019). Thus, before the data is released, sensitive information is protected through data perturbation: some transactions are sanitized to protect the confidentiality of sensitive knowledge (Divanis & Verykios, 2010). This process is known as Association Rule Hiding (ARH) or data sanitization. The ARH problem involves two types of patterns: sensitive rules, which are selected by the user, and non-sensitive rules, which are all the remaining rules. ARH is performed by decreasing the support or confidence of the sensitive rules below the mining thresholds. This process has side effects on the non-sensitive rules, which reduce the utility of the modified data (Telikani & Shahbahrami, 2017). An optimal ARH solution should hide all sensitive association rules with minimum side effects (Telikani & Shahbahrami, 2015), and finding such a solution is an NP-hard problem (Atallah, Bertino, Elmagarmid, Ibrahim & Verykios, 1999).
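To make the hiding criterion concrete, the following minimal sketch computes the support and confidence of a rule and shows how removing one item from one supporting transaction pushes the rule below the thresholds. The toy database, the threshold values, and the function names (`count`, `support`, `confidence`, `is_hidden`) are illustrative assumptions and are not taken from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def support(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, db) / len(db)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               db: List[Transaction]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = count(antecedent, db)
    return count(antecedent | consequent, db) / base if base else 0.0


def is_hidden(antecedent: FrozenSet[str], consequent: FrozenSet[str],
              db: List[Transaction], min_sup: float, min_conf: float) -> bool:
    """A rule is hidden once its support OR its confidence is below threshold."""
    return (support(antecedent | consequent, db) < min_sup
            or confidence(antecedent, consequent, db) < min_conf)


# Toy database (assumption): five transactions over the items a, b, c.
db = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
      frozenset("bc"), frozenset("ab")]
A, B = frozenset("a"), frozenset("b")   # sensitive rule a -> b
MIN_SUP, MIN_CONF = 0.5, 0.7            # illustrative thresholds

# Sanitize one supporting transaction by removing the item b from it.
sanitized = list(db)
sanitized[1] = sanitized[1] - {"b"}

print(support(A | B, db), confidence(A, B, db))                # 0.6 0.75
print(support(A | B, sanitized), confidence(A, B, sanitized))  # 0.4 0.5
```

In this toy case a single modified transaction is enough to hide the rule; in general, choosing which transactions to modify is the part that the evolutionary search addresses.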
Evolutionary computation is a recent trend in ARH, in which the hiding problem is encoded as a population of solutions. Each solution includes a set of sensitive transactions to be sanitized, and the fitness function is designed around the side effects of the sanitization process. Evolutionary approaches such as GA (Lin, Hong, Yang & Wang, 2015), PSO (Lin et al., 2016), the cuckoo search algorithm (Afshari, Dehkordi & Akbari, 2016), and the ant colony system (Wu, Zhan & Lin, 2017) have been adopted. However, these approaches face three challenges: (1) They often converge to local minima due to the nature of their local decisions, so a global best solution cannot be found in the search space and inappropriate transactions are selected for sanitization. (2) They have been developed solely for itemset hiding, so they incur large side effects when applied to the rule hiding problem. (3) They use long solutions, which poses a major challenge when the number of transactions is large and leads to reduced efficiency and low exploration.
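As a rough illustration of a side-effect-based fitness function, the sketch below compares the rules mined before and after sanitization and counts three kinds of side effects commonly used in the ARH literature: sensitive rules that remain visible, non-sensitive rules that are lost, and new rules that did not exist before. The rule representation (plain strings), the equal weights, and the function names are assumptions for illustration; the paper's actual fitness function is not given in this excerpt.

```python
from typing import Set, Tuple


def side_effects(original_rules: Set[str], sensitive_rules: Set[str],
                 sanitized_rules: Set[str]) -> Tuple[int, int, int]:
    """Return (hiding failures, lost rules, ghost rules).

    hiding failures: sensitive rules still minable after sanitization
    lost rules:      non-sensitive rules that disappeared
    ghost rules:     rules that appear only after sanitization
    """
    non_sensitive = original_rules - sensitive_rules
    hiding_failures = sensitive_rules & sanitized_rules
    lost = non_sensitive - sanitized_rules
    ghost = sanitized_rules - original_rules
    return len(hiding_failures), len(lost), len(ghost)


def fitness(original_rules: Set[str], sensitive_rules: Set[str],
            sanitized_rules: Set[str],
            weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of side effects; lower is better (weights are assumed)."""
    counts = side_effects(original_rules, sensitive_rules, sanitized_rules)
    return sum(w * c for w, c in zip(weights, counts))


# Toy rule sets (assumption), written as "antecedent->consequent".
original = {"a->b", "b->c", "a->c", "c->d"}
sensitive = {"a->b"}
after = {"b->c", "c->d", "d->a"}   # rules mined from the sanitized data

print(side_effects(original, sensitive, after))  # (0, 1, 1)
print(fitness(original, sensitive, after))       # 2.0
```

Here the sensitive rule is hidden, but one non-sensitive rule is lost and one new rule appears, so the candidate solution is penalized for both.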
To address these problems, this paper makes the following main contributions:
- (1) We propose an improved binary version of the ABC algorithm for selecting an appropriate set of sensitive transactions for sanitization in the ARH process. To this end, we enhance Discrete binary ABC (DisABC) (Kashan, Nahavandi & Kashan, 2012), a binary version of ABC based on the Jaccard coefficient similarity measure. DisABC is improved with two modifications that balance exploration and exploitation: choosing two neighbor solutions instead of one individual in the employee bee phase, and determining an accurate neighbor instead of a random solution in the onlooker bee phase. We call the resulting algorithm Improved Binary ABC (IBABC).
- (2) We present a binary encoding that generates shorter solutions, unlike other ARH algorithms, which include all sensitive transactions in a solution. We reduce the size of solutions as much as possible by considering only common sensitive transactions.
- (3) IBABC is integrated into our new ARH algorithm, named ABC4ARH, to select the best set of sensitive transactions. ABC4ARH modifies sensitive transactions instead of deleting entire transactions. It chooses victim items using a heuristic that favors items with the highest frequency in the sensitive rules and the lowest frequency in the non-sensitive rules.
- (4) The performance of the ABC4ARH algorithm is demonstrated through a set of experiments on real datasets under three test cases. In addition, IBABC is evaluated on two binary problems: the 0–1 knapsack problem and the Uncapacitated Facility Location Problem (UFLP). The results show that ABC4ARH outperforms other similar algorithms in most cases in terms of data quality and search ability.
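Two of the ideas named in the contributions can be sketched in a few lines: the Jaccard-based dissimilarity between binary solutions that DisABC relies on, and a victim-item ranking that prefers items frequent in sensitive rules and rare in non-sensitive rules. The exact way the paper combines the two frequency criteria is not given in this excerpt, so the lexicographic ordering used below, along with all function names and toy data, is an assumption.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def jaccard_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """1 - M11 / (M11 + M10 + M01) for two equal-length binary vectors."""
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denom = m11 + m10 + m01
    # Assumption: two all-zero vectors are treated as identical.
    return 0.0 if denom == 0 else 1.0 - m11 / denom


def item_frequencies(rules: List[Rule]) -> Counter:
    """Count in how many rules each item appears."""
    freq: Counter = Counter()
    for antecedent, consequent in rules:
        freq.update(antecedent | consequent)
    return freq


def rank_victims(sensitive: List[Rule], non_sensitive: List[Rule]) -> List[str]:
    """Rank items of the sensitive rules as victim candidates.

    Assumed ordering: most frequent in sensitive rules first, then least
    frequent in non-sensitive rules, then alphabetical as a tie-break.
    """
    s_freq = item_frequencies(sensitive)
    n_freq = item_frequencies(non_sensitive)
    return sorted(s_freq, key=lambda i: (-s_freq[i], n_freq[i], i))


def fs(items: str) -> FrozenSet[str]:
    """Helper: build a frozenset of single-character items."""
    return frozenset(items)


sensitive_rules = [(fs("a"), fs("b")), (fs("b"), fs("c"))]
non_sensitive_rules = [(fs("a"), fs("c")), (fs("c"), fs("d")), (fs("b"), fs("d"))]

print(jaccard_dissimilarity([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # 0.5
print(rank_victims(sensitive_rules, non_sensitive_rules))       # ['b', 'a', 'c']
```

Item `b` ranks first because it appears in both sensitive rules, and `a` precedes `c` because it appears in fewer non-sensitive rules.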
The rest of the paper is organized as follows. Section 2 provides background information on ARH and ABC, along with a review of related work. Section 3 describes the improved DisABC algorithm in detail. Section 4 introduces a novel ARH algorithm that applies the new binary ABC mechanism in the transaction selection phase to minimize side effects. Section 5 details the two binary problems and the real datasets used in the experiments. Section 6 presents the experimental results and discussion. Finally, Section 7 concludes the paper.
Background information
This section first provides background information on association rule mining and ARH. We then review related work in ARH. Finally, we briefly describe the ABC and DisABC algorithms.
IBABC: improved binary ABC
DisABC (Kashan et al., 2012) is a suitable evolutionary algorithm for binary problems due to its simplicity and novelty. Although DisABC effectively solves small- and medium-sized problems, it suffers from weak exploitation on high-dimensional problems (Ozturk et al., 2015a). When DisABC is applied to high-dimensional problems such as ARH, in which the size of the solutions depends on the number of sensitive transactions, the performance of the hiding algorithm degrades. Therefore, it
ABC4ARH: Artificial Bee Colony for Association Rule Hiding
Recently, the adoption of evolutionary computation for the ARH problem has attracted more attention than heuristic approaches. One of the most common ways is to modify some parts of an evolutionary algorithm via internal or external forces. In this section, we propose a rule hiding algorithm based on IBABC that effectively hides the sensitive association rules by transaction modification. In ABC4ARH, on the one hand, the concept of binary ABC is applied to optimally find a set of transactions for
Experimental setup
This section describes the experimental settings and benchmarks used to evaluate the proposed neighborhood generation mechanism and the ABC4ARH algorithm. The first three subsections, 5.1 to 5.3, provide information on the requirements for side effect evaluation. Sections 5.4 and 5.5 describe the 0–1 knapsack problem and UFLP, which are used to show the effectiveness of the new ABC algorithm in terms of global search ability and scalability.
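For readers unfamiliar with the two binary benchmarks, the sketch below evaluates a binary solution vector under each problem's standard objective. The instance data are made up, and the handling of infeasible solutions (a value of 0 for an overweight knapsack, infinite cost when no facility is open) is an assumption rather than the paper's stated choice.

```python
import math
from typing import Sequence


def knapsack_value(x: Sequence[int], values: Sequence[float],
                   weights: Sequence[float], capacity: float) -> float:
    """Total value of the selected items (to be maximized).

    Assumption: a selection that exceeds the capacity scores 0.
    """
    total_weight = sum(w for w, bit in zip(weights, x) if bit)
    if total_weight > capacity:
        return 0
    return sum(v for v, bit in zip(values, x) if bit)


def uflp_cost(open_facilities: Sequence[int], fixed_costs: Sequence[float],
              service_costs: Sequence[Sequence[float]]) -> float:
    """Total UFLP cost (to be minimized).

    service_costs[i][j] is the cost of serving customer i from facility j.
    Each customer is served by its cheapest open facility.
    Assumption: opening no facility at all is infeasible (infinite cost).
    """
    opened = [j for j, bit in enumerate(open_facilities) if bit]
    if not opened:
        return math.inf
    fixed = sum(fixed_costs[j] for j in opened)
    service = sum(min(row[j] for j in opened) for row in service_costs)
    return fixed + service


# Toy knapsack instance (assumption).
values, weights, capacity = [10, 7, 4], [5, 4, 3], 8
print(knapsack_value([1, 0, 1], values, weights, capacity))  # 14
print(knapsack_value([1, 1, 0], values, weights, capacity))  # 0 (overweight)

# Toy UFLP instance (assumption): 3 customers, 2 candidate facilities.
fixed_costs = [4, 3]
service_costs = [[1, 5], [4, 1], [2, 6]]
print(uflp_cost([1, 0], fixed_costs, service_costs))  # 11
print(uflp_cost([0, 1], fixed_costs, service_costs))  # 15
```

In both problems a candidate solution is a bit vector, which is why they serve as convenient test beds for a binary optimizer independent of the ARH setting.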
Experimental results
In all experiments, to ensure a fair comparison, the standard parameters population size and maximum cycle number (MCN) are set to 50 and 2000, respectively. In each run, the solutions with the best fitness value are selected to solve the problem. To provide a better evaluation, each experiment is performed 30 times with random seeds, and the mean of the best values produced by each algorithm is reported. As recommended for ABC (Karaboga & Akay, 2009), the limit value for IBABC is chosen to be SN*D,
Conclusion and future works
This paper presents a privacy-preserving algorithm based on the concept of Artificial Bee Colony (ABC), called ABC for Association Rule Hiding (ABC4ARH), to protect sensitive association rules. To improve the evolution process, IBABC was proposed with three improvements in the initialization, employee bee, and onlooker bee phases. In the initialization step, the solution size and the way the "1" bits are initialized are determined based on the values specified by the preprocessing phases. Employee
CRediT authorship contribution statement
Akbar Telikani: Conceptualization, Methodology, Data curation, Formal analysis, Investigation, Project administration, Resources, Visualization, Writing - original draft. Amir H. Gandomi: Supervision, Validation, Writing - review & editing. Asadollah Shahbahrami: Supervision, Validation, Writing - review & editing. Mohammad Naderi Dehkordi: Conceptualization, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (55)
- et al. Association rule hiding using cuckoo optimization algorithm. Expert Systems with Applications (2016).
- et al. A modified artificial bee colony algorithm for real-parameter optimization. Information Sciences (2012).
- Dare to share: Protecting sensitive knowledge with data sanitization. Decision Support Systems (2007).
- et al. JayaX: Jaya algorithm with xor operator for binary optimization. Applied Soft Computing (2019).
- et al. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. SIAM Journal on Computing (2010).
- et al. Using TF-IDF to hide sensitive itemsets. Applied Intelligence (2013).
- et al. DisABC: A new artificial bee colony algorithm for binary optimization. Applied Soft Computing (2012).
- et al. A discrete binary version of the particle swarm algorithm.
- et al. XOR-based artificial bee colony algorithm for binary optimization. Turkish Journal of Electrical Engineering & Computer Sciences (2013).
- et al. MICF: An effective sanitization algorithm for hiding sensitive patterns on data mining. Advanced Engineering Informatics (2007).
- The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Applied Intelligence.
- Lossless and robust privacy preservation of association rules in data sanitization. Cluster Computing.
- Hiding sensitive frequent itemsets by a border-based approach. Computing.Science and Engineering.
- Maintenance of sanitizing informative association rules. Expert Systems with Applications.
- Hiding informative association rule sets. Expert Systems with Applications.
- Hiding collaborative recommendation association rules. Applied Intelligence.
- SecEDMO: Enabling efficient data mining with strong privacy protection in cloud computing. IEEE Transactions on Cloud Computing.
- Mining association rules between sets of items in large databases.
- Fast algorithms for mining association rules in large databases.
- Disclosure limitation of sensitive rules.
- Memetic binary particle swarm optimization for discrete optimization problems. Information Sciences.
- Privacy preservation through a greedy, distortion-based rule-hiding method. Applied Intelligence.
- An integer programming approach for frequent itemset hiding.
- Hiding sensitive knowledge without side effects. Knowledge and Information Systems.
- Association rule hiding for data mining.
- Advances in frequent itemset mining implementations: Report on FIMI’03.