Privacy-preserving in association rule mining using an improved discrete binary artificial bee colony

https://doi.org/10.1016/j.eswa.2019.113097

Highlights

  • A new algorithm is developed for hiding sensitive rules using a binary ABC approach.

  • We improve the binary ABC to optimize the exploitation for rule hiding.

  • We apply the improved swarm algorithm for selecting sensitive transactions.

  • Various experiments are carried out to verify the performance of our algorithms.

Abstract

Association Rule Hiding (ARH) is the process of protecting sensitive knowledge using data transformation. Although some evolutionary ARH algorithms exist, they mostly focus on itemset hiding rather than rule hiding. In addition, their unstable convergence to the global optimum and their long solution encodings make them poorly suited to reducing side effects. They use the basic versions of evolutionary approaches, which perform poorly in the ARH domain, where the search space is large and algorithms easily get trapped in local optima. To deal with these problems, we propose a new rule hiding algorithm based on a binary Artificial Bee Colony (ABC) approach, which has good exploration. Because binary ABC is weak in exploitation, we improve it by designing a new neighborhood generation mechanism that balances exploration and exploitation. We call this algorithm Improved Binary ABC (IBABC). IBABC is coupled with our proposed rule hiding algorithm, called ABC4ARH, to select sensitive transactions for modification. To choose victim items, ABC4ARH formulates a heuristic. The effect of ABC4ARH on side effects is demonstrated through extensive experiments conducted on five real datasets. Furthermore, the effectiveness of IBABC is verified using the uncapacitated facility location problem and the 0–1 knapsack problem.

Introduction

Association rule mining aims to extract the correlations between different items in transaction data (Agrawal, Imielinski, & Swami, 1993). Despite the benefits of association rule mining for businesses and organizations, it poses a major threat to privacy when data is shared (Amiri, 2007). In this situation, a data recipient can obtain sensitive information using data mining techniques. For instance, when data is outsourced to the cloud, a data owner may not be willing to disclose private information to others during a cooperative data mining process (Wu et al., 2019). Thus, before the data is released, sensitive information is protected through data perturbation. To protect the confidentiality of sensitive knowledge, some transactions are sanitized (Divanis & Verykios, 2010). This process is known as Association Rule Hiding (ARH) or data sanitization. There are two types of patterns in the ARH problem: sensitive rules, which are selected by the user, and non-sensitive rules, which are the remaining rules. ARH is performed by decreasing the support or confidence of the sensitive rules below the user-specified thresholds. This process has side effects on the non-sensitive rules, so the utility of the modified data is reduced (Telikani & Shahbahrami, 2017). An optimal ARH solution should hide all sensitive association rules with minimum side effects (Telikani & Shahbahrami, 2015). Finding such a solution is an NP-hard problem (Atallah, Bertino, Elmagarmid, Ibrahim & Verykios, 1999).
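The support and confidence measures referred to above can be illustrated with a small sketch. This is only an illustration under stated assumptions: the toy transactions, the thresholds, and the function names (`support`, `confidence`, `is_hidden`) are hypothetical and do not come from the paper. Support is taken as the fraction of transactions containing an itemset, and confidence of a rule A → B as support(A ∪ B) / support(A).

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def is_hidden(antecedent: FrozenSet[str], consequent: FrozenSet[str],
              transactions: List[Transaction],
              min_sup: float, min_conf: float) -> bool:
    """A rule counts as hidden once its support OR confidence drops below threshold."""
    sup = support(antecedent | consequent, transactions)
    conf = confidence(antecedent, consequent, transactions)
    return sup < min_sup or conf < min_conf


# Hypothetical toy database; the sensitive rule is bread -> milk.
original = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "eggs"}, {"milk"},
)]
# Sanitized copy: "milk" has been removed from the first transaction only.
sanitized = [frozenset({"bread"})] + original[1:]

A, B = frozenset({"bread"}), frozenset({"milk"})
print(is_hidden(A, B, original, min_sup=0.4, min_conf=0.6))
print(is_hidden(A, B, sanitized, min_sup=0.4, min_conf=0.6))
```

In this toy case, modifying a single transaction lowers the rule's support from 0.5 to 0.25, which is below the assumed threshold of 0.4. The same modification can also change the support of non-sensitive rules that involve "milk", which is the kind of side effect the text describes.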

Evolutionary computation strategies are a new trend for ARH, in which the hiding problem is encoded as a population of solutions. Each solution includes a set of sensitive transactions to be sanitized. The fitness function is designed by considering the side effects of the sanitization process. Some evolutionary approaches such as GA (Lin, Hong, Yang & Wang, 2015), PSO (Lin et al., 2016), the cuckoo search algorithm (Afshari, Dehkordi & Akbari, 2016), and the ant colony system (Wu, Zhan & Lin, 2017) have been adopted. However, these approaches face three challenges: (1) They often converge to local minima due to the nature of their local decisions. Therefore, a global best solution cannot be found in the search space and inappropriate transactions are selected for sanitization. (2) The evolutionary ARH approaches have been developed solely for itemset hiding, so the side effects increase sharply if they are applied to the rule hiding problem. (3) They design long solutions, which pose a major challenge when the number of transactions is large, leading to reduced efficiency and low exploration.

To solve the above-mentioned problems, our main contributions are as follows:

  • (1)

    We propose an improved binary version of the ABC algorithm for selecting an appropriate set of sensitive transactions for sanitization in the ARH process. To achieve this goal, we enhance Discrete binary ABC (DisABC) (Kashan, Nahavandi & Kashan, 2012), which is a binary version of ABC based on the Jaccard coefficient similarity measure. In this paper, DisABC is improved using two modifications that balance exploration and exploitation: choosing two neighbor solutions instead of one individual in the employee bee phase, and determining an accurate neighbor instead of a random solution in the onlooker bee phase. We call the enhanced DisABC Improved Binary ABC (IBABC).

  • (2)

    We present a binary encoding to generate shorter solutions, unlike other ARH algorithms that consider all sensitive transactions in a solution. We try to reduce the size of solutions as much as possible by considering only common sensitive transactions.

  • (3)

    IBABC is integrated into our new ARH algorithm, named ABC4ARH, to select the best set of sensitive transactions. ABC4ARH aims to modify sensitive transactions instead of deleting entire transactions. To this end, it chooses some items as victims based on a heuristic that favors items with the highest frequency in the sensitive rules and the lowest frequency in the non-sensitive rules.

  • (4)

    The performance of the ABC4ARH algorithm is demonstrated through a set of experiments using real datasets under three test cases. Separately, IBABC is evaluated using two binary problems: the 0–1 knapsack problem and the Uncapacitated Facility Location Problem (UFLP). The results show that ABC4ARH outperforms other similar algorithms in most cases in terms of data quality and search ability.
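The victim-item heuristic in contribution (3) can be sketched as a simple ranking. This is a minimal sketch, not the authors' implementation: the rule representation, the function names, and the way the two criteria are combined (sort by sensitive-rule frequency first, break ties by non-sensitive-rule frequency) are assumptions made for illustration, as is counting items on both sides of each rule.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# A rule is modeled as (antecedent, consequent); this representation is an assumption.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def item_frequency(rules: Iterable[Rule]) -> Counter:
    """Count in how many rules each item appears (on either side of the rule)."""
    counts: Counter = Counter()
    for antecedent, consequent in rules:
        counts.update(antecedent | consequent)
    return counts


def rank_victims(sensitive: Iterable[Rule], non_sensitive: Iterable[Rule]) -> List[str]:
    """Order candidate victim items: most frequent in sensitive rules first,
    ties broken by least frequent in non-sensitive rules, then alphabetically."""
    in_sensitive = item_frequency(sensitive)
    in_non_sensitive = item_frequency(non_sensitive)
    return sorted(
        in_sensitive,
        key=lambda item: (-in_sensitive[item], in_non_sensitive[item], item),
    )


def rule(antecedent: str, consequent: str) -> Rule:
    """Helper for building single-item rules in the example below."""
    return frozenset({antecedent}), frozenset({consequent})


# Hypothetical rule sets.
sensitive_rules = [rule("a", "b"), rule("b", "c")]
non_sensitive_rules = [rule("a", "d"), rule("a", "e")]
print(rank_victims(sensitive_rules, non_sensitive_rules))  # ['b', 'c', 'a']
```

Here "b" ranks first because it appears in both sensitive rules, and "c" precedes "a" because removing "a" would disturb two non-sensitive rules while removing "c" would disturb none.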

The rest of the paper is organized as follows: Section 2 provides background information on ARH and ABC and reviews related work. Section 3 describes the improved DisABC algorithm in detail. In Section 4, a novel ARH algorithm is introduced that applies the new binary ABC mechanism in the transaction selection phase to minimize the side effects. Section 5 details the two binary problems and the real datasets used in the experiments. Section 6 presents the experimental results and discussion. Finally, Section 7 concludes the paper.

Section snippets

Background information

This section first provides background information on association rule mining and ARH. Then, we review related work in ARH. Finally, we briefly describe the ABC and DisABC algorithms.
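Since DisABC is described in this paper as being based on the Jaccard coefficient, a short sketch of that measure on binary vectors may help. The code below shows only the standard Jaccard similarity and its complement. It does not reproduce DisABC's neighbor-generation procedure, and the function names and the convention for two all-zero vectors are assumptions.

```python
from typing import Sequence


def jaccard_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient of two equal-length 0/1 vectors: M11 / (M11 + M10 + M01)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denominator = m11 + m10 + m01
    if denominator == 0:
        # Both vectors are all zeros; treating them as identical is an assumed convention.
        return 1.0
    return m11 / denominator


def jaccard_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Complement of the Jaccard coefficient; 0 means identical sets of '1' bits."""
    return 1.0 - jaccard_similarity(x, y)


# Two hypothetical binary solutions (1 = transaction selected for sanitization).
solution_a = [1, 1, 0, 0, 1]
solution_b = [1, 0, 0, 1, 1]
print(jaccard_dissimilarity(solution_a, solution_b))  # 0.5
```

The two example solutions share two selected positions and differ in two, giving a similarity of 2/4 and a dissimilarity of 0.5. Positions where both bits are 0 do not affect the measure.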

IBABC: improved binary ABC

DisABC (Kashan et al., 2012) is a suitable evolutionary algorithm for binary problems due to its simplicity and novelty. Although DisABC effectively solves small- and medium-sized problems, it suffers from weak exploitation on high-dimensional problems (Ozturk et al., 2015a). When DisABC is applied to high-dimensional problems such as ARH, in which the size of the solutions depends on the number of sensitive transactions, the performance of the hiding algorithm degrades. Therefore, it

ABC4ARH: Artificial Bee Colony for Association Rule Hiding

Recently, the adoption of evolutionary computation for the ARH problem has attracted more attention than heuristic approaches. One of the most common ways is to modify some parts of an evolutionary algorithm via internal or external forces. In this section, we propose a rule hiding algorithm based on IBABC for effectively hiding the sensitive association rules through transaction modification. In ABC4ARH, on the one hand, the concept of binary ABC is applied to optimally find a set of transactions for

Experimental setup

To evaluate the performance of our proposed neighborhood generation mechanism and the ABC4ARH algorithm, this section describes the experimental settings and the benchmarks. The first three subsections, 5.1 to 5.3, provide information on the requirements for side effect evaluation. Sections 5.4 and 5.5 describe the 0–1 knapsack problem and UFLP, which are used to show the effectiveness of the new ABC algorithm in terms of global search ability and scalability.
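Both benchmark problems take a binary vector as a candidate solution, which is why they suit a binary ABC variant. The sketch below evaluates such a vector under the standard textbook definitions of the two problems. The function names, the toy instances, and the handling of infeasible solutions (value 0 for an overweight knapsack, infinite cost when no facility is open) are assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def knapsack_value(bits: Sequence[int], values: Sequence[float],
                   weights: Sequence[float], capacity: float) -> float:
    """Total value of the selected items, or 0 if the capacity is exceeded
    (a simple assumed way of treating infeasible solutions)."""
    if not (len(bits) == len(values) == len(weights)):
        raise ValueError("bits, values and weights must have the same length")
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0.0
    return float(sum(v for b, v in zip(bits, values) if b))


def uflp_cost(open_bits: Sequence[int], fixed_costs: Sequence[float],
              service_costs: List[Sequence[float]]) -> float:
    """UFLP objective: fixed costs of the open facilities plus, for each customer,
    the cheapest service cost among open facilities.
    service_costs[i][j] is the cost of serving customer i from facility j."""
    open_facilities = [j for j, b in enumerate(open_bits) if b]
    if not open_facilities:
        return float("inf")  # no facility open: customers cannot be served
    fixed = sum(fixed_costs[j] for j in open_facilities)
    service = sum(min(row[j] for j in open_facilities) for row in service_costs)
    return float(fixed + service)


# Hypothetical toy instances.
values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
print(knapsack_value([0, 1, 1], values, weights, capacity))  # 220.0

fixed_costs = [10, 15]
service_costs = [[4, 7], [9, 2], [5, 5]]
print(uflp_cost([1, 0], fixed_costs, service_costs))  # 28.0
```

The knapsack is a maximization problem and UFLP a minimization problem, so an optimizer applied to both needs a consistent direction, for example by negating one of the objectives.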

Experimental results

In all experiments, to ensure a fair comparison, the standard parameter settings such as population size and maximum cycle number (MCN) are set to 50 and 2000, respectively. In each run, solutions with the best fitness value are selected to solve the problem. To provide a better evaluation, each experiment is performed 30 times with random seeds and the mean best values produced by the algorithms are considered. As recommended for ABC (Karaboga & Akay, 2009), the limit value for IBABC is chosen to be SN*D,

Conclusion and future works

This paper presents a privacy-preserving algorithm based on the concept of Artificial Bee Colony (ABC), called ABC for Association Rule Hiding (ABC4ARH), to protect sensitive association rules. To improve the evolution process, IBABC was proposed with three improvements in the phases of initialization, employee bee, and onlooker bee. In the initialization step, the solution size and the way the "1" bits are initialized are determined based on the values specified by the preprocessing phases. Employee

CRediT authorship contribution statement

Akbar Telikani: Conceptualization, Methodology, Data curation, Formal analysis, Investigation, Project administration, Resources, Visualization, Writing - original draft. Amir H. Gandomi: Supervision, Validation, Writing - review & editing. Asadollah Shahbahrami: Supervision, Validation, Writing - review & editing. Mohammad Naderi Dehkordi: Conceptualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (55)

  • C.W. Lin et al.

    The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

    Applied Intelligence

    (2015)
  • G.S. Navale et al.

    Lossless and robust privacy preservation of association rules in data sanitization

    Cluster Computing

    (2019)
  • Sun, X., & Yu, P.S. (2005). A border-based approach for hiding sensitive frequent itemsets. In Proceedings of the 5th...
  • X. Sun et al.

    Hiding sensitive frequent itemsets by a border-based approach

    Computing Science and Engineering

    (2007)
  • S.L. Wang

    Maintenance of sanitizing informative association rules

    Expert Systems with Applications

    (2009)
  • S.L. Wang et al.

    Hiding informative association rule sets

    Expert Systems with Applications

    (2007)
  • S.L. Wang et al.

    Hiding collaborative recommendation association rules

    Applied Intelligence

    (2007)
  • J. Wu et al.

    SecEDMO: Enabling efficient data mining with strong privacy protection in cloud computing

    IEEE Transactions on Cloud Computing

    (2019)
  • R. Agrawal et al.

    Mining association rules between sets of items in large databases

  • R. Agrawal et al.

    Fast algorithms for mining association rules in large databases

  • M. Atallah et al.

    Disclosure limitation of sensitive rules

  • Z. Beheshti et al.

    Memetic binary particle swarm optimization for discrete optimization problems

    Information Sciences

    (2015)
  • P. Cheng et al.

    Privacy preservation through a greedy, distortion-based rule-hiding method

    Applied Intelligence

    (2016)
  • A.G. Divanis et al.

    An integer programming approach for frequent itemset hiding

  • A.G. Divanis et al.

    Hiding sensitive knowledge without side effects

    Knowledge and Information Systems

    (2009)
  • A.G. Divanis et al.

    Association rule hiding for data mining

    (2010)
  • B. Goethals et al.

    Advances in frequent itemset mining implementations: Report on FIMI’03
