
A cost-efficient and versatile sanitizing algorithm by using a greedy approach

  • Focus
Soft Computing

Abstract

Very large databases contain sensitive information that must be protected against unauthorized access. Protecting the confidentiality of such information has been a long-term goal of the database security research community and of government statistical agencies. In this paper, we propose greedy methods for hiding sensitive rules. The experimental results show the effectiveness of our approaches in terms of the undesired side effects avoided during rule hiding, and they reveal that in most cases all the sensitive rules are hidden without generating spurious rules. Our approach scales well with database size because it uses an efficient data structure, FCET, to store only the maximal frequent itemsets instead of all frequent itemsets. We also propose a new framework for enforcing privacy in association rule mining, which combines techniques for efficiently hiding sensitive rules with a transaction retrieval engine based on the FCET index tree. For hiding sensitive rules, the proposed greedy approach comprises a greedy approximation algorithm and a greedy exhausted algorithm for sanitizing the database. In particular, we present four strategies in the sanitizing procedure and four strategies in the exposed procedure for hiding a group of association rules characterized as sensitive or artificial rules. In addition, the exposed procedure exposes missing rules during processing, so that the number of missing rules is kept as low as possible.
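To make the idea of greedy rule hiding concrete, the following minimal Python sketch hides a single sensitive rule by deleting items from supporting transactions until the rule falls below the mining thresholds. It is an illustration under stated assumptions, not the algorithm of this paper: it does not use the FCET structure, the sanitizing and exposed procedures, or any of the strategies mentioned above. The function names (`support_count`, `is_hidden`, `greedy_hide`), the thresholds, and the greedy heuristic (modify the shortest supporting transaction first) are all hypothetical choices made for this example.

```python
from typing import FrozenSet, List, Set, Tuple

Transaction = Set[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def support_count(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def is_hidden(db: List[Transaction], rule: Rule,
              min_support: float, min_confidence: float) -> bool:
    """A rule is hidden once its support or confidence is below threshold."""
    antecedent, consequent = rule
    both = support_count(db, antecedent | consequent)
    ante = support_count(db, antecedent)
    support = both / len(db)
    confidence = both / ante if ante else 0.0
    return support < min_support or confidence < min_confidence


def greedy_hide(db: List[Transaction], rule: Rule,
                min_support: float, min_confidence: float) -> List[Transaction]:
    """Return a sanitized copy of `db` in which `rule` can no longer be mined."""
    sanitized = [set(t) for t in db]      # leave the original database intact
    antecedent, consequent = rule
    full = antecedent | consequent
    victim_item = sorted(consequent)[0]   # hypothetical choice of item to delete
    while not is_hidden(sanitized, rule, min_support, min_confidence):
        supporting = [t for t in sanitized if full <= t]
        if not supporting:                # nothing left to modify (zero thresholds)
            break
        # Greedy heuristic (an assumption of this sketch): the shortest
        # supporting transaction is likely to support the fewest other itemsets.
        target = min(supporting, key=len)
        target.remove(victim_item)
    return sanitized
```

For example, with five transactions `{a,b,c}`, `{a,b}`, `{a,b,d}`, `{a,c}`, `{b,c}` and thresholds of 0.4 (support) and 0.7 (confidence), hiding the rule a → b requires one deletion: removing `b` from `{a,b}` lowers the confidence from 0.75 to 0.5. A real sanitizer would also track side effects, that is, non-sensitive rules that are lost and spurious rules that appear.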




Corresponding author

Correspondence to Yin-Fu Huang.


About this article

Cite this article

Wu, CM., Huang, YF. A cost-efficient and versatile sanitizing algorithm by using a greedy approach. Soft Comput 15, 939–952 (2011). https://doi.org/10.1007/s00500-010-0549-3


  • DOI: https://doi.org/10.1007/s00500-010-0549-3
