Abstract
Very large databases contain sensitive information that must be protected against unauthorized access. Protecting the confidentiality of this information has been a long-term goal of the database security research community and of government statistical agencies. In this paper, we propose greedy methods for hiding sensitive association rules. The proposed approach comprises a greedy approximation algorithm and a greedy exhausted algorithm for sanitizing the database. In particular, we present four strategies in the sanitizing procedure and four strategies in the exposed procedure for hiding a group of association rules characterized as sensitive or artificial rules. The exposed procedure also exposes missing rules during processing, so that the number of missing rules is kept as low as possible. We further propose a new framework for enforcing privacy in association rule mining, which combines the techniques for efficiently hiding sensitive rules with a transaction retrieval engine based on the FCET index tree. The experimental results show the effectiveness of our approaches in terms of the undesired side effects avoided in the rule hiding process, and reveal that in most cases all the sensitive rules are hidden without generating spurious rules. The approach also scales well with database size because an efficient data structure, FCET, stores only the maximal frequent itemsets instead of all frequent itemsets.
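To make the idea of sanitizing a database concrete, the following is a minimal sketch of one generic greedy way to hide a sensitive itemset: items are deleted from supporting transactions until the itemset's support falls below the mining threshold, so rules built from it can no longer be mined. It is not the greedy approximation or greedy exhausted algorithm of this paper, and it does not use the FCET index. The names `support` and `sanitize` and the victim-selection heuristics (shortest supporting transaction, most frequent item) are assumptions chosen for illustration.

```python
def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def sanitize(db, sensitive_itemsets, min_support):
    """Hypothetical greedy sanitizer (illustrative only, not the paper's method).

    For each sensitive itemset, repeatedly delete one of its items from one
    supporting transaction until its support drops below min_support.
    """
    if min_support <= 0:
        raise ValueError("min_support must be positive")
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    for s in map(frozenset, sensitive_itemsets):
        while support(db, s) >= min_support:
            supporters = [t for t in db if s <= t]
            # Assumed heuristic: the shortest supporting transaction is
            # likely to support the fewest other (non-sensitive) itemsets.
            victim_txn = min(supporters, key=len)
            # Assumed heuristic: delete the item of s that is most frequent
            # overall, so its own support suffers least in relative terms.
            victim_item = max(sorted(s), key=lambda i: sum(i in t for t in db))
            victim_txn.discard(victim_item)
    return db


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
sanitized = sanitize(db, [{"a", "b"}], min_support=0.5)
print(support(db, {"a", "b"}))         # 0.6 before sanitizing
print(support(sanitized, {"a", "b"}))  # 0.4 after sanitizing
```

Each deletion removes exactly one supporter of the sensitive itemset, so the loop terminates for any positive threshold. The side effects discussed in the abstract show up here as well: deleting an item can also lower the support of non-sensitive itemsets, which is how rules go missing.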
Cite this article
Wu, CM., Huang, YF. A cost-efficient and versatile sanitizing algorithm by using a greedy approach. Soft Comput 15, 939–952 (2011). https://doi.org/10.1007/s00500-010-0549-3
DOI: https://doi.org/10.1007/s00500-010-0549-3