A cost-efficient and versatile sanitizing algorithm by using a greedy approach

Wu, Chieh-Ming; Huang, Yin-Fu

doi:10.1007/s00500-010-0549-3

Focus
Published: 19 February 2010

Volume 15, pages 939–952, (2011)
Soft Computing

Abstract

Very large databases contain sensitive information that must be protected against unauthorized access. Protecting the confidentiality of this information has been a long-term goal of the database security research community and government statistical agencies. In this paper, we propose greedy methods for hiding sensitive rules. The experimental results show the effectiveness of our approach in terms of the undesired side effects avoided during the rule hiding process. The results also reveal that, in most cases, all the sensitive rules are hidden without generating spurious rules. Good scalability with respect to database size is achieved by using an efficient data structure, FCET, which stores only maximal frequent itemsets instead of all frequent itemsets. Furthermore, we propose a new framework for enforcing privacy in mining association rules. The framework combines techniques for efficiently hiding sensitive rules with a transaction retrieval engine based on the FCET index tree. For hiding sensitive rules, the proposed greedy approach includes a greedy approximation algorithm and a greedy exhausted algorithm to sanitize the database. In particular, we present four strategies in the sanitizing procedure and four strategies in the exposed procedure for hiding a group of association rules characterized as sensitive or artificial rules. In addition, the exposed procedure exposes missing rules during processing so that the number of missing rules is kept as low as possible.
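
To make the idea of greedy sanitization concrete, the following is a minimal sketch of a generic greedy item-deletion heuristic. It is not the paper's greedy approximation or greedy exhausted algorithm, and it does not use the FCET structure; the abstract does not give those details. The function names `support` and `greedy_sanitize`, the absolute `min_support` count, and the two selection rules (most shared item, shortest supporting transaction) are all assumptions made for illustration.

```python
from collections import Counter


def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def greedy_sanitize(transactions, sensitive_itemsets, min_support):
    """Return a sanitized copy in which no sensitive itemset is frequent.

    Illustrative heuristic only (an assumption, not the paper's procedure):
    each step deletes one item from one transaction until every sensitive
    itemset has support below `min_support` (an absolute count).
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    db = [set(t) for t in transactions]  # work on a copy
    sensitive = [frozenset(s) for s in sensitive_itemsets]
    while True:
        # Sensitive itemsets that are still frequent, i.e. still exposed.
        exposed = [s for s in sensitive if support(db, s) >= min_support]
        if not exposed:
            return db
        # Greedy choice 1: the item shared by the most exposed itemsets
        # (ties broken by item order, so items must be comparable).
        counts = Counter(item for s in exposed for item in s)
        victim = min(counts, key=lambda item: (-counts[item], item))
        target = next(s for s in exposed if victim in s)
        # Greedy choice 2: the shortest supporting transaction, on the
        # assumption that it supports fewer non-sensitive itemsets.
        chosen = min((t for t in db if target <= t), key=len)
        chosen.discard(victim)


if __name__ == "__main__":
    database = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
                {"b", "c"}, {"a", "c"}]
    sanitized = greedy_sanitize(database, [{"a", "b"}], min_support=2)
    print(support(sanitized, frozenset({"a", "b"})))
```

Every deletion can also push a non-sensitive frequent itemset below the threshold. That is the missing-rule side effect the abstract refers to, and this sketch makes no attempt to restore such rules.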

Author information

Authors and Affiliations

Graduate School of Engineering Science and Technology, National Yunlin University of Science and Technology, 123 University Road, Section 3, Touliu, Yunlin, 640, Taiwan, ROC
Chieh-Ming Wu
Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, 123 University Road, Section 3, Touliu, Yunlin, 640, Taiwan, ROC
Yin-Fu Huang

Corresponding author

Correspondence to Yin-Fu Huang.

About this article

Cite this article

Wu, CM., Huang, YF. A cost-efficient and versatile sanitizing algorithm by using a greedy approach. Soft Comput 15, 939–952 (2011). https://doi.org/10.1007/s00500-010-0549-3

Published: 19 February 2010
Issue Date: May 2011
DOI: https://doi.org/10.1007/s00500-010-0549-3
