
Hiding sensitive knowledge without side effects

  • Regular Paper
Knowledge and Information Systems

Abstract

Hiding sensitive knowledge in large transactional databases is one of the major goals of privacy-preserving data mining. However, only recently have researchers been able to identify exact solutions for hiding knowledge that takes the form of sensitive frequent itemsets and their related association rules. Exact solutions hide the vulnerable knowledge without critical side effects in the sanitized outcome, such as hiding nonsensitive patterns or accidentally turning infrequent itemsets into frequent ones. In this paper, we highlight the process of border revision, which plays a significant role in identifying exact hiding solutions, and we provide efficient algorithms for computing the revised borders. Furthermore, we review two algorithms that identify exact hiding solutions, and we extend one of them so that it identifies exact solutions for a wider range of problems than its original counterpart. We then introduce a novel framework for decomposing the hiding problems handled by either approach and solving the resulting parts in parallel. This framework substantially increases the size of the problems that both algorithms can handle and significantly decreases their runtime. Through experimentation, we demonstrate the effectiveness of these approaches in providing high-quality knowledge hiding solutions.
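To make the notion of border revision concrete, the following is a minimal Python sketch. It is not the paper's algorithm. It assumes the common border-based formulation in which the ideal sanitized database keeps every frequent itemset except the sensitive ones and their supersets. The toy transactions, the support threshold, and all function names are hypothetical, and the brute-force enumeration is only suitable for tiny inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of itemsets supported by >= min_count transactions."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for k in range(1, len(items) + 1):
        level = {
            frozenset(c)
            for c in combinations(items, k)
            if sum(1 for t in transactions if set(c) <= t) >= min_count
        }
        if not level:  # no frequent k-itemset implies no frequent (k+1)-itemset
            break
        frequent |= level
    return frequent


def positive_border(frequent):
    """Maximal frequent itemsets: those with no frequent proper superset."""
    return {x for x in frequent if not any(x < y for y in frequent)}


def negative_border(frequent, items):
    """Minimal infrequent itemsets: infrequent, with every immediate subset frequent."""
    candidates = {frozenset([i]) for i in items}
    candidates |= {x | {i} for x in frequent for i in items if i not in x}
    return {
        c
        for c in candidates
        if c not in frequent
        and all(len(c) == 1 or (c - {i}) in frequent for i in c)
    }


def revise(frequent, sensitive):
    """Assumed ideal outcome: drop each sensitive itemset and all of its supersets."""
    return {x for x in frequent if not any(s <= x for s in sensitive)}


def fmt(border):
    """Render a border as a sorted list of strings for stable printing."""
    return sorted("".join(sorted(x)) for x in border)


# Hypothetical toy database over items a, b, c, d.
transactions = [set("abc"), set("abc"), set("abd"), set("ac"), set("bd")]
items = set("abcd")

frequent = frequent_itemsets(transactions, min_count=2)
revised = revise(frequent, sensitive=[frozenset("ab")])

print(fmt(positive_border(frequent)))         # ['abc', 'bd']
print(fmt(negative_border(frequent, items)))  # ['ad', 'cd']
print(fmt(positive_border(revised)))          # ['ac', 'bc', 'bd']
print(fmt(negative_border(revised, items)))   # ['ab', 'ad', 'cd']
```

In this toy case, hiding the itemset ab moves it from the frequent side onto the revised negative border, while ac, bc and bd form the revised positive border. Under the assumption stated above, an exact solution would be a sanitized database whose frequent itemsets are exactly those delimited by the revised borders.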

Author information

Corresponding author

Correspondence to Aris Gkoulalas-Divanis.

About this article

Cite this article

Gkoulalas-Divanis, A., Verykios, V.S. Hiding sensitive knowledge without side effects. Knowl Inf Syst 20, 263–299 (2009). https://doi.org/10.1007/s10115-008-0178-7

  • DOI: https://doi.org/10.1007/s10115-008-0178-7
