Hiding sensitive itemsets without side effects

Applied Intelligence

Abstract

Data mining techniques are used to discover useful patterns hidden in data. However, these techniques can also extract sensitive information, posing a threat to privacy. Frequent itemset mining is a widely used data mining technique and a pre-processing step for association rule mining. The frequent itemsets may include sensitive itemsets that need to be hidden from adversaries. Traditional data sanitization techniques hide sensitive itemsets by modifying transactions in the database, which causes undesired side effects and information loss. In this paper, we propose a pattern sanitization approach that hides sensitive itemsets for privacy-preserving pattern sharing. The transactional database is modeled as a set of lossless compact patterns using closed itemsets. The novelty of the proposed technique lies in sanitizing the closed itemsets (patterns) instead of the transactions in the database. The proposed Recursive Pattern Sanitization (RPS) algorithm hides multiple sensitive itemsets, irrespective of their size and support, in a single pass over the closed patterns. The patterns in the sanitized model retain the closedness property, and the model inherently supports finding frequent itemsets and association rules, reducing the mining effort required of the end user. Experimental results show that the proposed approach hides sensitive itemsets without side effects or unexpected information loss, in contrast to other well-known itemset hiding techniques based on transaction modification.
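
The idea of modeling a transactional database as a lossless set of closed itemsets can be illustrated with a minimal sketch. This is not the RPS algorithm, whose steps are not given in this abstract; it only shows the standard property that the support of any itemset can be recovered from the closed itemsets alone. The toy database, the brute-force enumeration, and the function names are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_itemsets(transactions):
    """Brute-force closed itemset miner (hypothetical helper, toy sizes only).

    The closure of an itemset is the intersection of all transactions that
    contain it. Every closure is a closed itemset, and its support equals
    the number of covering transactions.
    """
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            covering = [t for t in transactions if candidate <= t]
            if not covering:
                continue  # itemset never occurs in the data
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    return closed


def support_from_closed(itemset, closed):
    """Recover the support of any itemset from the closed itemsets only.

    It is the largest support among the closed supersets of the itemset,
    or 0 when no closed itemset contains it.
    """
    return max((sup for c, sup in closed.items() if itemset <= c), default=0)


# Toy database (assumed for illustration): four transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ab"), frozenset("c")]
closed = closed_itemsets(transactions)

# Three closed itemsets summarize the database: {a, b}: 3, {c}: 2, {a, b, c}: 1.
for pattern, sup in sorted(closed.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), sup)

# The support of {a} is recovered without touching the transactions.
print(support_from_closed(frozenset("a"), closed))  # 3
```

Because supports are recoverable from the closed patterns, a data owner could in principle share (a sanitized version of) this compact model rather than the transactions themselves, which is the setting the abstract describes.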

Author information

Correspondence to Surendra H.

Cite this article

H, S., S, M.H. Hiding sensitive itemsets without side effects. Appl Intell 49, 1213–1227 (2019). https://doi.org/10.1007/s10489-018-1329-5

  • DOI: https://doi.org/10.1007/s10489-018-1329-5
