
Frequent itemset hiding revisited: pushing hiding constraints into mining

Applied Intelligence

Abstract

This paper introduces a new theoretical scheme for solving the frequent itemset hiding problem. We propose an algorithmic approach that consists of a novel constraint-based hiding model, which incorporates hiding into one-pass mining, along with a solution methodology that relies on linear programming. The patterns induced by the constraint-based mining algorithm are used to build a minimal linear program whose solution dictates the construction of a database extension that delivers the sought-for hiding. This extension is appended to the original database and the two are released together for mining; the resulting extended database hides the sensitive knowledge that we want to protect. The proposed approach outperforms, in both space complexity and accuracy, the existing approaches proposed so far in this domain, and we demonstrate this superiority with a series of experiments against them. Our proposal sheds new light on algorithmic techniques that can be readily applied to model hiding problems, providing solutions that computationally outperform existing modeling approaches for hiding.
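The idea of hiding through a database extension can be illustrated with a minimal Python sketch. It is not the linear program proposed in the paper: it only computes how many transactions must be appended so that the relative support of a single sensitive itemset falls strictly below the mining threshold, assuming that none of the appended transactions supports that itemset. The function names, the `filler` transaction, and the toy database are hypothetical and chosen for illustration only. The sketch also ignores the effect of the extension on nonsensitive itemsets, which the paper's model is designed to control.

```python
from fractions import Fraction
from math import floor


def support_count(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def min_extension_size(db, sensitive, min_sup):
    """Smallest q such that count / (len(db) + q) < min_sup.

    Assumption: no appended transaction supports `sensitive`, so its
    support count stays fixed while the database grows.
    """
    count = support_count(db, sensitive)
    # count < min_sup * (n + q)  <=>  q > count / min_sup - n
    bound = Fraction(count) / min_sup - len(db)
    return max(0, floor(bound) + 1)


def extend(db, sensitive, min_sup, filler):
    """Append copies of `filler` until `sensitive` is no longer frequent."""
    if sensitive <= filler:
        raise ValueError("filler must not support the sensitive itemset")
    q = min_extension_size(db, sensitive, min_sup)
    return db + [filler] * q


# Toy database (hypothetical): five transactions over the items a, b, c.
db = [frozenset(t) for t in ("ab", "abc", "ab", "bc", "c")]
sensitive = frozenset("ab")   # supported by 3 of the 5 transactions
min_sup = Fraction(1, 2)      # an itemset is frequent if support >= 1/2

extended = extend(db, sensitive, min_sup, filler=frozenset("c"))
relative_support = Fraction(support_count(extended, sensitive), len(extended))
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding at the threshold boundary, where an itemset whose support equals the threshold must still count as frequent.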

Acknowledgements

We would like to thank the Department of Informatics at the University of Piraeus for making its infrastructure available for the extensive experimental tests.

Author information

Corresponding author

Correspondence to Evangelos Sakkopoulos.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verykios, V.S., Stavropoulos, E.C., Krasadakis, P. et al. Frequent itemset hiding revisited: pushing hiding constraints into mining. Appl Intell 52, 2539–2555 (2022). https://doi.org/10.1007/s10489-021-02490-4

  • DOI: https://doi.org/10.1007/s10489-021-02490-4
