MR-OVnTSA: a heuristics based sensitive pattern hiding approach for big data

Applied Intelligence

Abstract

This paper presents a novel ‘MapReduce Based Optimum Victim Item and Transaction Selection Approach (MR-OVnTSA)’, which provides a feasible and intelligent solution for protecting sensitive frequent itemsets present in big data. The approach addresses the critical challenges that existing knowledge hiding algorithms encounter. The proposed solution minimizes the side effects of the hiding process on non-sensitive information, maintains a balance between knowledge and privacy, and handles the exponential growth in data volume efficiently. The algorithm selects the optimal item and transaction as victims by analyzing their coverage values, i.e. it chooses those with maximal impact on sensitive knowledge but minimal impact on non-sensitive information. Further, the MapReduce version of the proposed scheme resolves the issue of non-feasibility by processing large-scale data (big data) in parallel. Experiments were conducted on real and synthetically generated large-scale datasets. The results show that the proposed scheme is much more efficient than existing knowledge hiding techniques and maintains the balance between privacy preservation, data quality, and CPU time when dealing with large datasets.
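The selection idea in the abstract (maximal impact on sensitive knowledge, minimal impact on non-sensitive information) can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in this preview. The function name `pick_victim_item` and the scoring rule are hypothetical: the sketch assumes that an item's coverage is the number of itemsets containing it, and that sensitive coverage takes priority over non-sensitive coverage.

```python
from typing import FrozenSet, Iterable, Tuple

Itemset = FrozenSet[str]


def pick_victim_item(
    target: Itemset,
    sensitive: Iterable[Itemset],
    non_sensitive: Iterable[Itemset],
) -> str:
    """Pick the item of `target` to remove, using a hypothetical coverage score."""
    sensitive = list(sensitive)
    non_sensitive = list(non_sensitive)

    def score(item: str) -> Tuple[int, int, str]:
        hits_sensitive = sum(1 for s in sensitive if item in s)
        hits_non_sensitive = sum(1 for n in non_sensitive if item in n)
        # Assumed ordering: prefer high sensitive coverage, then low
        # non-sensitive coverage; the item name breaks ties deterministically.
        return (-hits_sensitive, hits_non_sensitive, item)

    return min(target, key=score)


# Toy data: two sensitive itemsets and three non-sensitive ones.
sensitive_sets = [frozenset("ab"), frozenset("bc")]
non_sensitive_sets = [frozenset("ad"), frozenset("cd"), frozenset("a")]
victim = pick_victim_item(frozenset("ab"), sensitive_sets, non_sensitive_sets)
print(victim)
```

In the toy data, item `b` appears in both sensitive itemsets and in no non-sensitive itemset, so removing it affects the most sensitive knowledge at the least cost. A transaction could be scored the same way by counting the sensitive and non-sensitive itemsets it supports.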

References

  1. Verykios VS, Gkoulalas-Divanis A (2008) A survey of association rule hiding methods for privacy. In: Privacy-preserving data mining, pp 267–289

  2. Tsai Y-C, Wang S-L, Song C-Y, Ting I (2016) Privacy and utility effects of k-anonymity on association rule hiding. In: Proceedings of the 3rd multidisciplinary international social networks conference on social informatics, data science 2016, p 42

  3. Jin Y, Su C, Ruan N, Jia W (2016) Privacy-preserving mining of association rules for horizontally distributed databases based on FP-tree. In: International conference on information security practice and experience, pp 300–314

  4. Aggarwal CC, Yu PS (2008) A general survey of privacy-preserving data mining models and algorithms, privacy-preserving data mining: models and algorithms. Springer, Boston, pp 11–52

  5. Telikani A, Shahbahrami A (2018) Data sanitization in association rule mining: an analytical review. Expert Syst Appl 96:406–426

  6. Verykios VS, Elmagarmid AK, Bertino E, Saygin Y, Dasseni E (2004) Association rule hiding. IEEE Trans Knowl Data Eng 16(4):434–447

  7. Oliveira SRM, Zaiane OR (2002) Privacy preserving frequent itemset mining. In: Proceedings of the IEEE international conference on privacy, security and data mining, vol 14, pp 43–54

  8. Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios V (1999) Disclosure limitation of sensitive rules. In: IEEE workshop on knowledge and data engineering exchange (KDEX’99) Proceedings, pp 45–52

  9. Saygin Y, Verykios VS, Elmagarmid AK (2002) Privacy preserving association rule mining. In: Twelfth international workshop on research issues in data engineering: engineering e-commerce/e-business systems RIDE-2EC proceedings, pp 151–158

  10. Oliveira SRM, Zaïane OR (2003) Protecting sensitive knowledge by data sanitization. In: ICDM, vol 3, pp 613–616

  11. Amiri A (2007) Dare to share: protecting sensitive knowledge with data sanitization. Decis Support Syst 43(1):181–191

  12. Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. In: Proceedings of the 4th international workshop on information hiding, pp 369–383

  13. Wu Y-H, Chiang C-M, Chen ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29–42

  14. Cheng P, Roddick JF, Chu SC, Lin CW (2016) Privacy preservation through a greedy, distortion-based rule-hiding method. Appl Intell 44(2):295–306

  15. http://fimi.ua.ac.be/data/ (Accessed 17 Nov 2016)

  16. https://sourceforge.net/projects/ibmquestdatagen/ (Accessed 10 April 2016)

  17. https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data (Accessed 7 Oct 2016)

  18. Sharma S, Toshniwal D (2011) Parallelization of association rule mining: survey. In: International conference on computing, communication and security (ICCCS), pp 1–6

  19. Bhandarkar M (2010) Mapreduce programming with apache Hadoop. In: IEEE international symposium on parallel & distributed processing (IPDPS)

  20. Gkoulalas-Divanis A, Verykios VS (2006) An integer programming approach for frequent itemset hiding. In: Proceeding of the ACM conference on information and knowledge management (CIKM ’06), pp 748–757

  21. Sari Aslam N, Cheng T, Cheshire J (2019) A high-precision heuristic model to detect home and work locations from smart card data. Geo-spatial Information Science 22(1):1–11

  22. Zhao X, Gong M, Zuo X, Pan L (2019) Guest editorial: advances in bio-inspired heuristics for computing. CAAI Transactions on Intelligence Technology 4(3):127–128

  23. Abualigah LMQ (2019) Feature selection and enhanced krill herd algorithm for text document clustering. Springer, Berlin

  24. Lin CW, Hong TP, Wong JW, Lan GC, Lin WY (2014) A GA-based approach to hide sensitive high utility itemsets. Sci World J, vol 2014

  25. Lin CW, Zhang B, Yang KT, Hong TP (2014) Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. Sci World J, vol 2014

  26. Lin CW, Hong TP, Yang KT, Wang SL (2014) The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42(2):210–230

  27. Lin CW, Liu Q, Fournier-Viger P, Hong TP, Voznak M, Zhan J (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. Eng Appl Artif Intell 53:1–18

  28. Lin JC-W, Zhang Y, Chen C-H, Wu JM-T, Chen C-M, Hong T-P (2018) A multiple objective PSO-based approach for data sanitization. In: 2018 conference on technologies and applications of artificial intelligence (TAAI). IEEE, pp 148–151

  29. Pontikakis ED, Tsitsonis AA, Verykios VS (2004) An experimental study of distortion-based techniques for association rule hiding. In: Proceedings of the 18th conference on database security (DBSEC 2004), pp 325–339

  30. Saygin Y, Verykios VS, Clifton C (2001) Using unknowns to prevent discovery of association rules. ACM SIGMOD Record 30(4):45–54

  31. Saygin Y, Verykios VS, Elmagarmid AK (2002) Privacy preserving association rule mining. In: Proceedings of the 2002 international workshop on research issues in data engineering: engineering e-commerce/e-business systems (RIDE 2002), pp 151–163

  32. Wang S-L, Jafari A (2005) Using unknowns for hiding sensitive predictive association rules. In: Proceedings of the 2005 IEEE international conference on information reuse and integration (IRI 2005), pp 223–228

  33. Pontikakis E, Theodoridis Y, Tsitsonis A, Chang L, Verykios VS (2004) A quantitative and qualitative analysis of blocking in association rule hiding. In: Proceedings of the 2004 ACM workshop on privacy in the electronic society (WPES 2004), pp 29–30

  34. Gkoulalas-Divanis A, Verykios VS (2009) Exact knowledge hiding through database extension. IEEE Trans Knowl Data Eng 21(5):699–713

  35. Sun X, Yu PS (2005) A border-based approach for hiding sensitive frequent itemsets. In: Fifth IEEE international conference on data mining (ICDM'05). IEEE, pp 426–433

  36. Moustakides GV, Verykios VS (2008) A MaxMin approach for hiding frequent itemsets. Data Knowl Eng 65(1):75–89

  37. Sharma S, Toshniwal D (2018) MR-I MaxMin-scalable two-phase border based knowledge hiding technique using mapreduce. Future Generation Computer Systems

  38. Chen P, Lee I, Lin CW, Pan JS (2016) Association rule hiding based on evolutionary multi-objective optimization. Intelligent Data Analysis 20(3):495–514

  39. Al-Sai ZA, Abualigah LM (2017) Big data and e-government: a review. In: 2017 8th international conference on information technology (ICIT). IEEE, pp 580–587

  40. Sharma S, Toshniwal D (2017) Scalable two-phase co-occurring sensitive pattern hiding using MapReduce. Journal of Big Data 4(1):4

  41. Hong TP, Lin CW, Yang KT, Wang SL (2011) A heuristic data-sanitization approach based on TF-IDF. In: International conference on industrial, engineering and other applications of applied intelligent systems, pp 156–164

  42. Lin C-W, Hong T-P, Hsu H-C (2014) Reducing side effects of hiding sensitive itemsets in privacy preserving data mining. Sci World J 2014:12. 235837

  43. Telikani A, Shahbahrami A (2017) Optimizing association rule hiding using combination of border and heuristic approaches. Appl Intell 47(2):544–557

  44. Yi X, Rao F-Y, Bertino E, Bouguettaya A (2015) Privacy-preserving association rule mining in cloud computing. In: Proceedings of the 10th ACM symposium on information, computer and communications security, pp 439–450

  45. Zhang X, et al. (2014) Privacy preservation over big data in cloud systems. Security, Privacy and Trust in Cloud Systems. Springer, Berlin, pp 239–257

  46. Zhang X, et al. (2014) A scalable two-phase top-down specialization approach for data anonymization using mapreduce on cloud. IEEE Transactions on Parallel and Distributed Systems 25(2):363–373

  47. Huang C, Lu R (2015) EFPA: efficient and flexible privacy-preserving mining of association rule in cloud. In: IEEE/CIC international conference on communications in China (ICCC)

  48. Liu F, Shu X, Yao D, Butt AR (2015) Privacy-preserving scanning of big content for sensitive data exposure with MapReduce. In: Proceedings of the 5th ACM conference on data and application security and privacy, pp 195–206

  49. Huang C, Lu R, Choo K-KR (2016) Secure and flexible cloud-assisted association rule mining over horizontally partitioned databases. J Comput Syst Sci 89:51–63

  50. Telikani A, Shahbahrami A, Tavoli R (2015) Data sanitization in association rule mining based on impact factor. Journal of AI and Data Mining 3(2):131–140

Author information

Corresponding author

Correspondence to Shivani Sharma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Comparing proposed OVnTSA with hybrid scheme

The presented OVnTSA approach is a heuristics-based scheme for hiding sensitive patterns; however, other approaches based on different methods, such as hybrid schemes [43], have recently been investigated.

The hybrid approach uses a border-based scheme for hiding sensitive rules, whereas the proposed OVnTSA approach is a sensitive frequent pattern hiding scheme. Therefore, to compare the performance of the two techniques, a set of frequent itemsets and all the rules derived from those itemsets are considered confidential. OVnTSA needs to hide the sensitive patterns, and the hybrid approach needs to hide all the rules derived from those sensitive patterns. The performance is compared in terms of misses cost, i.e. the number of non-sensitive itemsets lost, over two different real benchmark datasets. From Fig. 6a and b, it can be observed that, for different numbers of sensitive itemsets, the misses cost of the OVnTSA scheme is lower than that of the hybrid scheme. Further, Fig. 6c and d plot the misses cost versus a varying minimum threshold. The experiments are performed over four benchmark datasets. Overall, the proposed scheme has a lower misses cost than the hybrid approach.
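As an illustration of the misses cost metric used in this comparison, the following minimal sketch counts the non-sensitive frequent itemsets that stop being frequent after sanitization. It is not the evaluation code behind Fig. 6. The helper names, the brute-force enumeration, and the toy transactions are assumptions made for illustration, and the sketch assumes that any superset of a sensitive itemset is itself treated as sensitive.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_count: int) -> Set[Itemset]:
    """Brute-force enumeration; only suitable for tiny illustrative data."""
    items = sorted(set().union(*transactions))
    result: Set[Itemset] = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_count:
                result.add(candidate)
    return result


def misses_cost(original, sanitized, sensitive, min_count) -> int:
    """Number of non-sensitive frequent itemsets lost by sanitization."""
    before = frequent_itemsets(original, min_count)
    # Assumption: supersets of a sensitive itemset count as sensitive too.
    non_sensitive = {f for f in before if not any(s <= f for s in sensitive)}
    after = frequent_itemsets(sanitized, min_count)
    return len(non_sensitive - after)


original_db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
sensitive_sets = [frozenset("ab")]

# Two hypothetical sanitizations that both hide {a, b} at min_count = 2.
drop_b_from_t2 = [frozenset("abc"), frozenset("a"), frozenset("ac"), frozenset("bc")]
drop_a_from_t1 = [frozenset("bc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]

cost_1 = misses_cost(original_db, drop_b_from_t2, sensitive_sets, 2)
cost_2 = misses_cost(original_db, drop_a_from_t1, sensitive_sets, 2)
print(cost_1, cost_2)
```

Both sanitized databases hide the sensitive itemset, but the second also pushes {a, c} below the support threshold, so its misses cost is higher. This is the kind of difference the comparison measures.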

Fig. 6 Comparative analysis of OVnTSA and the hybrid approach in terms of total non-sensitive items lost during sanitization

As the compared techniques are meant for different purposes, i.e. the hybrid approach for rule hiding and OVnTSA for sensitive pattern hiding, they may perform differently in a real scenario. The hybrid approach may perform better for other applications, e.g. when a set of sensitive rules is provided as input for sanitization, because some of the rules derived from a sensitive itemset (in our case) may not be sensitive, or may not be valid at all (i.e. their confidence does not exceed the threshold).
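The last point can be made concrete with the standard definition of rule confidence, conf(X → Y) = supp(X ∪ Y) / supp(X). The sketch below enumerates the rules derivable from one itemset and keeps those that meet a confidence threshold. The function names, the toy transactions, and the use of "greater than or equal to" for the threshold test are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def support_count(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def valid_rules(
    itemset: Itemset, transactions: List[Itemset], min_conf: float
) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules X -> Y with X and Y partitioning `itemset` and confidence >= min_conf."""
    rules = []
    items = sorted(itemset)
    joint = support_count(itemset, transactions)
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            antecedent = frozenset(lhs)
            denom = support_count(antecedent, transactions)
            if denom == 0:
                continue  # the antecedent never occurs, so no rule is defined
            confidence = joint / denom
            if confidence >= min_conf:
                rules.append((antecedent, itemset - antecedent, confidence))
    return rules


toy_db = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("a"), frozenset("b")]
rules = valid_rules(frozenset("ab"), toy_db, min_conf=0.6)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
```

Here the itemset {a, b} yields two candidate rules, but a → b has confidence 2/4 and fails the 0.6 threshold, while b → a has confidence 2/3 and passes. A rule hiding scheme given only the valid rule as input would have less to hide than one required to hide every rule derived from the itemset.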

Cite this article

Sharma, S., Toshniwal, D. MR-OVnTSA: a heuristics based sensitive pattern hiding approach for big data. Appl Intell 50, 4241–4260 (2020). https://doi.org/10.1007/s10489-020-01749-6
