MR-OVnTSA: a heuristics based sensitive pattern hiding approach for big data

Sharma, Shivani; Toshniwal, Durga

doi:10.1007/s10489-020-01749-6

Published: 15 July 2020

Volume 50, pages 4241–4260, (2020)

Applied Intelligence

Abstract

This paper presents a novel ‘MapReduce Based Optimum Victim Item and Transaction Selection Approach (MR-OVnTSA)’ that provides a feasible and intelligent solution for protecting sensitive frequent itemsets present in big data. The approach aims to resolve the critical challenges that existing knowledge hiding algorithms encounter. The proposed solution minimizes the side effects of the hiding process on non-sensitive information, maintains a balance between knowledge and privacy, and handles the exponential growth in data volume efficiently. The algorithm selects the most suitable item and transaction as victims by analyzing their coverage values, i.e. it chooses the one with maximal impact on sensitive knowledge but minimal impact on non-sensitive information. Further, the MapReduce version of the proposed scheme resolves the issue of non-feasibility by processing large-scale data (big data) in parallel. Experiments were conducted over real and synthetically generated large-scale datasets. The results show that the proposed scheme is considerably more efficient than existing knowledge hiding techniques and balances privacy preservation, data quality maintenance, and CPU time when dealing with large datasets.
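The coverage idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given here. It assumes a toy scoring rule: an item's coverage is the pair (number of sensitive itemsets containing it, number of non-sensitive itemsets containing it), counted in a map/reduce style, and the victim is the item with the highest first value and the lowest second value. All function names, the tie-breaking rule, and the sample data are hypothetical.

```python
def map_coverage(itemset, is_sensitive):
    """Map step: emit (item, (sensitive_hit, non_sensitive_hit)) per item."""
    tag = (1, 0) if is_sensitive else (0, 1)
    return [(item, tag) for item in itemset]

def reduce_coverage(pairs):
    """Reduce step: sum the emitted counts per item."""
    totals = {}
    for item, (s, n) in pairs:
        cs, cn = totals.get(item, (0, 0))
        totals[item] = (cs + s, cn + n)
    return totals

def coverage_table(sensitive, non_sensitive):
    """Run the toy map and reduce steps over both itemset collections."""
    pairs = []
    for itemset in sensitive:
        pairs.extend(map_coverage(itemset, True))
    for itemset in non_sensitive:
        pairs.extend(map_coverage(itemset, False))
    return reduce_coverage(pairs)

def pick_victim_item(target, coverage):
    """Assumed rule: most sensitive coverage, then least non-sensitive coverage."""
    def rank(item):
        s, n = coverage.get(item, (0, 0))
        return (-s, n, item)  # item name breaks remaining ties
    return min(target, key=rank)

def pick_victim_transaction(target, transactions, non_sensitive):
    """Among transactions supporting `target`, pick the one that supports
    the fewest non-sensitive itemsets (assumed damage measure)."""
    best = None
    for tid, items in enumerate(transactions):
        if not target <= items:
            continue
        damage = sum(1 for n in non_sensitive if n <= items)
        if best is None or damage < best[0]:
            best = (damage, tid)
    return None if best is None else best[1]

# Hypothetical toy data.
sensitive = [frozenset("ab"), frozenset("bc")]
non_sensitive = [frozenset("a"), frozenset("ad"), frozenset("c"), frozenset("b")]
transactions = [frozenset("abd"), frozenset("ab"), frozenset("bc")]

cov = coverage_table(sensitive, non_sensitive)
victim_item = pick_victim_item(frozenset("ab"), cov)
victim_tid = pick_victim_transaction(frozenset("ab"), transactions, non_sensitive)
print(victim_item, victim_tid)
```

In a real MapReduce job the map and reduce functions would run on separate partitions of the data; here they are ordinary Python functions, used only to show the shape of the computation.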

References

1. Verykios VS, Gkoulalas-Divanis A (2008) A survey of association rule hiding methods for privacy. In: Privacy-preserving data mining, pp 267–289
2. Tsai Y-C, Wang S-L, Song C-Y, Ting I (2016) Privacy and utility effects of k-anonymity on association rule hiding. In: Proceedings of the 3rd multidisciplinary international social networks conference on social informatics, data science 2016, p 42
3. Jin Y, Su C, Ruan N, Jia W (2016) Privacy-preserving mining of association rules for horizontally distributed databases based on FP-tree. In: International conference on information security practice and experience, pp 300–314
4. Aggarwal CC, Yu PS (2008) A general survey of privacy-preserving data mining models and algorithms. In: Privacy-preserving data mining: models and algorithms. Springer, Boston, pp 11–52
5. Telikani A, Shahbahrami A (2018) Data sanitization in association rule mining: an analytical review. Expert Syst Appl 96:406–426
6. Verykios VS, Elmagarmid AK, Bertino E, Saygin Y, Dasseni E (2004) Association rule hiding. IEEE Trans Knowl Data Eng 16(4):434–447
7. Oliveira SRM, Zaiane OR (2002) Privacy preserving frequent itemset mining. In: Proceedings of the IEEE international conference on privacy, security and data mining, vol 14, pp 43–54
8. Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios V (1999) Disclosure limitation of sensitive rules. In: IEEE workshop on knowledge and data engineering exchange (KDEX’99) Proceedings, pp 45–52
9. Saygin Y, Verykios VS, Elmagarmid AK (2002) Privacy preserving association rule mining. In: Twelfth international workshop on research issues in data engineering: engineering e-commerce/e-business systems RIDE-2EC proceedings, pp 151–158
10. Oliveira SRM, Zaïane OR (2003) Protecting sensitive knowledge by data sanitization. In: ICDM, vol 3, pp 613–616
11. Amiri A (2007) Dare to share: protecting sensitive knowledge with data sanitization. Decis Support Syst 43(1):181–191
12. Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. In: Proceedings of the 4th international workshop on information hiding, pp 369–383
13. Wu Y-H, Chiang C-M, Chen ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29–42
14. Cheng P, Roddick JF, Chu SC, Lin CW (2016) Privacy preservation through a greedy, distortion-based rule-hiding method. Appl Intell 44(2):295–306
15. http://fimi.ua.ac.be/data/ (Accessed 17 Nov 2016)
16. https://sourceforge.net/projects/ibmquestdatagen/ (Accessed 10 April 2016)
17. https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data (Accessed 7 Oct 2016)
18. Sharma S, Toshniwal D (2011) Parallelization of association rule mining: survey. In: International conference on computing, communication and security (ICCCS), pp 1–6
19. Bhandarkar M (2010) Mapreduce programming with apache Hadoop. In: IEEE international symposium on parallel & distributed processing (IPDPS)
20. Gkoulalas-Divanis A, Verykios VS (2006) An integer programming approach for frequent itemset hiding. In: Proceeding of the ACM conference on information and knowledge management (CIKM ’06), pp 748–757
21. Sari Aslam N, Cheng T, Cheshire J (2019) A high-precision heuristic model to detect home and work locations from smart card data. Geo-spatial Information Science 22(1):1–11
22. Zhao X, Gong M, Zuo X, Pan L (2019) Guest editorial: advances in bio-inspired heuristics for computing. CAAI Transactions on Intelligence Technology 4(3):127–128
23. Abualigah LMQ (2019) Feature selection and enhanced krill herd algorithm for text document clustering. Springer, Berlin
24. Lin CW, Hong TP, Wong JW, Lan GC, Lin WY (2014) A GA-based approach to hide sensitive high utility itemsets. Sci World J, vol 2014
25. Lin CW, Zhang B, Yang KT, Hong TP (2014) Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. Sci World J, vol 2014
26. Lin CW, Hong TP, Yang KT, Wang SL (2014) The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42(2):210–230
27. Lin CW, Liu Q, Fournier-Viger P, Hong TP, Voznak M, Zhan J (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. Eng Appl Artif Intell 53:1–18
28. Lin JC-W, Zhang Y, Chen C-H, Wu JM-T, Chen C-M, Hong T-P (2018) A multiple objective PSO-based approach for data sanitization. In: 2018 conference on technologies and applications of artificial intelligence (TAAI). IEEE, pp 148–151
29. Pontikakis ED, Tsitsonis AA, Verykios VS (2004) An experimental study of distortion-based techniques for association rule hiding. In: Proceedings of the 18th conference on database security (DBSEC 2004), pp 325–339
30. Saygin Y, Verykios VS, Clifton C (2001) Using unknowns to prevent discovery of association rules. ACM SIGMOD Record 30(4):45–54
31. Saygin Y, Verykios VS, Elmagarmid AK (2002) Privacy preserving association rule mining. In: Proceedings of the 2002 international workshop on research issues in data engineering: engineering e-commerce/e-business systems (RIDE 2002), pp 151–163
32. Wang S-L, Jafari A (2005) Using unknowns for hiding sensitive predictive association rules. In: Proceedings of the 2005 IEEE international conference on information reuse and integration (IRI 2005), pp 223–228
33. Pontikakis E, Theodoridis Y, Tsitsonis A, Chang L, Verykios VS (2004) A quantitative and qualitative analysis of blocking in association rule hiding. In: Proceedings of the 2004 ACM workshop on privacy in the electronic society (WPES 2004), pp 29–30
34. Gkoulalas-Divanis A, Verykios VS (2009) Exact knowledge hiding through database extension. IEEE Trans Knowl Data Eng 21(5):699–713
35. Sun X, Yu PS (2005) A border-based approach for hiding sensitive frequent itemsets. In: Fifth IEEE international conference on data mining (ICDM'05). IEEE, pp 426–433
36. Moustakides GV, Verykios VS (2008) A MaxMin approach for hiding frequent itemsets. Data Knowl Eng 65(1):75–89
37. Sharma S, Toshniwal D (2018) MR-I MaxMin-scalable two-phase border based knowledge hiding technique using mapreduce. Future Generation Computer Systems
38. Chen P, Lee I, Lin CW, Pan JS (2016) Association rule hiding based on evolutionary multi-objective optimization. Intelligent Data Analysis 20(3):495–514
39. Al-Sai ZA, Abualigah LM (2017) Big data and e-government: a review. In: 2017 8th international conference on information technology (ICIT). IEEE, pp 580–587
40. Sharma S, Toshniwal D (2017) Scalable two-phase co-occurring sensitive pattern hiding using MapReduce. Journal of Big Data 4(1):4
41. Hong TP, Lin CW, Yang KT, Wang SL (2011) A heuristic data-sanitization approach based on TF-IDF. In: International conference on industrial, engineering and other applications of applied intelligent systems, pp 156–164
42. Lin C-W, Hong T-P, Hsu H-C (2014) Reducing side effects of hiding sensitive itemsets in privacy preserving data mining. Sci World J 2014:12. 235837
43. Telikani A, Shahbahrami A (2017) Optimizing association rule hiding using combination of border and heuristic approaches. Appl Intell 47(2):544–557
44. Yi X, Rao F-Y, Bertino E, Bouguettaya A (2015) Privacy-preserving association rule mining in cloud computing. In: Proceedings of the 10th ACM symposium on information, computer and communications security, pp 439–450
45. Zhang X, et al. (2014) Privacy preservation over big data in cloud systems. Security, Privacy and Trust in Cloud Systems. Springer, Berlin, pp 239–257
46. Zhang X, et al. (2014) A scalable two-phase top-down specialization approach for data anonymization using mapreduce on cloud. IEEE Transactions on Parallel and Distributed Systems 25(2):363–373
47. Huang C, Lu R (2015) EFPA: efficient and flexible privacy-preserving mining of association rule in cloud. In: IEEE/CIC international conference on communications in China (ICCC)
48. Liu F, Shu X, Yao D, Butt AR (2015) Privacy-preserving scanning of big content for sensitive data exposure with MapReduce. In: Proceedings of the 5th ACM conference on data and application security and privacy, pp 195–206
49. Huang C, Lu R, Choo K-KR (2016) Secure and flexible cloud-assisted association rule mining over horizontally partitioned databases. J Comput Syst Sci 89:51–63
50. Telikani A, Shahbahrami A, Tavoli R (2015) Data sanitization in association rule mining based on impact factor. Journal of AI and Data Mining 3(2):131–140

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Uttarakhand, 247667, India
Shivani Sharma & Durga Toshniwal

Corresponding author

Correspondence to Shivani Sharma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Comparing proposed OVnTSA with hybrid scheme

The presented OVnTSA approach is a heuristics-based scheme for hiding sensitive patterns; however, some recent approaches have been investigated that use different methods, such as hybrid schemes [43].

The hybrid approach uses a border-based scheme for hiding sensitive rules, whereas the proposed OVnTSA approach is a sensitive frequent pattern hiding scheme. Therefore, to compare the performance of the two techniques, a set of frequent itemsets and all the rules derived from those itemsets are considered confidential. OVnTSA needs to hide the sensitive patterns, and Hybrid needs to hide all the rules derived from those sensitive patterns. The performance is compared in terms of misses cost, i.e. the number of non-sensitive itemsets lost, over two different real benchmark datasets. From Fig. 6a and b, it can be clearly observed that, for varying numbers of sensitive itemsets, the misses cost of the OVnTSA scheme is lower than that of the hybrid scheme. Further, Fig. 6c and d plot misses cost versus varying minimum threshold. The experiments are performed over four benchmark datasets. Overall, the proposed scheme has a lower misses cost than the hybrid approach.
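As a rough illustration of the misses cost metric used in this comparison, the sketch below counts the non-sensitive frequent itemsets that are frequent in the original database but no longer frequent after sanitization. It is a minimal sketch under stated assumptions, not the evaluation code behind Fig. 6: the brute-force miner, the absolute support threshold, the `max_size` cap, the choice to treat supersets of sensitive itemsets as sensitive, and the toy data are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force miner (toy scale only). `min_support` is an absolute count."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            c = frozenset(candidate)
            if sum(1 for t in transactions if c <= t) >= min_support:
                frequent.add(c)
    return frequent

def misses_cost(original, sanitized, sensitive, min_support):
    """Number of non-sensitive frequent itemsets lost by sanitization.
    Assumption: supersets of a sensitive itemset also count as sensitive."""
    before = frequent_itemsets(original, min_support)
    after = frequent_itemsets(sanitized, min_support)
    non_sensitive = {f for f in before if not any(s <= f for s in sensitive)}
    return len(non_sensitive - after)

# Hypothetical toy database; the itemset {a, b} is sensitive.
original = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
sensitive = {frozenset("ab")}

# Two ways to hide {a, b} at min_support = 2:
careful = [frozenset("abc"), frozenset("a"), frozenset("ac"), frozenset("bc")]  # drop b from the 2nd transaction
careless = [frozenset("bc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]  # drop a from the 1st transaction

print(misses_cost(original, careful, sensitive, 2))
print(misses_cost(original, careless, sensitive, 2))
```

Both sanitized databases hide {a, b}, but the second also pushes {a, c} below the threshold, which is the kind of side effect the metric is meant to capture.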

As the compared techniques are meant for different purposes, i.e. Hybrid for rule hiding and OVnTSA for sensitive pattern hiding, they may perform differently in a real scenario. A hybrid approach may perform better for other applications, e.g. when a set of sensitive rules is provided as input for sanitization. This is because some of the rules derived from a sensitive itemset (in our case) may not be sensitive, or may not be valid at all (i.e. their confidence does not exceed the threshold).
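The last point can be made concrete with a small sketch that enumerates the rules derivable from one itemset and keeps only those meeting a confidence threshold. The function names and the toy data are hypothetical, and the sketch assumes the standard definition confidence(A → B) = support(A ∪ B) / support(A) with a "greater than or equal to" test against the threshold.

```python
from itertools import combinations

def support(itemset, transactions):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def derived_rules(itemset, transactions):
    """All rules A -> (itemset - A) with their confidence values."""
    itemset = frozenset(itemset)
    whole = support(itemset, transactions)
    rules = []
    for k in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), k):
            a = frozenset(antecedent)
            denom = support(a, transactions)
            if denom == 0:
                continue  # antecedent never occurs, confidence undefined
            rules.append((a, itemset - a, whole / denom))
    return rules

def valid_rules(itemset, transactions, min_conf):
    """Keep only the rules whose confidence reaches `min_conf`."""
    return [(a, c) for a, c, conf in derived_rules(itemset, transactions)
            if conf >= min_conf]

# Hypothetical toy database: {a, b} yields two rules with different confidence.
transactions = [frozenset("ab"), frozenset("ab"), frozenset("a"),
                frozenset("a"), frozenset("b")]

for a, c, conf in derived_rules("ab", transactions):
    print(sorted(a), "->", sorted(c), round(conf, 2))

kept = valid_rules("ab", transactions, min_conf=0.6)
```

With this data only one of the two derived rules reaches the 0.6 threshold, so a rule-hiding scheme given just the valid rules would have less to hide than one asked to hide every rule derived from the itemset.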

About this article

Cite this article

Sharma, S., Toshniwal, D. MR-OVnTSA: a heuristics based sensitive pattern hiding approach for big data. Appl Intell 50, 4241–4260 (2020). https://doi.org/10.1007/s10489-020-01749-6

Published: 15 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10489-020-01749-6
