Abstract
Privacy-preserving data mining (PPDM) has become an important research topic, as it can hide sensitive information, while ensuring that information can still be extracted for decision making. While performing the sanitization progress for hiding the sensitive information, three side effects such as hiding failure, missing cost, and artificial cost happen at the same time. Several evolutionary algorithms were introduced to minimize those three side effects of PPDM using a single-objective function that generates one solution for sanitization. This paper presents a multiobjective algorithm (NSGA2DT) with two strategies for hiding sensitive information with transaction deletion based on the NSGA-II framework. To obtain better balance of side effects, the designed NSGA2DT takes database dissimilarity (Dis) as one more factor to achieve better performance in terms of four side effects. Moreover, instead of a single solution of the sanitization progress, the designed NSGA2DT provides more than one solutions than those of single-objective evolutionary algorithms, which shows flexibility to select the most appropriate transactions for deletion depending on user’s preference. A Fast SoRting strategy (FSR) and the pre-large concept are utilized, respectively, in this paper to find the optimized transactions for deletion and speed up the iterative process. Based on the developed NSGA2DT, the set of several Pareto solutions can be easily discovered, thus avoiding the problem of local optimization of single-objective approaches. Besides, the designed NSGA2DT does not require to set initial weights for evaluating the side effects, and thus, the results could not be seriously influenced by the predefined weights. Experimental results show that the proposed NSGA2DT provides satisfactory results with reduced side effects, compared to previous evolutionary approaches with single-objective function.



Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agrawal R, Srikant R (1994a) Quest synthetic data generator. IBM Almaden Research Center. http://www.Almaden.ibm.com/cs/quest/syndata.html
Agrawal R, Srikant R (1994b) Fast algorithms for mining association rules in large databases. In: The international conference on very large data base. pp 487–499
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM international conference on management of data, vol 29. pp 439–450
Cheng P, Lee I, Lin CW, Pan JS (2016) Association rule hiding based on evolutionary multi-objective optimization. Intell Data Anal 20(3):495–514
Cheung DW, Han J, Ng VT, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: The international conference on data engineering. pp 106–114
Cheung DW, Lee SD, Kao B (1997) A general incremental technique for maintaining discovered association rules. In: The international conference on database systems for advanced applications. pp 185–194
Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu MY (2002) Tools for privacy preserving distributed data mining. SIGKDD Explor 4(2):28–347
Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. In: The international workshop on information hiding. pp 369–383
Deb K, Pratap A, Agrawal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Derigs U, Kabath M, Zils M (1999) Adaptive genetic algorithms: a methodology for dynamic autoconfiguration of genetic search algorithms. Meta-Heuristics. pp 231–248
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography conference, vol 3876. pp 265–284
Emmerich M, Beume N, Naujoks B (2005) An EMO algorithm using the hypervolume measure as selection criterion. In: The international conference on evolutionary multi-criterion optimization. pp 62–76
Emmerich MTM, Deutz AH (2018) A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Nat Comput 17(3):585–609
Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: The international conference on genetic algorithms. pp 416–423
Fournier-Viger P, Lin JCW, Gomariz A, Gueniche T, Soltani A, Deng Z (2016) The SPMF open-source data mining library version 2. In: Joint European conference on machine learning and knowledge discovery in databases. pp 36–40
Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley Longman Publishing Co., Inc, Boston
Han S, Ng WK (2007) Privacy-preserving genetic algorithms for rule discovery. In: The international conference on data warehousing and knowledge discovery. pp 407–417
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Hasan ASMT, Jiang Q, Chen H, Wang S (2018) A new approach to privacy-preserving multiple independent data publishing. Appl Sci 8(5):1–22
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence. MIT Press, Cambridge
Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129
Hong TP, Lin CW, Yang KT, Wang SL (2012) Using TF-IDF to hide sensitive itemsets. Appl Intell 38(4):502–510
Hongcheng T (2012) An improved adaptive genetic algorithm. In: Knowledge discovery and data mining. pp 717–723
Kalyani G, Chandra Sekhara Rao MVP, Janakiramaiah B (2017) Decision tree based data reconstruction for privacy preserving classification rule mining. Informatica 41:289–304
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: IEEE international conference on neural networks. pp 1942–1948
Knowles J, Corne D (1999) The pareto archived evolution strategy: a new baseline algorithm for pareto multiobjective optimisation. In: The congress on evolutionary computation. pp 98–105
Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multimed Signal Process 4:201–227
Lin CW, Zhang B, Yang KT, Hong TP (2014) Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. Sci World J 398269:1–13
Lin CW, Hong TP, Yang KT, Wang SL (2015) The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42(2):210–230
Lin JCW, Liu Q, Fournier-Viger P (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. Eng Appl Artif Intell 53(C):1–18
Lin JCW, Yang L, Fournier-Viger P, Hong TP (2019) Mining of skyline patterns by considering both frequent and utility constraints. Eng Appl Artif Intell 77:229–238
Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: The annual international cryptology conference on advances in cryptology. pp 36–54
Liu F, Li T (2018) A clustering k-anonymity privacy-preserving method for wearable IoT devices. Secur Commun Netw 4945152:1–8
Marco D, Sabrina O, Thomas S (2004) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39
Mendes R, Vilela JP (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5:10562–10582
Motlagh FN, Sajedi H (2016) MOSAR: a multi-objective strategy for hiding sensitive association rules using genetic algorithm. Appl Artif Intell 30(9):823–843
Oliveira SRM, Zaïane OR (2002) Privacy preserving frequent itemset mining. In: IEEE international conference on privacy, security and data mining. pp 43–54
Ping G, Chunbo X, Yi C, Jing L, Yanqing L (2014) Adaptive ant colony optimization algorithm. In: The international conference on mechatronics and control. pp 95–98
Schaffer JD (1985) Multiple objective optimization with vector evaluated genetic algorithms. In: The international conference on genetic algorithms, vol 2, no 1. pp 93–100
Srinivas N, Deb K (1994) Multiobjective optimization using nondominated sorting in genetic algorithms. Evol Comput 2(3):221–248
Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM SIGMOD Record 33:50–57
Wu YH, Chiang CM, Chen ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19:29–42
Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evol Comput 3(4):257–271
Zitzler E, Laumanns M, Thiele L (2001) SPEA2: improving the strength Pareto evolutionary algorithm. In: Evolutionary methods for design, optimization and control with applications to industrial problems. pp 95–100
Zhan ZH, Zhang J, Li Y, Chung HSH (2009) Adaptive particle swarm optimization. IEEE Trans Syst Man Cybern B 39(6):1362–1381
Zhang Q, Li H (2007) MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Comput 11(6):712–731
Acknowledgements
This research was partially supported by the Shenzhen Technical Project under JCYJ20170307151733005 and KQJSCX20170726103424709.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest in this paper.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Lin, J.CW., Zhang, Y., Zhang, B. et al. Hiding sensitive itemsets with multiple objective optimization. Soft Comput 23, 12779–12797 (2019). https://doi.org/10.1007/s00500-019-03829-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00500-019-03829-3