Abstract
Privacy preservation is an increasingly serious problem in data publication and has drawn considerable attention in research and development. Several privacy preservation models and algorithms have recently been proposed for publishing data. However, most previous methods suffer from more than one of the following drawbacks: (i) they cannot be applied to multi-record datasets; (ii) they guarantee only one-way generalization; (iii) they ignore user privacy preferences. To satisfy higher privacy requirements and to support the publication of multi-record datasets, this paper proposes a bidirectional personalized generalization (BP-generalization) model. The rationale is to anonymize both relational and set-valued information. First, we merge tuples with the same attribute values in multi-record datasets to ensure the validity of quasi-identifier anonymity. Second, by enforcing l-diversity on equivalence groups and k-anonymity on fingerprint buckets, the model can resist bidirectional chain attacks. Finally, a new hierarchical generalization strategy is proposed for personalized privacy preservation of sensitive attributes, so that different generalization rules can be adopted for different levels of sensitive values. Extensive experimental results on two datasets show that our method outperforms state-of-the-art techniques in terms of efficiency and information loss.
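As a rough illustration of the first two steps described in the abstract, the following Python sketch merges records that share quasi-identifier values and then checks distinct l-diversity over equivalence groups and k-anonymity over fingerprint buckets. It is not the paper's algorithm: the function names, the toy records, and the simplified definitions are assumptions made for illustration. In particular, a "fingerprint" is assumed here to be the set of sensitive values attached to one merged tuple, and a "fingerprint bucket" the set of merged tuples sharing that fingerprint.

```python
from collections import defaultdict

# Hypothetical toy data: two individuals have two records each.
records = [
    {"age": 34, "zip": "10001", "disease": "flu"},
    {"age": 34, "zip": "10001", "disease": "asthma"},
    {"age": 41, "zip": "10002", "disease": "flu"},
    {"age": 41, "zip": "10002", "disease": "asthma"},
    {"age": 29, "zip": "10003", "disease": "diabetes"},
]


def merge_records(records, qi_keys, sensitive_key):
    """Merge records with identical quasi-identifier values into one tuple
    whose sensitive part is the set of all sensitive values seen."""
    merged = defaultdict(set)
    for rec in records:
        qi = tuple(rec[k] for k in qi_keys)
        merged[qi].add(rec[sensitive_key])
    return dict(merged)


def fingerprint_buckets(merged):
    """Group merged tuples by their fingerprint (assumed here to be the
    set of sensitive values of the tuple)."""
    buckets = defaultdict(list)
    for qi, values in merged.items():
        buckets[frozenset(values)].append(qi)
    return dict(buckets)


def is_k_anonymous(buckets, k):
    """Every fingerprint bucket must contain at least k merged tuples."""
    return all(len(members) >= k for members in buckets.values())


def is_l_diverse(groups, merged, l):
    """Distinct l-diversity: every equivalence group must cover at least
    l distinct sensitive values across its merged tuples."""
    for group in groups:
        values = set()
        for qi in group:
            values |= merged[qi]
        if len(values) < l:
            return False
    return True


merged = merge_records(records, ["age", "zip"], "disease")
buckets = fingerprint_buckets(merged)
```

On this toy data the tuple holding only `diabetes` sits alone in its bucket, so `is_k_anonymous(buckets, 2)` fails, and any equivalence group containing only that tuple fails `is_l_diverse` for `l = 2`. A real anonymizer would generalize or regroup the data until both checks pass.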
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Li, X., Zhou, Z. A generalization model for multi-record privacy preservation. J Ambient Intell Human Comput 11, 2899–2912 (2020). https://doi.org/10.1007/s12652-019-01430-y
DOI: https://doi.org/10.1007/s12652-019-01430-y