Abstract
As a representative model for privacy-preserving data publishing, K-anonymity has raised a considerable number of research questions over the past few decades. Chief among them is how to release data without sacrificing users’ privacy while maximizing the availability of the published data, which is the ultimate goal of privacy-preserving data publishing. To enhance the clustering effect and reduce unnecessary computation, this paper proposes a weighted K-member clustering algorithm. A series of weight indicators is designed to evaluate the outlyingness of records, the distance between records, and the information loss of the published data. The proposed algorithm reduces the influence of outliers on the clustering effect and maintains the availability of data to the greatest possible extent during the clustering process. Experimental analysis suggests that the proposed method generates lower information loss, improves the clustering effect, and is less sensitive to outliers than some existing methods.
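To make the idea of K-member clustering concrete, the following is a minimal sketch of a plain greedy K-member clustering over numeric records, using a range-based information loss. It is not the weighted algorithm proposed in the paper: the weight indicators for outlyingness and distance are not reproduced here, and the names `info_loss` and `k_member_clusters`, the loss definition, and the choice of the first remaining record as seed are all assumptions made for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def info_loss(cluster: List[Record], ranges: Sequence[float]) -> float:
    """Assumed loss: cluster size times the sum of normalized attribute ranges."""
    if not cluster:
        return 0.0
    loss = 0.0
    for j, width in enumerate(ranges):
        values = [r[j] for r in cluster]
        if width > 0:  # skip attributes that are constant over the whole table
            loss += (max(values) - min(values)) / width
    return len(cluster) * loss


def k_member_clusters(records: List[Record], k: int) -> List[List[int]]:
    """Greedily group record indices into clusters of at least k members."""
    if k <= 0 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])
    # Global attribute ranges, used to normalize each attribute's contribution.
    ranges = [
        max(r[j] for r in records) - min(r[j] for r in records)
        for j in range(dims)
    ]

    def loss_of(indices: List[int]) -> float:
        return info_loss([records[i] for i in indices], ranges)

    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        # Assumption: seed each cluster with the first remaining record.
        cluster = [remaining.pop(0)]
        while len(cluster) < k:
            # Add the record that yields the smallest loss for this cluster.
            best = min(remaining, key=lambda i: loss_of(cluster + [i]))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster whose
    # information loss grows the least.
    for i in remaining:
        target = min(clusters, key=lambda c: loss_of(c + [i]) - loss_of(c))
        target.append(i)
    return clusters
```

Every returned cluster has at least K members, so generalizing each cluster's quasi-identifiers to their common range yields a K-anonymous table. A weighted variant in the spirit of the abstract would replace the seed choice and the loss with weighted measures that account for outliers.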
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Nos. 61762059, 61762060, and 61862040).
Cite this article
Yan, Y., Herman, E.A., Mahmood, A. et al. A weighted K-member clustering algorithm for K-anonymization. Computing 103, 2251–2273 (2021). https://doi.org/10.1007/s00607-021-00922-0
DOI: https://doi.org/10.1007/s00607-021-00922-0