A weighted K-member clustering algorithm for K-anonymization

Yan, Yan; Herman, Eyeleko Anselme; Mahmood, Adnan; Feng, Tao; Xie, Pengshou

doi:10.1007/s00607-021-00922-0

Regular Paper
Published: 20 February 2021

Computing, Volume 103, pages 2251–2273 (2021)

Yan Yan ORCID: orcid.org/0000-0002-2885-9867¹,
Eyeleko Anselme Herman¹,
Adnan Mahmood²,
Tao Feng¹ &
Pengshou Xie¹

Abstract

As a representative model for privacy-preserving data publishing, K-anonymity has raised a considerable number of research questions over the past few decades. Chief among them are how to release data without sacrificing users’ privacy and how to maximize the availability of the published data, which together form the ultimate goal of privacy-preserving data publishing. To enhance the clustering effect and reduce unnecessary computation, this paper proposes a weighted K-member clustering algorithm. A series of weight indicators is designed to evaluate the outlyingness of records, the distance between records, and the information loss of the published data. The proposed algorithm reduces the influence of outliers on the clustering effect and maintains the availability of the data as far as possible during clustering. Experimental analysis suggests that the proposed method incurs lower information loss, improves the clustering effect, and is less sensitive to outliers than some existing methods.
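For orientation, the unweighted greedy K-member clustering idea from the earlier clustering-based k-anonymization literature can be sketched roughly as below. This is not the weighted algorithm proposed in the paper: the abstract does not define its weight indicators, so none are reproduced here. The function names, the restriction to numeric attributes, and the range-normalized distance and information-loss measures are all assumptions made for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Range-normalized Manhattan distance (an assumed, simple metric)."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges) if r > 0)


def info_loss(cluster: List[Record], ranges: Sequence[float]) -> float:
    """Loss from generalizing every record in a cluster to [min, max] intervals."""
    if not cluster:
        return 0.0
    loss = 0.0
    for j, r in enumerate(ranges):
        values = [rec[j] for rec in cluster]
        if r > 0:
            loss += (max(values) - min(values)) / r
    return len(cluster) * loss


def k_member_clustering(records: List[Record], k: int) -> List[List[int]]:
    """Greedy K-member clustering; returns clusters as lists of record indices."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    dims = len(records[0])
    ranges = [
        max(r[j] for r in records) - min(r[j] for r in records)
        for j in range(dims)
    ]
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    seed = remaining[0]  # arbitrary starting point (often chosen at random)
    while len(remaining) >= k:
        # Start each cluster from the record furthest from the previous seed.
        seed = max(remaining, key=lambda i: distance(records[i], records[seed], ranges))
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Add the record that keeps the cluster's information loss smallest.
            best = min(
                remaining,
                key=lambda i: info_loss([records[m] for m in cluster + [i]], ranges),
            )
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster it disturbs least.
    for i in remaining:
        target = min(
            clusters,
            key=lambda c: info_loss([records[m] for m in c + [i]], ranges)
            - info_loss([records[m] for m in c], ranges),
        )
        target.append(i)
    return clusters
```

Every returned cluster holds at least K records, so generalizing each cluster to its per-attribute interval yields a K-anonymous table over the chosen attributes. A weighted variant would presumably change how seeds and candidates are scored, but those details are not available from the abstract.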

Notes

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61762059, 61762060, and 61862040).

Author information

Authors and Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Yan Yan, Eyeleko Anselme Herman, Tao Feng & Pengshou Xie
Department of Computing, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, 2109, Australia
Adnan Mahmood

Corresponding author

Correspondence to Eyeleko Anselme Herman.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, Y., Herman, E.A., Mahmood, A. et al. A weighted K-member clustering algorithm for K-anonymization. Computing 103, 2251–2273 (2021). https://doi.org/10.1007/s00607-021-00922-0

Received: 12 September 2020
Accepted: 04 February 2021
Published: 20 February 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00607-021-00922-0

Keywords

Mathematics Subject Classification

68P27
