A High-Availability K-modes Clustering Method Based on Differential Privacy

Zhang, Shaobo; Yuan, Liujie; Li, Yuxing; Chen, Wenli; Ding, Yifei

doi:10.1007/978-3-030-95388-1_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13156)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

The K-modes algorithm is a classic, widely used algorithm for mining categorical data. However, the data it analyzes often contain sensitive user information, and leaking these data would seriously threaten user privacy. Existing methods that combine differential privacy with the K-modes algorithm can effectively prevent such leakage. Nevertheless, differential privacy protects the data by adding noise, which reduces the availability of the clustering results. In this paper, we propose a high-availability K-modes clustering mechanism based on differential privacy (HAKC). While still using differential privacy to protect the data, the mechanism selects the initial centroids of the clustering by calculation and improves the method for calculating the distance between a data point and a centroid during the iterative process.
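For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of a generic differentially private K-modes step, not the HAKC mechanism itself, whose centroid selection and improved distance are not specified in this preview. It assumes the standard simple matching dissimilarity for K-modes and Laplace noise added to per-attribute category counts. The function names, the noise scale of 1/epsilon per attribute, and the toy data are all assumptions, and the sketch does no privacy budget accounting across attributes or iterations.

```python
import random
from collections import Counter


def matching_distance(record, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(record, mode))


def laplace(scale, rng):
    """Draw a Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mode(cluster, domains, epsilon, rng):
    """For each attribute, pick the category with the largest noisy count.

    Assumption: each count is perturbed with Laplace noise of scale 1/epsilon.
    """
    mode = []
    for j, domain in enumerate(domains):
        counts = Counter(record[j] for record in cluster)
        noisy = {v: counts[v] + laplace(1.0 / epsilon, rng) for v in domain}
        mode.append(max(noisy, key=noisy.get))
    return tuple(mode)


def assign(records, modes):
    """Assign each record to the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_distance(r, modes[k]))
        for r in records
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [("red", "s"), ("red", "m"), ("blue", "s")]
    domains = [("red", "blue"), ("s", "m")]
    # A small epsilon means more noise, so the returned mode may differ
    # from the true mode ("red", "s").
    print(noisy_mode(data, domains, epsilon=0.5, rng=rng))
```

The smaller epsilon is, the more often the noisy mode departs from the true one, which is the loss of availability that the abstract describes.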

Acknowledgments

This work was supported in part by the Hunan Provincial Education Department of China under Grant Number 21A0318, and by the Research Project on Teaching Reform of Ordinary Colleges and Universities in Hunan Province under Grant Number HNJG-2021-0651.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
Shaobo Zhang, Liujie Yuan, Yuxing Li, Wenli Chen & Yifei Ding
Hunan Key Laboratory of Service Computing and New Software Service Technology, Xiangtan, 411201, China
Shaobo Zhang, Liujie Yuan, Yuxing Li, Wenli Chen & Yifei Ding
Key Laboratory of Software Engineering for Complex Systems, College of Computer, National University of Defense Technology, Changsha, 410073, China
Shaobo Zhang

Corresponding author

Correspondence to Shaobo Zhang.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, China
Yongxuan Lai
Beijing Normal University, Zhuhai, China
Tian Wang
Xiamen University, Xiamen, China
Min Jiang
Tianjin University, Tianjin, China
Guangquan Xu
Hunan University, Changsha, China
Wei Liang
University of Naples Parthenope, Naples, Italy
Aniello Castiglione

About this paper

Cite this paper

Zhang, S., Yuan, L., Li, Y., Chen, W., Ding, Y. (2022). A High-Availability K-modes Clustering Method Based on Differential Privacy. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_18

DOI: https://doi.org/10.1007/978-3-030-95388-1_18
Published: 23 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95387-4
Online ISBN: 978-3-030-95388-1
eBook Packages: Computer Science; Computer Science (R0)
