Abstract
K-modes is a classic and widely used algorithm for mining categorical data. However, the data it analyzes often contain sensitive user information, and leaking these data would seriously threaten user privacy. Existing methods that combine differential privacy with the K-modes algorithm can effectively prevent such leakage. Nevertheless, differential privacy protects the data by adding noise, which reduces the availability of the clustering results. In this paper, we propose a high-availability K-modes clustering mechanism based on differential privacy (HAKC). While still relying on differential privacy to protect the data, the mechanism selects the initial centroids of the clustering by calculation and improves how the distance between a data point and a centroid is computed during the iterative process.
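To make the setting concrete, the following is a minimal sketch of one generic way to combine K-modes with differential privacy: points are assigned by simple matching distance, and each cluster mode is recomputed from Laplace-perturbed category counts. This is not the HAKC mechanism itself, whose initial-centroid selection and improved distance are not specified in the abstract. The function names, the per-attribute budget `epsilon`, and the `domains` argument are all assumptions made for illustration, and splitting the privacy budget across attributes and iterations is deliberately left out.

```python
import random
from collections import Counter


def matching_distance(x, mode):
    """Simple matching dissimilarity: number of attributes where x and mode differ."""
    return sum(a != b for a, b in zip(x, mode))


def assign(data, modes):
    """Assign every record to the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_distance(x, modes[k]))
        for x in data
    ]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mode(cluster, domains, epsilon, rng):
    """For each attribute, pick the category with the largest noisy count.

    Assumption: adding or removing one record changes one count per attribute
    by 1, so Laplace noise with scale 1/epsilon is applied to each count.
    `epsilon` here is the budget spent on a single attribute histogram only.
    """
    mode = []
    for j, domain in enumerate(domains):
        counts = Counter(row[j] for row in cluster)
        noisy = {v: counts[v] + laplace_noise(1.0 / epsilon, rng) for v in domain}
        mode.append(max(noisy, key=noisy.get))
    return tuple(mode)


if __name__ == "__main__":
    rng = random.Random(0)
    domains = [("red", "blue"), ("s", "m")]  # hypothetical attribute domains
    data = [("red", "s"), ("red", "m"), ("blue", "m"), ("blue", "m")]
    modes = [("red", "s"), ("blue", "m")]  # hypothetical initial modes
    labels = assign(data, modes)
    for k in range(len(modes)):
        members = [x for x, lab in zip(data, labels) if lab == k]
        print(k, noisy_mode(members, domains, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier counts, so the selected mode is more likely to differ from the true one; this is the loss of availability that the abstract refers to.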
Acknowledgments
This work was supported in part by the Hunan Provincial Education Department of China under Grant Number 21A0318, and by the Research Project on Teaching Reform of Ordinary Colleges and Universities in Hunan Province under Grant Number HNJG-2021-0651.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, S., Yuan, L., Li, Y., Chen, W., Ding, Y. (2022). A High-Availability K-modes Clustering Method Based on Differential Privacy. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_18
DOI: https://doi.org/10.1007/978-3-030-95388-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95387-4
Online ISBN: 978-3-030-95388-1
eBook Packages: Computer Science, Computer Science (R0)