Abstract
Motivated by the fact that mixed data is the norm rather than the exception in practice, and by privacy concerns in data management, we propose a differentially private mixed data clustering (DPMC) algorithm that performs cluster analysis on both numerical and categorical data. First, we design an adaptive privacy budget allocation method that analyzes the loss caused by the added noise and thereby determines the number of iterations and the privacy budget for a given accuracy requirement and dataset characteristics. Next, we develop an optimization method based on consistency inference for categorical attributes to improve clustering performance. Finally, we carry out comparative experiments on four real-world datasets. The results demonstrate a significant improvement in the balance between privacy protection and performance.
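As a rough illustration of the kind of step such an algorithm iterates, the sketch below perturbs one cluster's summary with Laplace noise: a noisy mean for a numerical attribute and a noisy mode for a categorical attribute. It is not the DPMC algorithm from the paper. The function names, the single numerical and single categorical attribute, the assumption that numerical values are scaled to [0, 1], and the uniform split of the budget across iterations and queries are all assumptions made for this example; the paper's adaptive allocation and consistency inference are not reproduced here.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_budget(eps_total, iterations):
    """Hypothetical uniform split of the total budget over the iterations.

    The paper allocates the budget adaptively; this is only a placeholder.
    """
    return eps_total / iterations


def noisy_cluster_summary(numeric, categorical, categories, eps_iter, rng):
    """Return a noisy (mean, mode) for one cluster in one iteration.

    numeric     : values of one numerical attribute, assumed to lie in [0, 1]
    categorical : values of one categorical attribute for the same records
    categories  : full domain of the categorical attribute
    eps_iter    : budget for this iteration, split evenly over three queries
    """
    eps_part = eps_iter / 3.0

    # Adding or removing one record changes the count by at most 1 and,
    # because values lie in [0, 1], the sum by at most 1 (sensitivity 1).
    noisy_count = len(numeric) + laplace(1.0 / eps_part, rng)
    noisy_sum = sum(numeric) + laplace(1.0 / eps_part, rng)
    mean = noisy_sum / max(noisy_count, 1.0)
    mean = min(1.0, max(0.0, mean))  # clamp back into the assumed range

    # One record affects exactly one histogram bin, so the L1 sensitivity is 1.
    counts = Counter(categorical)
    noisy_hist = {
        c: counts.get(c, 0) + laplace(1.0 / eps_part, rng) for c in categories
    }
    mode = max(noisy_hist, key=noisy_hist.get)
    return mean, mode


if __name__ == "__main__":
    rng = random.Random(7)
    eps_iter = split_budget(eps_total=1.0, iterations=5)
    print(noisy_cluster_summary([0.2, 0.4, 0.6], ["a", "a", "b"],
                                ["a", "b", "c"], eps_iter, rng))
```

Because every record belongs to exactly one cluster, the same per-iteration budget can be reused for each cluster, while the budgets of successive iterations add up. That is why the number of iterations and the total budget have to be chosen together.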
Acknowledgments
This work is supported by the science and technology project of State Grid Corporation of China entitled "Research on Power Marketing Data Sharing and Model Fusion Technology Based on Federated Learning" (Grant No. 5700-202113262A-0-0-00).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cheng, K., Chen, L., Yang, H., Luo, D., Yuan, S., Guan, Z. (2023). Differentially Private Clustering Algorithm for Mixed Data. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_27
DOI: https://doi.org/10.1007/978-981-99-0272-9_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)