Differentially Private Clustering Algorithm for Mixed Data

  • Conference paper
Ubiquitous Security (UbiSec 2022)

Abstract

Motivated by the fact that mixed data is the norm rather than the exception in practice, and by privacy concerns in data management, we propose a differentially private mixed data clustering (DPMC) algorithm that performs cluster analysis on both numerical and categorical data. First, we design an adaptive privacy budget allocation method that analyzes the loss caused by the added noise and uses it to determine the number of iterations and the privacy budget for a given accuracy requirement and given dataset characteristics. Next, we develop an optimization method based on consistency inference for categorical attributes to improve clustering performance. Finally, we carry out comparative experiments on four real-world datasets. The results show a significant improvement in the balance between privacy protection and clustering performance.
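
To make the setting concrete, the following is a minimal sketch of one noisy prototype update for mixed data using the Laplace mechanism. It is not the DPMC algorithm itself: the abstract does not specify the adaptive budget allocation or the consistency inference step, so the sketch assumes a uniform split of the total budget across iterations, numerical attributes pre-scaled to [0, 1], and integer-coded categorical attributes. The function name `dp_prototype_update` and all parameter names are hypothetical.

```python
import numpy as np

def dp_prototype_update(x_num, x_cat, labels, k, n_levels, eps_iter, rng):
    """One noisy prototype update for mixed data (illustrative sketch only).

    x_num:    (n, d_num) floats, assumed pre-scaled to [0, 1]
    x_cat:    (n, d_cat) ints, attribute j takes values 0..n_levels[j]-1
    labels:   (n,) current cluster assignment in 0..k-1
    eps_iter: privacy budget spent on this single update
    """
    d_num, d_cat = x_num.shape[1], x_cat.shape[1]
    # Adding or removing one record changes one count by 1, each numeric sum
    # by at most 1, and one bin of each categorical histogram by 1.
    sensitivity = 1 + d_num + d_cat
    scale = sensitivity / eps_iter

    centers = np.zeros((k, d_num))
    modes = np.zeros((k, d_cat), dtype=int)
    for c in range(k):
        members = labels == c
        noisy_count = members.sum() + rng.laplace(0.0, scale)
        noisy_sum = x_num[members].sum(axis=0) + rng.laplace(0.0, scale, d_num)
        # Guard against tiny or negative noisy counts, then clip to the domain.
        centers[c] = np.clip(noisy_sum / max(noisy_count, 1.0), 0.0, 1.0)
        for j in range(d_cat):
            hist = np.bincount(x_cat[members, j], minlength=n_levels[j])
            noisy_hist = hist + rng.laplace(0.0, scale, n_levels[j])
            modes[c, j] = int(np.argmax(noisy_hist))  # noisy mode of attribute j
    return centers, modes

# Synthetic demo data; the uniform split eps_total / n_iter is an assumption.
rng = np.random.default_rng(0)
x_num = rng.random((200, 2))
x_cat = rng.integers(0, 3, size=(200, 2))
labels = rng.integers(0, 4, size=200)
eps_total, n_iter = 2.0, 5
centers, modes = dp_prototype_update(
    x_num, x_cat, labels, 4, [3, 3], eps_total / n_iter, rng
)
```

Under these assumptions the noise scale grows with the number of attributes and with the number of iterations sharing the budget, which is the trade-off that a budget allocation method has to manage.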

Acknowledgments

This work is supported by the science and technology project of State Grid Corporation of China entitled "Research on Power Marketing Data Sharing and Model Fusion Technology Based on Federated Learning" (Grant No. 5700-202113262A-0-0-00).

Author information

Corresponding author

Correspondence to Zhitao Guan.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cheng, K., Chen, L., Yang, H., Luo, D., Yuan, S., Guan, Z. (2023). Differentially Private Clustering Algorithm for Mixed Data. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_27

  • DOI: https://doi.org/10.1007/978-981-99-0272-9_27

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0271-2

  • Online ISBN: 978-981-99-0272-9

  • eBook Packages: Computer Science, Computer Science (R0)
