A High-Availability K-modes Clustering Method Based on Differential Privacy

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13156)

Abstract

The K-modes algorithm is a classic, widely used algorithm for mining categorical data. However, the data it analyzes usually contain sensitive user information, and a leak of these data would seriously threaten user privacy. Existing methods that combine differential privacy with the K-modes algorithm can effectively prevent such leakage. Nevertheless, differential privacy protects the data by adding noise, which reduces the availability of the clustering results. In this paper, we propose a high-availability K-modes clustering mechanism based on differential privacy (HAKC). While still using differential privacy to protect the data, the mechanism selects the initial centroids of the clustering by calculation and improves how the distance between a data point and a centroid is computed during the iterative process.
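To make the setting concrete, the following is a minimal sketch of a generic baseline that the abstract contrasts with: K-modes with the simple matching distance, where each cluster mode is recomputed from Laplace-noised category counts. It is not the HAKC mechanism; the abstract does not specify HAKC's initial-centroid calculation or its improved distance, so neither is reproduced here. The function names, the even split of the privacy budget, and the random initialization are all assumptions made for illustration, and the sketch is not a verified privacy guarantee.

```python
import random
from collections import Counter


def matching_distance(x, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, mode))


def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mode(cluster, domains, epsilon, rng):
    """Per attribute, pick the category with the largest Laplace-noised count."""
    mode = []
    for j, domain in enumerate(domains):
        counts = Counter(row[j] for row in cluster)
        # Assumption: each count is perturbed with scale 1/epsilon.
        noisy = {v: counts[v] + laplace(1.0 / epsilon, rng) for v in domain}
        mode.append(max(noisy, key=noisy.get))
    return tuple(mode)


def dp_kmodes(data, k, domains, epsilon, iters=5, seed=0):
    """Hypothetical baseline: K-modes with noisy mode updates."""
    rng = random.Random(seed)
    # Random initial modes taken from the records. This is the plain baseline
    # (and is itself not private); HAKC instead selects them by calculation.
    modes = rng.sample(data, k)
    # Assumption: naive even split of the budget over iterations and attributes.
    eps_step = epsilon / (iters * len(domains))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in data:
            nearest = min(range(k), key=lambda c: matching_distance(row, modes[c]))
            clusters[nearest].append(row)
        modes = [noisy_mode(c, domains, eps_step, rng) for c in clusters]
    labels = [
        min(range(k), key=lambda c: matching_distance(row, modes[c]))
        for row in data
    ]
    return modes, labels


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("a", "y"), ("b", "x")]
    domains = [["a", "b"], ["x", "y"]]
    print(dp_kmodes(data, k=2, domains=domains, epsilon=1.0))
```

With a small budget the noise can dominate the true counts and the selected modes drift away from the real majority categories, which is the loss of availability the abstract describes.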

Acknowledgments

This work was supported in part by the Hunan Provincial Education Department of China under Grant Number 21A0318, and by the Research Project on Teaching Reform of Ordinary Colleges and Universities in Hunan Province under Grant Number HNJG-2021-0651.

Author information

Corresponding author

Correspondence to Shaobo Zhang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, S., Yuan, L., Li, Y., Chen, W., Ding, Y. (2022). A High-Availability K-modes Clustering Method Based on Differential Privacy. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_18

  • DOI: https://doi.org/10.1007/978-3-030-95388-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95387-4

  • Online ISBN: 978-3-030-95388-1

  • eBook Packages: Computer Science, Computer Science (R0)
