Abstract
Preserving differential privacy (DP) for the iterative clustering algorithms has been extensively studied in the interactive and the non-interactive settings. However, existing interactive differentially private clustering algorithms suffer from a non-convergence problem, i.e., these algorithms may not terminate without a predefined number of iterations. This problem severely impacts the clustering quality and the efficiency of the algorithm. To resolve this problem, we propose a novel iterative approach in the interactive settings which controls the orientation of the centroids movement over the iterations to ensure the convergence by injecting DP noise in a selected area. We prove that, in the expected case, our approach converges to the same centroids as Lloyd’s algorithm in at most twice the iterations of Lloyd’s algorithm. We perform experimental evaluations on real-world datasets to show that our algorithm outperforms the state-of-the-art of the interactive differentially private clustering algorithms with a guaranteed convergence and better clustering quality to meet the same DP requirement.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 901–914. ACM (2013)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 128–138. ACM (2005)
Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). https://doi.org/10.1007/11787006_1
Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Fränti, P., Virmajoki, O.: Iterative shrinking method for clustering problems. Pattern Recogn. 39(5), 761–765 (2006). http://cs.uef.fi/sipu/datasets/
Fränti, P., Virmajoki, O.: Clustering datasets (2018). http://cs.uef.fi/sipu/datasets/
Gupta, A., Ligett, K., McSherry, F., Roth, A., Talwar, K.: Differentially private combinatorial optimization. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1106–1125 (2010)
Komarek, P.: Komarix datasets (2018). http://komarix.org/ac/ds/
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982)
McSherry, F.: Privacy integrated queries. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM (2009)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 2007 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 94–103. IEEE (2007)
Mohan, P., Thakurta, A., Shi, E., Song, D., Culler, D.: GUPT: privacy preserving data analysis made easy. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 349–360. ACM (2012)
Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM (2007)
Park, M., Foulds, J., Choudhary, K., Welling, M.: DP-EM: differentially private expectation maximization. In: Artificial Intelligence and Statistics, pp. 896–904 (2017)
Su, D., Cao, J., Li, N., Bertino, E., Lyu, M., Jin, H.: Differentially private k-means clustering and a hybrid approach to private optimization. ACM Trans. Priv. Secur. 20(4), 16 (2017)
Zhang, J., Xiao, X., Yang, Y., Zhang, Z., Winslett, M.: PrivGene: differentially private model fitting using genetic algorithms. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 665–676. ACM (2013)
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: a new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1(2), 141–182 (1997). http://cs.uef.fi/sipu/datasets/
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported by Australian Government Research Training Program Scholarship, Australian Research Council Discovery Project DP150104871, National Key R & D Program of China Project #2017YFB0203201, and supported with supercomputing resources provided by the Phoenix HPC service at the University of Adelaide. The corresponding author is Hong Shen.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, Z., Shen, H. (2019). A Convergent Differentially Private k-Means Clustering Algorithm. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science(), vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_47
Download citation
DOI: https://doi.org/10.1007/978-3-030-16148-4_47
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer ScienceComputer Science (R0)