Privacy Aware K-Means Clustering with High Utility

Nguyen, Thanh Dai; Gupta, Sunil; Rana, Santu; Venkatesh, Svetha

doi:10.1007/978-3-319-31750-2_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Privacy-preserving data mining aims to keep data safe yet useful, but algorithms that provide strong privacy guarantees often end up with low utility. We propose a novel privacy-preserving framework that thwarts an adversary from inferring an unknown data point by ensuring that the adversary's estimation error is almost invariant to the inclusion or exclusion of that data point. By focusing directly on the estimation error of the data point, our framework significantly lowers the perturbation required. We use this framework to propose a new privacy-aware K-means clustering algorithm. Using both synthetic and real datasets, we demonstrate that the utility of this algorithm is almost equal to that of unperturbed K-means and, at strict privacy levels, almost twice that of its differentially private counterpart.
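To make the perturbation-versus-utility trade-off concrete, here is a minimal sketch of Lloyd's K-means with an optional Laplace perturbation of the per-cluster counts and sums. It is a generic illustration in the spirit of noise-based private K-means baselines and is not the framework proposed in this paper. The function names (`kmeans`, `sse`, `laplace`), the `noise_scale` parameter, and the toy data are assumptions made for illustration only; in particular, `noise_scale` is not calibrated to any privacy budget.

```python
import random


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init, iters=10, noise_scale=0.0, seed=0):
    """Lloyd's iterations; noise_scale > 0 perturbs cluster counts and sums.

    noise_scale is a hypothetical knob, not tied to a privacy guarantee.
    """
    rng = random.Random(seed)
    k, dim = len(init), len(points[0])
    centers = [list(c) for c in init]
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0.0] * k
        # Assignment step: each point goes to its nearest center.
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: (optionally noisy) sum divided by (optionally noisy) count.
        for j in range(k):
            noisy_count = counts[j] + laplace(rng, noise_scale)
            if noisy_count < 1.0:
                continue  # too few (noisy) members: keep the previous center
            centers[j] = [
                (sums[j][d] + laplace(rng, noise_scale)) / noisy_count
                for d in range(dim)
            ]
    return centers


def sse(points, centers):
    """Utility proxy: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
init = [points[0], points[-1]]
exact = kmeans(points, init)
noisy = kmeans(points, init, noise_scale=0.5)
print("SSE without noise:", sse(points, exact))
print("SSE with noise:   ", sse(points, noisy))
```

Comparing the two printed values shows how added noise degrades the clustering objective, which is the utility loss that a lower perturbation requirement is meant to reduce.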

Notes

1. Available at https://archive.ics.uci.edu/ml/datasets.html.

Author information

Authors and Affiliations

Center for Pattern Recognition and Data Analytics, Deakin University, Geelong, 3216, Australia
Thanh Dai Nguyen, Sunil Gupta, Santu Rana & Svetha Venkatesh

Corresponding author

Correspondence to Thanh Dai Nguyen.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

About this paper

Cite this paper

Nguyen, T.D., Gupta, S., Rana, S., Venkatesh, S. (2016). Privacy Aware K-Means Clustering with High Utility. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_31

DOI: https://doi.org/10.1007/978-3-319-31750-2_31
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science, Computer Science (R0)
