Clustering Algorithm for Privacy Preservation on MapReduce

Zhao, Zheng; Shang, Tao; Liu, Jianwei; Guan, Zhengyu

doi:10.1007/978-3-030-00009-7_56

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11064)

Included in the following conference series:

International Conference on Cloud Computing and Security

Abstract

Many clustering algorithms for differential privacy (DP) have been proposed, but implementing them on a big data platform remains difficult in practice. In this paper, we propose a clustering algorithm for privacy preservation on MapReduce. The algorithm is implemented in two stages. First, an optimized Canopy algorithm is run on MapReduce to obtain the optimal number of clusters and the initial center points. Second, a DP K-means algorithm is run on MapReduce to obtain the final clusters. As a result, the proposed algorithm generates an optimal number of clusters that matches that of the standard classified data set, and it achieves better cluster accuracy when a suitable privacy budget $\varepsilon$ is chosen.
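The two stages described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not specify the optimized Canopy variant or the noise mechanism, so the sketch assumes a plain Canopy pass with one tight threshold `t2` and a SuLQ-style K-means that adds Laplace noise to per-cluster sums and counts. The function names (`laplace`, `canopy_centers`, `dp_kmeans`), the even split of $\varepsilon$ across iterations, and the assumption that points are normalized to $[0, 1]^d$ are all illustrative choices. Note that the Canopy pass in this sketch is not itself privatized.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def canopy_centers(points, t2, rng):
    """Plain Canopy pass (assumption: not the paper's optimized variant).

    Repeatedly picks a random remaining point as a center and discards
    every point within distance t2 of it. The number of centers returned
    serves as the cluster count k for the K-means stage.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def dp_kmeans(points, centers, epsilon, iterations, rng):
    """SuLQ-style DP K-means sketch for points in [0, 1]^dim.

    Assumption: the budget epsilon is split evenly over the iterations,
    and each iteration releases a noisy sum and count per cluster with
    combined L1 sensitivity dim + 1.
    """
    dim = len(points[0])
    scale = (dim + 1) * iterations / epsilon
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centers]
        counts = [0.0] * len(centers)
        # "Map" step: assign each point to its nearest center.
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # "Reduce" step: noisy mean per cluster.
        new_centers = []
        for j in range(len(centers)):
            n = counts[j] + laplace(scale, rng)
            if n < 1.0:
                # Noisy count is unusable; keep the previous center.
                new_centers.append(centers[j])
                continue
            new_centers.append(tuple(
                min(1.0, max(0.0, (sums[j][d] + laplace(scale, rng)) / n))
                for d in range(dim)
            ))
        centers = new_centers
    return centers
```

A caller would first run `canopy_centers(points, t2, rng)` to obtain both k and the initial centers, then pass those centers to `dp_kmeans`. In a MapReduce deployment the assignment loop would run in mappers and the noisy aggregation in reducers; a smaller `epsilon` gives stronger privacy but noisier centers.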

Acknowledgment

This work was supported by the National Key Research and Development Program of China (No. 2016YFC1000307) and the National Natural Science Foundation of China (No. 61571024).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Beihang University, Beijing, 100083, China
Zheng Zhao
School of Cyber Science and Technology, Beihang University, Beijing, 100083, China
Tao Shang, Jianwei Liu & Zhengyu Guan

Corresponding author

Correspondence to Tao Shang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Zhaoqing Pan
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Elisa Bertino

Cite this paper

Zhao, Z., Shang, T., Liu, J., Guan, Z. (2018). Clustering Algorithm for Privacy Preservation on MapReduce. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11064. Springer, Cham. https://doi.org/10.1007/978-3-030-00009-7_56

DOI: https://doi.org/10.1007/978-3-030-00009-7_56
Published: 21 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00008-0
Online ISBN: 978-3-030-00009-7
eBook Packages: Computer Science, Computer Science (R0)