DOI: 10.1145/3433210.3453104
research-article

Privacy-preserving Density-based Clustering

Published: 04 June 2021

Abstract

Clustering is an unsupervised machine learning technique that outputs clusters containing similar data items. In this work, we investigate privacy-preserving density-based clustering, which is used, for example, in financial analytics and medical diagnosis. When (multiple) data owners collaborate or outsource the computation, privacy concerns arise. To address this problem, we design, implement, and evaluate the first practical and fully private density-based clustering scheme based on secure two-party computation. Our protocol privately executes the DBSCAN algorithm without disclosing any information (including the number and size of clusters). It can be used for private clustering between two parties as well as for privately outsourcing the computation of an arbitrary number of data owners to two non-colluding servers. Our implementation of the DBSCAN algorithm privately clusters data sets with 400 elements in 7 minutes on commodity hardware. In doing so, it flexibly determines the number of clusters and is insensitive to outliers, while being only a factor of 19 slower than today's fastest private K-means protocol (Mohassel et al., PETS'20), which can only be used for specific data sets. We then show how to transfer our newly designed protocol to related clustering algorithms by introducing a private approximation of the TRACLUS algorithm for trajectory clustering, which has real-world applications such as financial time series forecasting and the investigation of the spread of diseases like COVID-19.
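To make the underlying algorithm concrete, the following is a minimal plaintext sketch of DBSCAN as defined by Ester et al. [19]; it is not the paper's private protocol. Every distance test and branch below runs in the clear, whereas the protocol described above evaluates DBSCAN under secure two-party computation so that neither party learns intermediate results or the number and size of clusters. The function name `dbscan`, the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself), and the comparison of squared distances against `eps**2` (which avoids a square root and gives the same result for non-negative `eps`) are assumptions of this sketch.

```python
from collections import deque

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def dbscan(points, eps, min_pts):
    """Plaintext DBSCAN sketch: returns one cluster label per point (NOISE = -1)."""
    eps_sq = eps * eps

    def neighbors(i):
        # All indices within distance eps of points[i], including i itself,
        # found by an all-pairs scan over squared Euclidean distances.
        xi = points[i]
        return [
            j for j, xj in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(xi, xj)) <= eps_sq
        ]

    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # points[i] is a core point: open a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(data, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never an input: it emerges from `eps` and `min_pts`, and isolated points are labeled as noise (-1) instead of being forced into a cluster. In a private protocol, data-dependent control flow such as the `len(seeds) < min_pts` branch and the growing queue would leak information if executed in the clear, so it has to be replaced by oblivious computation; this sketch does not show how the paper achieves that.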

Supplementary Material

MP4 File (asiaccs_ppDBSCAN_presentation_1.0.mp4)
Presentation video of "Privacy-preserving Density-based Clustering" by Beyza Bozdemir, Sébastien Canard, Orhan Ermis, Helen Möllering, Melek Önen and Thomas Schneider.

References

[1]
M. Ahmed, A. N. Mahmood, and Md. R. Islam. 2016. A Survey of Anomaly Detection Techniques in Financial Domain. In Future Generation Computer Systems.
[2]
U. M. Aïvodji, K. Huguenin, M. Huguer, and M. Killijian. 2018. Sride: A Privacy-Preserving Ridesharing System. In WISEC. ACM.
[3]
N. Almutairi, F. Coenen, and K. Dures. 2018. Secure Third Party Data Clustering Using Φ Data: Multi-User Order Preserving Encryption and Super Secure Chain Distance Matrices. In International Conference on Innovative Techniques and Applications of Artificial Intelligence.
[4]
A. Amirbekyan and V. Estivill-Castro. 2006. Privacy Preserving DBSCAN for Vertically Partitioned Data. In Intelligence and Security Informatics. Springer.
[5]
I. V. Anikin and R. M. Gazimov. 2017. Privacy Preserving DBSCAN Clustering Algorithm for Vertically Partitioned Data in Distributed Systems. In International Siberian Conference on Control and Communications. IEEE.
[6]
O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J. M. Pérez, and I. Perona. 2013. An Extensive Comparative Study of Cluster Validity Indices. Pattern Recognition (2013).
[7]
G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. 2013. More Efficient Oblivious Transfer and Extensions for Faster Secure Computation. In CCS. ACM.
[8]
M.-F. Balcan, T. Dick, Y. Liang, W. Mou, and H. Zhang. 2017. Differentially Private Clustering in High-Dimensional Euclidean Spaces. In International Conference on Machine Learning (ICML). PMLR.
[9]
A. Bampoulidis, A. Bruni, L. Helminger, D. Kales, C. Rechberger, and R. Walch. 2020. Privately Connecting Mobility to Infectious Diseases via Applied Cryptography. https://eprint.iacr.org/2020/522.
[10]
D. Beaver. 1991. Efficient Multiparty Protocols Using Circuit Randomization. In CRYPTO. Springer.
[11]
M. Bellare, V. T. Hoang, S. Keelveedhi, and P. Rogaway. 2013. Efficient Garbling from a Fixed-Key Blockcipher. In S&P. IEEE.
[12]
P. Besse, B. Guillouet, J.-M. Loubes, and F. Royer. 2016. Review & Perspective for Distance Based Clustering of Vehicle Trajectories. In Transactions on Intelligent Transportation Systems. IEEE.
[13]
P. Bunn and R. Ostrovsky. 2007. Secure Two-Party K-means Clustering. In CCS. ACM.
[14]
H. Chaudhari, R. Rachuri, and A. Suresh. 2020. Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning. In NDSS. The Internet Society.
[15]
H. Chen, I. Chillotti, Y. Dong, O. Poburinnaya, I. Razenshteyn, and M. S. Riazi. 2020. SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search. In USENIX Security. USENIX.
[16]
D. Demmler, T. Schneider, and M. Zohner. 2015. ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation. In NDSS. The Internet Society.
[17]
Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, I. Lagendijk, and T. Toft. 2009. Privacy-Preserving Face Recognition. In PETS. Springer.
[18]
Z. Erkin, J. R. Troncoso-Pastoriza, R. L. Lagendijk, and F. Perez-Gonzalez. 2013. Privacy-preserving Data Aggregation in Smart Metering Systems: An Overview. In IEEE Signal Processing Magazine.
[19]
M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In SIGKDD Conference on Knowledge Discovery and Data Mining. ACM.
[20]
P. Fränti and S. Sieranoja. 2018. K-means Properties on Six Clustering Benchmark Datasets. In Applied Intelligence. Springer.
[21]
F. D. Garcia and B. Jacobs. 2010. Privacy-friendly Energy-metering via Homomorphic Encryption. In International Workshop on Security and Trust Management. Springer.
[22]
C. Gentry. 2009. A Fully Homomorphic Encryption Scheme. Stanford University.
[23]
Z. Gheid and Y. Challal. 2016. Efficient and Privacy-Preserving K-means Clustering for Big Data Mining. In TrustCom/BigDataSE/ISPA. IEEE.
[24]
O. Goldreich, S. Micali, and A. Wigderson. 1987. How to Play ANY Mental Game. In STOC. ACM.
[25]
Q. Guo, X. Lu, Y. Gao, J. Zhang, B. Yan, D. Su, A. Song, X. Zhao, and G. Wang. 2017. Cluster Analysis: A New Approach for Identification of Underlying Risk Factors for Coronary Artery Disease in Essential Hypertensive Patients. In Scientific Reports.
[26]
P. Hallgren, C. Orlandi, and A. Sabelfeld. 2017. PrivatePool: Privacy-Preserving Ridesharing. In Computer Security Foundations (CSF). IEEE.
[27]
K. Hamada, R. Kikuchi, D. Ikarashi, K. Chida, and K. Takahashi. 2012. Practically Efficient Multi-party Sorting Protocols from Comparison Sort Algorithms. In International Conference on Information Security and Cryptology (ICISC). Springer.
[28]
M. Huang, Q. Bao, Y. Zhang, and W. Feng. 2019. A Hybrid Algorithm for Forecasting Financial Time Series Data Based on DBSCAN and SVR. In Information.
[29]
L. Hubert and P. Arabie. 1985. Comparing Partitions. In Journal of Classification. Springer.
[30]
Y. Ishai, J. Kilian, K. Nissim, and E. Petrank. 2003. Extending Oblivious Transfers Efficiently. In CRYPTO. Springer.
[31]
G. Jagannathan, K. Pillaipakkamnatt, R. Wright, and D. Umano. 2010. Communication-efficient Privacy-Preserving Clustering. In Transactions on Data Privacy. Springer.
[32]
G. Jagannathan and R. N. Wright. 2005. Privacy-Preserving Distributed k-Means Clustering over Arbitrarily Partitioned Data. In SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM.
[33]
K. Järvinen, H. Leppäkoski, E. S. Lohan, P. Richter, T. Schneider, O. Tkachenko, and Z. Yang. 2019. PILOT: Practical Privacy-Preserving Indoor Localization using OuTsourcing. In EuroS&P. IEEE.
[34]
A. Jäschke and F. Armknecht. 2018. Unsupervised Machine Learning on Encrypted Data. In SAC. Springer.
[35]
S. Jha, L. Kruger, and P. McDaniel. 2005. Privacy Preserving Clustering. In ESORICS. Springer.
[36]
D. Jiang, A. Xue, S. Ju, W. Chen, and H. Ma. 2008. Privacy-preserving DBSCAN on Horizontally Partitioned Data. In International Symposium on IT in Medicine and Education. IEEE.
[37]
S. Kamara and M. Raykova. 2011. Secure Outsourced Computation in a Multi-Tenant Cloud. In IBM Workshop on Cryptography and Security in Clouds.
[38]
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. 1983. Optimization by Simulated Annealing. In Science.
[39]
V. Kolesnikov and T. Schneider. 2008. Improved Garbled Circuit: Free XOR Gates and Applications. In ICALP. Springer.
[40]
D. Kopanaki, N. Pelekis, A. Gkoulalas-Divanis, M. Vodas, and Y. Theodoridis. 2012. A Framework for Mobility Pattern Mining and Privacy-Aware Querying of Trajectory Data. In Hellenic Data Management Symposium.
[41]
H.-P. Kriegel and M. Pfeifle. 2005. Density-Based Clustering of Uncertain Data. In SIGKDD Conference on Knowledge Discovery and Data Mining. ACM.
[42]
K. A. Kumar and C. P. Rangan. 2007. Privacy Preserving DBSCAN Algorithm for Clustering. In Advanced Data Mining and Applications. Springer.
[43]
K. Kursawe, G. Danezis, and M. Kohlweiss. 2011. Privacy-friendly Aggregation for the Smart-Grid. In PETS. Springer.
[44]
J.-G. Lee, J. Han, and K.-Y. Whang. 2007. Trajectory Clustering: a Partition-and-Group Framework. In SIGMOD International Conference on Management of Data. ACM.
[45]
D. Liu, E. Bertino, and X. Yi. 2014. Privacy of Outsourced K-Means Clustering. In ASIACCS. ACM.
[46]
J. Liu, L. Xiong, J. Luo, and J. Z. Huang. 2013. Privacy Preserving Distributed DBSCAN Clustering. In Transactions on Data Privacy.
[47]
P. Mohassel and P. Rindal. 2018. ABY³: A Mixed Protocol Framework for Machine Learning. In CCS. ACM.
[48]
P. Mohassel, M. Rosulek, and N. Trieu. 2020. Practical Privacy-Preserving K-means Clustering. In PoPETS. Sciendo.
[49]
D. Moulavi, P. A. Jaskowiak, R. J. G. B. Campello, A. Zimek, and J. Sander. 2014. Density-based clustering validation. In International Conference on Data Mining. SIAM.
[50]
M. Naor and B. Pinkas. 1999. Oblivious Transfer and Polynomial Evaluation. In STOC. ACM.
[51]
L. Ni, C. Li, X. Wang, H. Jiang, and J. Yu. 2018. DP-MCDBSCAN: Differential Privacy Preserving Multi-Core DBSCAN Clustering for Network User Data. In IEEE Access. IEEE.
[52]
E. Pagnin, G. Gunnarsson, P. Talebi, C. Orlandi, and A. Sabelfeld. 2019. TOPPool: Time-aware Optimized Privacy-Preserving Ridesharing. In PoPETS. Sciendo.
[53]
N. G. Pavlidis, V. P. Plagianakos, D. K. Tasoulis, and M. N. Vrahatis. 2006. Financial Forecasting through Unsupervised Clustering and Neural Networks. Operational Research (2006).
[54]
N. Pelekis, A. Gkoulalas-Divanis, M. Vodas, A. Plemenos, D. Kopanaki, and Y. Theodoridis. 2012. Private-HERMES: A Benchmark Framework for Privacy-Preserving Mobility Data Querying and Mining Methods. In Extending Database Technology. ACM.
[55]
G. Punj and D. W. Stewart. 1983. Cluster Analysis in Marketing Research: Review and Suggestions for Application. In Journal of Marketing Research.
[56]
Y. Qi and M. J. Atallah. 2008. Efficient Privacy-preserving K-nearest Neighbor Search. In International Conference on Distributed Computing Systems. IEEE.
[57]
M. S. Rahman, A. Basu, and S. Kiyomoto. 2017. Towards Outsourced Privacy-Preserving Multiparty DBSCAN. In Pacific Rim International Symposium on Dependable Computing. IEEE.
[58]
D. Rathee, T. Schneider, and K. K. Shukla. 2019. Improved Multiplication Triple Generation over Rings via RLWE-Based AHE. In CANS. Springer.
[59]
P. Rousseeuw. 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. In Journal of Computational and Applied Mathematics.
[60]
S. Samet, A. Miri, and L. Orozco-Barbosa. 2007. Privacy Preserving K-means Clustering in Multi-Party Environment. In SECRYPT.
[61]
J. Sander, M. Ester, H.-P. Kriegel, and X. Xu. 1998. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. In SIGKDD Conference on Knowledge Discovery and Data Mining. ACM.
[62]
A. Sangers, M. van Heesch, T. Attema, T. Veugen, M. Wiggerman, J. Veldsink, O. Bloemen, and D. Worm. 2019. Secure Multiparty PageRank Algorithm for Collaborative Fraud Detection. In FC. Springer.
[63]
U. Stemmer. 2020. Locally Private K-means Clustering. In SIAM Symposium on Discrete Algorithms. ACM.
[64]
D. Su, J. Cao, N. Li, E. Bertino, and H. Jin. 2016. Differentially Private K-Means Clustering. In Conference on Data and Application Security and Privacy. ACM.
[65]
D. Su, J. Cao, N. Li, E. Bertino, M. Lyu, and H. Jin. 2017. Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization. In Transactions on Privacy and Security. ACM.
[66]
C. Troncoso, M. Payer, J.-P. Hubaux, M. Salathé, J. Larus, E. Bugnion, W. Lueks, T. Stadler, A. Pyrgelis, D. Antonioli, et al. 2020. Decentralized Privacy-Preserving Proximity Tracing. IEEE Data Engineering Bulletin (2020).
[67]
A. Ultsch. 2005. Clustering with SOM: U*C. In Workshop on Self-Organizing Maps.
[68]
J. Vaidya and C. Clifton. 2003. Privacy-Preserving k-Means Clustering over Vertically Partitioned Data. In SIGKDD Conference on Knowledge Discovery and Data Mining. ACM.
[69]
N. X. Vinh, J. Epps, and J. Bailey. 2010. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. The Journal of Machine Learning Research (2010).
[70]
W. Wu, J. Liu, H. Wang, J. Hao, and M. Xian. 2020. Secure and Efficient Outsourced K-means Clustering using Fully Homomorphic Encryption with Ciphertext Packing Technique. In Transactions on Knowledge and Data Engineering. IEEE.
[71]
W. M. Wu and H. K. Huang. 2015. A DP-DBScan Clustering Algorithm based on Differential Privacy Preserving. In Computer Engineering and Science.
[72]
W. Xu, L. Huang, Y. Luo, Y. Yao, and W. W. Jing. 2007. Protocols for Privacy-Preserving DBSCAN Clustering. In Int. Journal of Security and Its Applications.
[73]
A. C. Yao. 1986. How to Generate and Exchange Secrets. In FOCS. IEEE.
[74]
S. Zahur, M. Rosulek, and D. Evans. 2015. Two Halves Make a Whole - Reducing Data Transfer in Garbled Circuits Using Half Gate. In EUROCRYPT. Springer.

Information

Published In

ASIA CCS '21: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security
May 2021
975 pages
ISBN:9781450382878
DOI:10.1145/3433210
  • General Chairs: Jiannong Cao, Man Ho Au
  • Program Chairs: Zhiqiang Lin, Moti Yung
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2021

Author Tags

  1. clustering
  2. private machine learning
  3. secure computation

Qualifiers

  • Research-article

Conference

ASIA CCS '21

Acceptance Rates

Overall Acceptance Rate 418 of 2,322 submissions, 18%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 70
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 17 Feb 2025

Citations

Cited By

  • (2025) Privacy-Preserving Byzantine-Robust Federated Learning via Multiparty Homomorphic Encryption. Computing and Combinatorics, pp. 432-446. DOI: 10.1007/978-981-96-1093-8_36. Online publication date: 20-Feb-2025.
  • (2024) Hierarchical Clustering via Single and Complete Linkage Using Fully Homomorphic Encryption. Sensors 24(15), 4826. DOI: 10.3390/s24154826. Online publication date: 25-Jul-2024.
  • (2024) FSS-DBSCAN: Outsourced Private Density-Based Clustering via Function Secret Sharing. IEEE Transactions on Information Forensics and Security 19, pp. 7759-7773. DOI: 10.1109/TIFS.2024.3446233. Online publication date: 2024.
  • (2024) PPA-DBSCAN: Privacy-Preserving -Approximate Density-Based Clustering. IEEE Transactions on Dependable and Secure Computing 21(6), pp. 5324-5340. DOI: 10.1109/TDSC.2024.3375347. Online publication date: Nov-2024.
  • (2024) Privacy-preserving Hybrid Learning Framework for Healthcare. Procedia Computer Science 246, pp. 3420-3429. DOI: 10.1016/j.procs.2024.09.215. Online publication date: 2024.
  • (2024) Approximate DBSCAN on obfuscated data. Journal of Information Security and Applications 80(C). DOI: 10.1016/j.jisa.2023.103664. Online publication date: 17-Apr-2024.
  • (2024) PPPCT. Computers in Biology and Medicine 173(C). DOI: 10.1016/j.compbiomed.2024.108351. Online publication date: 1-May-2024.
  • (2023) SecBerg: Secure and Practical Iceberg Queries in Cloud. IEEE Transactions on Services Computing 16(5), pp. 3696-3710. DOI: 10.1109/TSC.2023.3264710. Online publication date: Sep-2023.
  • (2023) pSafety: Privacy-Preserving Safety Monitoring in Online Ride Hailing Services. IEEE Transactions on Dependable and Secure Computing 20(1), pp. 209-224. DOI: 10.1109/TDSC.2021.3130571. Online publication date: 1-Jan-2023.
  • (2023) Practical and Privacy-Preserving Density-Based Clustering via Shuffling. GLOBECOM 2023 - 2023 IEEE Global Communications Conference, pp. 50-55. DOI: 10.1109/GLOBECOM54140.2023.10437594. Online publication date: 4-Dec-2023.