ABSTRACT
Large graph datasets have become invaluable assets for studying problems in business applications and scientific research. These datasets, collected and owned by data owners, may also contain privacy-sensitive information. When using public clouds for elastic processing, data owners have to protect both data ownership and privacy from curious cloud providers. We propose a cloud-centric framework that allows data owners to efficiently collect graph data from distributed data contributors, and to privately store and analyze the graph data in the cloud. Data owners can conduct expensive operations in untrusted public clouds while preserving privacy and scalability. The major contributions of this work are two privacy-preserving approximate eigendecomposition algorithms (the secure Lanczos and Nyström methods) for spectral analysis of large graph matrices, and a personalized privacy-preserving data submission method based on differential privacy that allows a trade-off between data sparsity and privacy. For an N-node graph, the proposed approach allows a data owner to finish the core operations with only O(N) client-side costs in computation, storage, and communication. The expensive O(N^2) operations are performed in the cloud with the proposed privacy-preserving algorithms. We prove that our approach can satisfactorily preserve data privacy against untrusted cloud providers. We have conducted an extensive experimental study to investigate these algorithms in terms of the intrinsic relationships among costs, privacy, scalability, and result quality.
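To make the O(N) versus O(N^2) cost split concrete, here is a minimal sketch of the textbook Lanczos iteration in plain Python. It is not the secure variant proposed in the paper: nothing is encrypted or perturbed, and the function names (`matvec`, `dot`, `lanczos`) are illustrative assumptions. The point is only that each step needs one matrix-vector product, which is the O(N^2) part that would presumably be the candidate for outsourcing, while the remaining work is vector arithmetic that is O(N) per step for a fixed number of steps.

```python
import math
import random

def matvec(A, v):
    """Dense matrix-vector product: the only O(N^2) operation per step."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product of two vectors, O(N)."""
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, k, seed=0):
    """Run up to k Lanczos steps on a symmetric matrix A (list of rows).

    Returns (alphas, betas, Q): the diagonal and off-diagonal entries of
    the tridiagonal matrix T = Q^T A Q, and the Lanczos basis vectors.
    Plain, non-private version; an illustration, not the paper's protocol.
    """
    n = len(A)
    rng = random.Random(seed)
    # Random unit-length starting vector.
    q = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(q, q))
    Q = [[x / norm for x in q]]
    alphas, betas = [], []
    for j in range(k):
        w = matvec(A, Q[j])          # O(N^2) step
        alphas.append(dot(w, Q[j]))  # diagonal entry of T
        # Full reorthogonalization against every basis vector so far.
        for qi in Q:
            c = dot(w, qi)
            w = [x - c * y for x, y in zip(w, qi)]
        beta = math.sqrt(dot(w, w))
        # Stop after k steps, or early if the Krylov space is exhausted.
        if j == k - 1 or beta < 1e-12:
            break
        betas.append(beta)
        Q.append([x / beta for x in w])
    return alphas, betas, Q
```

The eigenvalues of the small tridiagonal matrix built from `alphas` and `betas` approximate the extreme eigenvalues of `A`; computing them is cheap because the matrix is only k-by-k. Full reorthogonalization is used here for numerical robustness at the expense of extra O(Nk) work per step.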
Index Terms
- Privacy-Preserving Spectral Analysis of Large Graphs in Public Clouds