DOI: 10.1145/2897845.2897857
research-article

Privacy-Preserving Spectral Analysis of Large Graphs in Public Clouds

Published: 30 May 2016

ABSTRACT

Large graph datasets have become invaluable assets for studying problems in business applications and scientific research. These datasets, collected and owned by data owners, may also contain privacy-sensitive information. When using public clouds for elastic processing, data owners have to protect both data ownership and privacy from curious cloud providers. We propose a cloud-centric framework that allows data owners to efficiently collect graph data from distributed data contributors, and to privately store and analyze graph data in the cloud. Data owners can conduct expensive operations in untrusted public clouds with privacy and scalability preserved. The major contributions of this work include two privacy-preserving approximate eigendecomposition algorithms (the secure Lanczos and Nyström methods) for spectral analysis of large graph matrices, and a personalized privacy-preserving data submission method based on differential privacy that allows for a trade-off between data sparsity and privacy. For an N-node graph, the proposed approach allows a data owner to finish the core operations with only O(N) client-side costs in computation, storage, and communication. The expensive O(N^2) operations are performed in the cloud with the proposed privacy-preserving algorithms. We prove that our approach can satisfactorily preserve data privacy against untrusted cloud providers. We have conducted an extensive experimental study to investigate these algorithms in terms of the intrinsic relationships among costs, privacy, scalability, and result quality.
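
To make the cost split in the abstract concrete, the following is a minimal sketch of the standard, non-private Lanczos iteration. It is not the secure Lanczos algorithm proposed in the paper, whose details are not given here. The function name `lanczos`, the callback `cloud_matvec`, and the 4-node path graph are assumptions made for illustration. The sketch only shows that the single O(N^2) step per iteration is the matrix-vector product, which is the kind of operation one would hand to the cloud, while the remaining steps are vector operations.

```python
import math
import random


def lanczos(matvec, n, k, seed=0):
    """Run k steps of plain (non-private) Lanczos on a symmetric n x n operator.

    `matvec(v)` returns A @ v and stands in for the expensive, outsourced step
    (a hypothetical callback, not the paper's protocol). Returns the diagonal
    (alphas) and off-diagonal (betas) of the tridiagonal matrix T, plus the
    Lanczos vectors.
    """
    rng = random.Random(seed)
    q = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in q))
    q = [x / norm for x in q]

    basis, alphas, betas = [q], [], []
    q_prev, beta = [0.0] * n, 0.0
    for _ in range(k):
        w = matvec(q)                                 # O(N^2) for a dense matrix
        alpha = sum(wi * qi for wi, qi in zip(w, q))  # O(N)
        w = [wi - alpha * qi - beta * pi
             for wi, qi, pi in zip(w, q, q_prev)]     # O(N)
        # Full reorthogonalization, O(N * k) per step; it keeps this small
        # example numerically clean and is not required by the method itself.
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if len(alphas) == k or beta < 1e-12:
            break
        betas.append(beta)
        q_prev, q = q, [x / beta for x in w]
        basis.append(q)
    return alphas, betas, basis


# Adjacency matrix of a 4-node path graph (illustrative data only).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]


def cloud_matvec(v):
    """Stand-in for the outsourced matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


alphas, betas, basis = lanczos(cloud_matvec, n=4, k=4)
```

The eigenvalues of the small tridiagonal matrix defined by `alphas` and `betas` approximate the extreme eigenvalues of the graph matrix. With k much smaller than N, that final step is cheap enough to run on the client.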

          Published in

          ASIA CCS '16: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security
          May 2016
          958 pages
          ISBN: 9781450342339
          DOI: 10.1145/2897845

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          ASIA CCS '16 Paper Acceptance Rate: 73 of 350 submissions, 21%. Overall Acceptance Rate: 418 of 2,322 submissions, 18%.
