DOI: 10.1145/2642937.2643006
research-article

Program analysis for secure big data processing

Published: 15 September 2014

ABSTRACT

The ubiquity of computers is driving a massive increase in the amount of data generated by humans and machines. Two natural consequences are increased efforts to (a) derive meaningful information from the accumulated data and (b) ensure that the data is not used for unintended purposes. For analyzing massive amounts of data (a), tools such as MapReduce, Spark, and Dryad, along with higher-level scripting languages such as Pig Latin and DryadLINQ, have made these analysis tasks significantly easier for software developers. The second, equally important aspect, ensuring confidentiality (b), has seen little support emerge for programmers: while advances in cryptographic techniques allow computations to be performed directly on encrypted data, programmer-friendly and efficient ways of writing such data analysis jobs are still missing. This paper presents novel data flow analyses and program transformations for Pig Latin that automatically enable the execution of Pig Latin scripts on encrypted data. We avoid fully homomorphic encryption because of its prohibitively high cost; instead, in some cases, we rely on a minimal set of operations performed by the client. We present the algorithms used for this translation, and empirically demonstrate both the practical performance of our approach and the reduction in the effort programmers need to preserve data confidentiality.
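The abstract describes the approach only at a high level. The following Python code is one plausible, heavily simplified shape of such a data flow analysis, not the paper's algorithm. It walks a straight-line, Pig Latin-like script, records which encryption property each input field needs (assumed here: deterministic encryption for grouping and equality, order-preserving encryption for comparisons and sorting, additively homomorphic encryption for sums), and flags the steps where a computed field is encrypted under the wrong scheme, so the client must decrypt and re-encrypt it. The operator names, the scheme labels (`DET`, `OPE`, `AH`), and the names `Op`, `analyze`, `REQUIRED_SCHEME`, and `OUTPUT_SCHEME` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical scheme labels (an illustrative assumption, not the paper's exact set):
#   DET - deterministic encryption: equality tests (JOIN, GROUP, "==" filters)
#   OPE - order-preserving encryption: comparisons and sorting
#   AH  - additively homomorphic encryption (e.g. Paillier): SUM
REQUIRED_SCHEME = {
    "JOIN": "DET",
    "GROUP": "DET",
    "FILTER_EQ": "DET",
    "FILTER_LT": "OPE",
    "ORDER": "OPE",
    "SUM": "AH",
}

# Scheme under which an operator's *output* field ends up encrypted.
OUTPUT_SCHEME = {"SUM": "AH"}


@dataclass(frozen=True)
class Op:
    """One step of a hypothetical, simplified Pig Latin-like script."""
    kind: str                  # operator name, a key of REQUIRED_SCHEME
    src: str                   # field the operator reads
    dst: Optional[str] = None  # field the operator produces, if any


Reencryption = Tuple[int, str, str, str]  # (step, field, from_scheme, to_scheme)


def analyze(ops: List[Op]) -> Tuple[Dict[str, Set[str]], List[Reencryption]]:
    """Forward pass over a straight-line script (no loops or branches).

    Returns the encryptions the client must upload for each input field, and
    the steps at which a derived field is in the wrong scheme, so the client
    must decrypt it and re-encrypt it under the scheme the operator needs.
    """
    input_schemes: Dict[str, Set[str]] = {}
    derived_scheme: Dict[str, str] = {}  # current scheme of computed fields
    reencryptions: List[Reencryption] = []

    for step, op in enumerate(ops):
        needed = REQUIRED_SCHEME[op.kind]
        if op.src in derived_scheme:
            current = derived_scheme[op.src]
            if current != needed:
                # The untrusted cluster cannot convert between schemes itself:
                # schedule a client-side decrypt/re-encrypt round trip.
                reencryptions.append((step, op.src, current, needed))
                derived_scheme[op.src] = needed
        else:
            # Input field: the client can upload it pre-encrypted under
            # every scheme that some operator on it requires.
            input_schemes.setdefault(op.src, set()).add(needed)
        if op.dst is not None:
            derived_scheme[op.dst] = OUTPUT_SCHEME[op.kind]

    return input_schemes, reencryptions


if __name__ == "__main__":
    script = [
        Op("FILTER_LT", "age"),
        Op("GROUP", "dept"),
        Op("SUM", "salary", dst="total"),
        Op("FILTER_LT", "total"),  # compare per-group totals with a threshold
        Op("ORDER", "total"),
    ]
    uploads, client_steps = analyze(script)
    print(uploads)       # {'age': {'OPE'}, 'dept': {'DET'}, 'salary': {'AH'}}
    print(client_steps)  # [(3, 'total', 'AH', 'OPE')]
```

Under these assumptions, the only client round trip is at step 3, where salary totals produced under the additive scheme must be compared against a threshold, which needs an order-preserving encryption. Uploading several encryptions of each input field confines such round trips to derived values.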

        Published in

          ACM Conferences
          ASE '14: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering
          September 2014
          934 pages
          ISBN:9781450330138
          DOI:10.1145/2642937

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          ASE '14 paper acceptance rate: 82 of 337 submissions (24%). Overall acceptance rate: 82 of 337 submissions (24%).
