DOI: 10.1145/2046707.2046767
research-article

Sedic: privacy-aware data intensive computing on hybrid clouds

Published: 17 October 2011

Abstract

The emergence of cost-effective cloud services offers organizations a great opportunity to reduce costs and increase productivity. This development, however, is hampered by privacy concerns: a significant amount of organizational computing workload involves sensitive data, at least in part, and therefore cannot be directly outsourced to the public cloud. The scale of these computing tasks also makes existing secure outsourcing techniques less applicable. A natural solution is to split a task, keeping the computation on private data within an organization's private cloud while moving the rest to the public commercial cloud. However, such hybrid-cloud computing is not supported by today's data-intensive computing frameworks, MapReduce in particular, which forces users to split their computing tasks manually. In this paper, we present a suite of new techniques that make such privacy-aware data-intensive computing possible. Our system, called Sedic, leverages the special features of MapReduce to automatically partition a computing job according to the security levels of the data it works on and to arrange the computation across a hybrid cloud. Specifically, we modified MapReduce's distributed file system to strategically replicate data, moving sanitized data blocks to the public cloud. Over this data placement, map tasks are carefully scheduled to outsource as much workload to the public cloud as possible, while ensuring that sensitive data always stay on the private cloud. To minimize inter-cloud communication, our approach also automatically analyzes and transforms the reduction structure of a submitted job so that map outcomes are aggregated within the public cloud before the result is sent back to the private cloud for the final reduction. This automation also allows users to interact with our system in the same way they work with MapReduce and to run their legacy code directly in our framework.
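The data placement and map scheduling described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that every record carries a boolean sensitivity flag and that a block is "sanitized" by removing its sensitive records. The names `Record`, `SplitBlock`, `sanitize`, and `schedule_maps` are hypothetical; they are not Sedic's or Hadoop's API, and Sedic's actual sanitization granularity and scheduler may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record format: (payload, is_sensitive). The boolean flag stands
# in for whatever security labeling an organization applies to its data.
Record = Tuple[str, bool]


@dataclass
class SplitBlock:
    block_id: int
    public_part: List[str]   # sanitized replica: sensitive records removed
    private_part: List[str]  # sensitive records: never leave the private cloud


def sanitize(blocks: Dict[int, List[Record]]) -> List[SplitBlock]:
    """Split each block into a sanitized public replica and a private remainder."""
    result = []
    for block_id, records in sorted(blocks.items()):
        public = [payload for payload, sensitive in records if not sensitive]
        private = [payload for payload, sensitive in records if sensitive]
        result.append(SplitBlock(block_id, public, private))
    return result


def schedule_maps(split_blocks: List[SplitBlock]) -> Dict[str, List[Tuple[int, List[str]]]]:
    """Create one map task per non-empty part: sanitized parts go to the public
    cloud, sensitive parts to the private cloud."""
    plan: Dict[str, List[Tuple[int, List[str]]]] = {"public": [], "private": []}
    for sb in split_blocks:
        if sb.public_part:
            plan["public"].append((sb.block_id, sb.public_part))
        if sb.private_part:
            plan["private"].append((sb.block_id, sb.private_part))
    return plan


blocks = {
    0: [("alice@example.com", True), ("hello world", False)],
    1: [("public log line", False), ("another line", False)],
}
plan = schedule_maps(sanitize(blocks))
print(len(plan["public"]), len(plan["private"]))  # 2 1
```

Because the public cloud in this sketch only ever receives sanitized parts, the scheduler can push every non-sensitive map task there without a per-task security check; only the sensitive remainders consume private-cloud capacity.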
We implemented Sedic on Hadoop and evaluated it using both real and synthesized computing jobs on a large-scale cloud test-bed. The study shows that our techniques effectively protect sensitive user data, offload a large amount of computation to the public cloud and also fully preserve the scalability of MapReduce.
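The reduction transformation mentioned in the abstract can be illustrated with word count, where the reduction (summation) is associative and commutative. This is a minimal sketch assuming that property holds; it does not reproduce Sedic's automatic program analysis, and the function names are hypothetical. It compares how many key/value records would cross from the public to the private cloud with and without aggregation inside the public cloud.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]


def map_word_count(lines: Iterable[str]) -> List[Pair]:
    """Ordinary word-count map function: emit (word, 1) for every token."""
    return [(word, 1) for line in lines for word in line.split()]


def public_aggregate(pairs: List[Pair]) -> Counter:
    """Pre-reduce map output inside the public cloud. This is only valid
    because summation is associative and commutative."""
    partial: Counter = Counter()
    for word, count in pairs:
        partial[word] += count
    return partial


def private_final_reduce(public_partial: Counter, private_pairs: List[Pair]) -> Counter:
    """Final reduction on the private cloud: merge the public partial sums
    with map output computed locally over sensitive data."""
    total = Counter(public_partial)
    for word, count in private_pairs:
        total[word] += count
    return total


public_lines = ["the cat sat", "the cat ran", "the dog sat"]  # sanitized data
private_lines = ["the secret cat"]                            # sensitive data

public_pairs = map_word_count(public_lines)
partial = public_aggregate(public_pairs)
result = private_final_reduce(partial, map_word_count(private_lines))

# Records crossing from public to private cloud: without vs. with aggregation.
print(len(public_pairs), len(partial))  # 9 5
print(result["the"], result["cat"])     # 4 3
```

For reductions that are not associative and commutative (computing an exact median, for example), partial results cannot simply be merged, which is why a transformation of this kind has to be guarded by an analysis of the job's reducer.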


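Cited By

  • (2025) SPAM: An Enhanced Performance of Security and Privacy-Aware Model over Split Learning in Consumer Electronics. Programming and Computer Software 50:8 (875-899). DOI: 10.1134/S0361768824700816. Online publication date: 12-Jan-2025.
  • (2023) Flare: A Fast, Secure, and Memory-Efficient Distributed Analytics Framework. Proceedings of the VLDB Endowment 16:6 (1439-1452). DOI: 10.14778/3583140.3583158. Online publication date: 20-Apr-2023.
  • (2023) A survey on social-physical sensing: An emerging sensing paradigm that explores the collective intelligence of humans and machines. Collective Intelligence 2:2 (263391372311708). DOI: 10.1177/26339137231170825. Online publication date: 25-Apr-2023.
  • (2023) A Role-Based Encryption (RBE) Scheme for Securing Outsourced Cloud Data in a Multi-Organization Context. IEEE Transactions on Services Computing 16:3 (1647-1661). DOI: 10.1109/TSC.2022.3194252. Online publication date: 1-May-2023.
  • (2022) Review on Advanced Cost Effective Approach for Privacy with Dataset in Cloud Storage. Journal of ISMAC 4:2 (73-83). DOI: 10.36548/jismac.2022.2.001. Online publication date: 13-Jul-2022.
  • (2022) What is the price for joining securely? Proceedings of the VLDB Endowment 15:3 (659-672). DOI: 10.14778/3494124.3494146. Online publication date: 4-Feb-2022.
  • (2022) Differentially Oblivious Data Analysis With Intel SGX: Design, Optimization, and Evaluation. IEEE Transactions on Dependable and Secure Computing 19:6 (3741-3758). DOI: 10.1109/TDSC.2021.3106317. Online publication date: 1-Nov-2022.
  • (2021) Integrating Cybersecurity Into a Big Data Ecosystem. MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM) (69-76). DOI: 10.1109/MILCOM52596.2021.9652997. Online publication date: 29-Nov-2021.
  • (2020) Big Data Privacy in Biomedical Research. IEEE Transactions on Big Data 6:2 (296-308). DOI: 10.1109/TBDATA.2016.2608848. Online publication date: 1-Jun-2020.
  • (2020) A Hybrid Data Access Control Using AES and RSA for Ensuring Privacy in Electronic Healthcare Records. 2020 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS) (1-5). DOI: 10.1109/ICPECTS49113.2020.9337051. Online publication date: 10-Dec-2020.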


Published In

ACM Conferences
CCS '11: Proceedings of the 18th ACM conference on Computer and communications security
October 2011
742 pages
ISBN:9781450309486
DOI:10.1145/2046707
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. automatic program analysis
  2. cloud security
  3. computation split
  4. data privacy
  5. mapreduce


Conference

CCS '11

Acceptance Rates

CCS '11 Paper Acceptance Rate 60 of 429 submissions, 14%;
Overall Acceptance Rate 1,261 of 6,999 submissions, 18%


Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 3
Reflects downloads up to 14 Feb 2025.
