Abstract
MapReduce is a powerful model for parallel data processing. The motivation of this work is to allow MapReduce jobs to run partially on untrusted infrastructures, such as public clouds and desktop grids, while using a trusted infrastructure, such as a private cloud, to ensure that no outsider can obtain the entire information. Our idea is to break the data into meaningless chunks and spread them over a combination of public and private clouds so that a compromise of the untrusted infrastructure would not allow an attacker to reconstruct the whole data set. To realize this, we use the Information Dispersal Algorithm (IDA), which splits a file into pieces so that, when the pieces are carefully dispersed, a single node cannot reconstruct the data unless it collaborates with other nodes. We propose a protocol that allows MapReduce computing nodes to exchange data and perform IDA-aware MapReduce computation. We conduct experiments on the Grid’5000 testbed and report a performance evaluation of the prototype.
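The dispersal step can be illustrated with a minimal sketch in the style of Rabin's IDA: a file is split into n pieces such that any m of them rebuild it. This is not the authors' prototype. The function names (`disperse`, `reconstruct`), the field size 257, and the Vandermonde encoding matrix are assumptions chosen for illustration.

```python
P = 257  # assumption: smallest prime above 255, so every byte is a field element


def _vandermonde_row(x, m):
    # Row [1, x, x^2, ..., x^(m-1)] mod P; rows with distinct x are independent.
    return [pow(x, j, P) for j in range(m)]


def disperse(data: bytes, n: int, m: int):
    """Split data into n pieces; any m of them suffice to rebuild it."""
    if not 1 <= m <= n < P:
        raise ValueError("need 1 <= m <= n < 257")
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    pieces = []
    for x in range(1, n + 1):
        row = _vandermonde_row(x, m)
        # One value per group of m bytes. Values may reach 256, hence ints.
        values = [
            sum(a * b for a, b in zip(row, padded[k:k + m])) % P
            for k in range(0, len(padded), m)
        ]
        pieces.append((x, values))
    return pieces


def _invert(matrix):
    """Invert a square matrix modulo P by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [list(r) + [int(i == j) for j in range(m)] for i, r in enumerate(matrix)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse, Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [r[m:] for r in aug]


def reconstruct(pieces, m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m pieces produced by disperse."""
    chosen = pieces[:m]
    if len(chosen) < m:
        raise ValueError("not enough pieces to reconstruct")
    inverse = _invert([_vandermonde_row(x, m) for x, _ in chosen])
    out = bytearray()
    for k in range(len(chosen[0][1])):
        column = [values[k] for _, values in chosen]
        for row in inverse:
            out.append(sum(a * c for a, c in zip(row, column)) % P)
    return bytes(out[:length])  # drop the zero padding


pieces = disperse(b"hello mapreduce", n=5, m=3)
restored = reconstruct([pieces[4], pieces[1], pieces[2]], m=3, length=15)
```

With n = 5 and m = 3, one hypothetical placement is to keep three pieces on the private cloud and send at most two to public nodes, so that the public side alone stays below the reconstruction threshold. Note that a plain IDA of this kind is not a secret-sharing scheme: fewer than m pieces cannot rebuild the file, but they may still leak partial information about it.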
This work is supported by the French Agence Nationale de la Recherche through the MapReduce grant under contract ANR-10-SEGI-001-01, as well as INRIA ARC BitDew.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ben Cheikh, A., Abbes, H., Fedak, G. (2014). Towards Privacy for MapReduce on Hybrid Clouds Using Information Dispersal Algorithm. In: Hameurlain, A., Dang, T.K., Morvan, F. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2014. Lecture Notes in Computer Science, vol 8648. Springer, Cham. https://doi.org/10.1007/978-3-319-10067-8_4
DOI: https://doi.org/10.1007/978-3-319-10067-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10066-1
Online ISBN: 978-3-319-10067-8
eBook Packages: Computer Science, Computer Science (R0)