DOI: 10.1145/2752952.2752960
research-article

Initial Encryption of large Searchable Data Sets using Hadoop

Published: 01 June 2015

Abstract

With the introduction and widespread use of externally hosted infrastructure, secure storage of sensitive data is becoming increasingly important. Systems exist that store and query encrypted data in a database, but not all applications can start with empty tables; many have sets of legacy data. Hence, existing plaintext databases need to be transformed into encrypted form. Existing enterprise databases often contain terabytes of data, and a single machine would require many months for the initial encryption of such a large data set. We propose encrypting the data in parallel on a Hadoop cluster using a simple five-step process: Hadoop setup, target preparation, source data import, data encryption, and finally export to the target. We evaluated our solution on real-world data and report on performance and data consumption. The results show that data can be encrypted in parallel in a highly scalable manner: compared with a single server machine, a parallelized encryption cluster reduces the encryption time from months to days or even hours.
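The abstract names the five steps but not the cipher or the job code, so the Python sketch below illustrates only the encryption step, under stated assumptions, and is not the authors' implementation. The imported table is cut into input splits and each split is encrypted independently, in the spirit of a map-only Hadoop job. Assumptions: a Python thread pool stands in for Hadoop map tasks (it mirrors the split/map/collect structure but gives no real CPU parallelism under Python's GIL); a toy Pohlig-Hellman-style exponentiation cipher modulo the Mersenne prime 2^127 - 1 stands in for a deterministic, equality-searchable scheme; and all names (`keygen`, `make_splits`, `encrypt_split`, `encrypt_table`) and parameters are hypothetical. It requires Python 3.8+ for `pow(e, -1, m)` and is not secure for production use.

```python
# Hypothetical sketch: encrypt a table "mapper-style" by cutting it into input
# splits and encrypting each split independently. Toy cipher, illustration only.
import math
import secrets
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameters (not from the paper).
P = 2**127 - 1  # Mersenne prime; the cipher works in the group of units mod P
MAX_BYTES = 14  # 0x01 prefix + 14 data bytes < 2**120 < P


def keygen():
    """Secret exponent e with gcd(e, P - 1) == 1, plus its inverse d mod P - 1."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)


def encode(value: str) -> int:
    data = value.encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("toy cipher only handles values up to 14 bytes")
    # The 0x01 prefix keeps the integer nonzero and preserves leading zero bytes.
    return int.from_bytes(b"\x01" + data, "big")


def decode(m: int) -> str:
    raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the 0x01 prefix


def encrypt_value(value: str, e: int) -> int:
    # Pohlig-Hellman-style exponentiation is deterministic: equal plaintexts
    # yield equal ciphertexts, so equality predicates still work on them.
    return pow(encode(value), e, P)


def decrypt_value(c: int, d: int) -> str:
    return decode(pow(c, d, P))


def encrypt_split(split, e):
    """One hypothetical 'map task': encrypt every cell of every row in its split."""
    return [tuple(encrypt_value(v, e) for v in row) for row in split]


def make_splits(rows, n):
    """Cut the imported rows into at most n contiguous input splits."""
    size = max(1, math.ceil(len(rows) / n))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def encrypt_table(rows, e, workers=4):
    splits = make_splits(rows, workers)
    # Threads only mirror the split/map/collect structure; on a real cluster
    # each split would be handled by a separate map task on its own node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(encrypt_split, splits, [e] * len(splits))
        return [row for part in parts for row in part]  # order is preserved


if __name__ == "__main__":
    e, d = keygen()
    table = [("alice", "berlin"), ("bob", "paris"), ("carol", "berlin")]
    encrypted = encrypt_table(table, e, workers=2)
    print(encrypted[0][1] == encrypted[2][1])  # same city, same ciphertext
    print([tuple(decrypt_value(c, d) for c in row) for row in encrypted] == table)
```

Because no split depends on any other, this kind of job partitions cleanly across nodes, which is the property that lets it scale with cluster size; choosing a deterministic cipher is what keeps equality search possible on the migrated data.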


Cited By

  • (2016) Automated k-Anonymization and l-Diversity for Shared Data Privacy. Proceedings, Part I, 27th International Conference on Database and Expert Systems Applications - Volume 9827, pp. 105-120. DOI: 10.1007/978-3-319-44403-1_7. Online publication date: 5-Sep-2016

Published In

SACMAT '15: Proceedings of the 20th ACM Symposium on Access Control Models and Technologies
June 2015
242 pages
ISBN:9781450335560
DOI:10.1145/2752952
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database
  2. hadoop
  3. performance
  4. searchable encryption

Qualifiers

  • Research-article

Conference

SACMAT '15

Acceptance Rates

SACMAT '15 Paper Acceptance Rate 17 of 59 submissions, 29%;
Overall Acceptance Rate 177 of 597 submissions, 30%

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 05 Mar 2025
