Abstract
We introduce a new data representation aimed primarily at privacy-preserving data storage with efficient search and retrieval over distributed systems. The cornerstone of the proposed scheme is a novel algorithm that splits an input bit sequence \(\mathtt{B[1..n]}\) into a left and a right partition with good control over the partition sizes, such that reconstructing \(\texttt{B}\) in the absence of either partition is hard. The algorithm processes the input bit stream in blocks of \(\texttt{d}\) bits, where each block is first replaced with another \(\texttt{d}\)-bit block according to a randomly chosen permutation of the set \(\mathtt{\{0,1,\ldots,2^d{-}1\}}\). Following the replacement, the leftmost bits of each block, up to and including the \(\texttt{q}\)th set bit, are appended to the left partition and the remaining bits to the right partition. We prove that the expected length of the left partition is \(\mathtt{\ell \approx {2qn}/{d}}\) bits, and the right partition has length \(\mathtt{|R| = n - \ell}\) bits. Since the two lengths sum to \(\texttt{n}\), the new representation incurs no space overhead with respect to the original input. We also show that, owing to the randomization step, the input data \(\texttt{B}\) need not follow any particular probability distribution to achieve the partitioning ratio \(\mathtt{\rho = 2q{/}d}\), and that the parameters \(\texttt{d}\) and \(\texttt{q}\) can be tuned to support any desired ratio \(\mathtt{\rho}\) on the input. We consider recursive application of the splitting algorithm to each partition, which can be viewed as generating a full binary tree with \(\texttt{k}\) leaves in which the data at each internal node is subject to the proposed splitting operation. Such a construction represents an input bit sequence \(\mathtt{B[1..n]}\) as \(\texttt{k}\) partitions \(\mathtt{P_1, P_2, \ldots, P_k}\), where it is hard to reconstruct the original data in the absence of any \(\mathtt{P_i}\).
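The splitting step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`make_permutation`, `split`, `merge`) are hypothetical, a single permutation is reused for every block, the input length is assumed to be a multiple of \(\texttt{d}\), \(\texttt{q}\) is assumed to satisfy \(1 \le q \le d\), and a block with fewer than \(\texttt{q}\) set bits is assumed to go entirely to the left partition. The abstract does not specify any of these details.

```python
import random


def make_permutation(d, rng):
    """Return a random permutation of {0, ..., 2^d - 1} and its inverse."""
    perm = list(range(1 << d))
    rng.shuffle(perm)
    inv = [0] * len(perm)
    for x, y in enumerate(perm):
        inv[y] = x
    return perm, inv


def split(bits, d, q, perm):
    """Split a list of 0/1 values into (left, right) partitions."""
    assert len(bits) % d == 0 and 1 <= q <= d  # assumptions of this sketch
    left, right = [], []
    for i in range(0, len(bits), d):
        # Replace the d-bit block by its image under the permutation.
        value = int("".join(map(str, bits[i:i + d])), 2)
        block = [int(c) for c in format(perm[value], f"0{d}b")]
        # Cut just after the q-th set bit. Assumption: if the block has
        # fewer than q set bits, the whole block goes to the left.
        ones, cut = 0, d
        for j, b in enumerate(block):
            ones += b
            if ones == q:
                cut = j + 1
                break
        left.extend(block[:cut])
        right.extend(block[cut:])
    return left, right


def merge(left, right, d, q, inv):
    """Rebuild the original bits; needs both partitions and the inverse map."""
    bits, li, ri = [], 0, 0
    while li < len(left):
        block, ones = [], 0
        # Read from the left until q set bits are seen or the block is full.
        while len(block) < d and ones < q:
            block.append(left[li])
            ones += left[li]
            li += 1
        # The rest of the block comes from the right partition.
        need = d - len(block)
        block.extend(right[ri:ri + need])
        ri += need
        value = int("".join(map(str, block)), 2)
        bits.extend(int(c) for c in format(inv[value], f"0{d}b"))
    return bits


if __name__ == "__main__":
    rng = random.Random(2024)
    d, q = 8, 2
    perm, inv = make_permutation(d, rng)
    data = [rng.randint(0, 1) for _ in range(d * 1000)]
    left, right = split(data, d, q, perm)
    print("left fraction:", len(left) / len(data), "target 2q/d:", 2 * q / d)
    print("round trip ok:", merge(left, right, d, q, inv) == data)
```

Because every block contributes at least one bit to the left partition when \(q \ge 1\), the decoder in this sketch can recover block boundaries from the left partition alone, while the block contents still require the right partition and the permutation. Applying `split` again to `left` and `right` would give the recursive, tree-shaped construction with \(\texttt{k}\) leaves mentioned in the abstract.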
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Külekci, M.O. (2024). Randomized Data Partitioning with Efficient Search, Retrieval and Privacy-Preservation. In: Wu, W., Tong, G. (eds) Computing and Combinatorics. COCOON 2023. Lecture Notes in Computer Science, vol 14422. Springer, Cham. https://doi.org/10.1007/978-3-031-49190-0_22
DOI: https://doi.org/10.1007/978-3-031-49190-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49189-4
Online ISBN: 978-3-031-49190-0
eBook Packages: Computer Science (R0)