Experimental analyses of the K-hidden algorithm

https://doi.org/10.1016/j.engappai.2016.05.010

Abstract

The K-hidden algorithm, proposed in our previous work, is a more fine-grained algorithm for generating negative databases (NDBs). The hardness of reversing the K-hidden-NDB (i.e., the NDB generated by the K-hidden algorithm) by the local search strategy and the Unit Clause heuristics was analyzed theoretically in that work. It was demonstrated that the K-hidden-NDB could be harder to reverse (with regard to the local search strategy) and more diverse than the p-hidden-NDB and q-hidden-NDB (NDBs generated by the typical p-hidden algorithm and q-hidden algorithm, respectively). However, no experiments were carried out in the previous work to verify the hardness of reversing the K-hidden-NDB or its diversity. In this paper, several experiments employing three SAT solvers are carried out to verify both properties. Two solvers are traditional SAT solvers (i.e., WalkSAT and zChaff) based on the local search strategy and the Unit Clause heuristics, respectively, and they are widely used in verifying the hardness of SAT instances and reversing NDBs. The third is a state-of-the-art SAT solver, which won the gold prize of the Sequential Random SAT Track in SAT Competition 2014.

Introduction

The negative database (NDB), which is inspired by the Negative Selection (NS) mechanism in the biological immune system, was first proposed by Esponda and colleagues (Esponda, 2008, Esponda et al., 2004b). In traditional databases, data are represented, stored and operated on based on what they actually are; these databases are called positive databases. In negative databases, by contrast, data are represented, stored and operated on based on what they are not, i.e., a negative database represents the information in the complementary set of the positive database. Esponda et al. (2004a, 2004b, 2009) proved that reversing an NDB is equivalent to solving the SAT problem, which is NP-hard. Based on this property, researchers have tried to design algorithms for generating hard-to-reverse NDBs and to exploit the properties of NDBs for privacy preservation and data security. Compared with traditional security techniques, e.g., classical encryption algorithms, one important advantage of the NDB is that it can directly support some database operators and computations without decryption (Esponda et al., 2007a, Liu et al., 2013, Zhao and Luo, 2013a), e.g., Insert, Union, Morph and Hamming distance estimation. Because the generation of NDBs and some operations on them can be efficient, NDBs could be well suited to big data applications. The NDB has already been applied to several scenarios, e.g., information hiding (Esponda, 2008), privacy protection (Esponda et al., 2007b), authentication systems (Dasgupta and Azeem, 2008), secure iris recognition (Bringer and Chabanne, 2010, Zhao et al., 2015a), privacy-preserving data mining (Liu et al., 2013), secure multi-party computation (Zhao and Luo, 2013a) and privacy-preserving data publication (Du et al., 2014).
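To make the complement idea concrete, the following minimal Python sketch may help. It assumes the common binary representation in which each NDB entry is a string over '0', '1' and a wildcard '*', and an entry covers every binary string that agrees with it at its specified positions; this representation is an assumption here, since the excerpt does not spell it out. The function names covers and reverse_ndb are hypothetical. Reversal is done by exhaustive enumeration, which takes time exponential in the string length and is only feasible for toy sizes.

```python
from itertools import product


def covers(entry: str, s: str) -> bool:
    """Return True if the NDB entry (over '0', '1', '*') matches the binary string s."""
    return all(e == "*" or e == c for e, c in zip(entry, s))


def reverse_ndb(ndb: list[str], n: int) -> list[str]:
    """Brute-force reversal: list every n-bit string that no entry covers."""
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    return [s for s in candidates if not any(covers(e, s) for e in ndb)]


# Toy example (assumed data): the positive database holds the single string "101".
# These three entries together cover every other 3-bit string.
ndb = ["0**", "11*", "100"]
print(reverse_ndb(ndb, 3))  # ['101']
```

The three entries here are deliberately easy to reverse; the generation algorithms discussed in this paper aim to produce entry sets for which recovering the hidden string is hard in practice.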

It is well known that engineering applications usually require fine-grained control to ensure flexibility and dependability. Specifically, in applications that involve security and privacy, different users or data may have different security requirements, so an algorithm or mechanism that can control security in a fine-grained manner is very important. For example, in privacy-preserving data publication, different data may involve different levels of privacy and therefore need different levels of protection when published. An effective privacy-preserving data publication model (e.g., the k-anonymity model (Sweeney, 2002) or the l-diversity model (Machanavajjhala et al., 2007)) is therefore usually able to control the security of privacy in a fine-grained manner. To date, a few algorithms for generating hard-to-reverse NDBs have been proposed. Compared with these existing algorithms, the K-hidden algorithm proposed in our previous work could be more fine-grained (Zhao et al., 2015b).

The K-hidden algorithm is more fine-grained than most existing NDB generation algorithms, e.g., the typical p-hidden algorithm (Liu et al., 2014) and q-hidden algorithm (Jia et al., 2005). The K-hidden algorithm controls the distribution of different types of entries in NDBs by K−1 parameters, and thus controls the hardness of reversing NDBs (by the local search strategy). It was demonstrated that the K-hidden-NDB (i.e., the NDB generated by the K-hidden algorithm) could be harder to reverse (against the local search strategy (Selman et al., 1995)) than the p-hidden-NDB (the NDB generated by the p-hidden algorithm) and the q-hidden-NDB (the NDB generated by the q-hidden algorithm) (Zhao et al., 2015b). With regard to the Unit Clause (UC) heuristics (Jia et al., 2005, Achlioptas and Peres, 2004, Achlioptas, 2001a), the K-hidden-NDB could be as hard to reverse as the q-hidden-NDB (Zhao et al., 2015b).
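As a rough illustration of how a set of type probabilities could shape an NDB, the sketch below generates entries with K specified bits each. It rests on assumptions that this excerpt does not confirm: that an entry of "type i" is one whose specified bits differ from the hidden string in exactly i positions (i = 1, ..., K), and that each entry's type is drawn independently from a probability vector. Since the K probabilities sum to 1, only K−1 of them are free, which is consistent with the K−1 parameters mentioned above. The function name k_hidden_ndb_sketch and all parameter names are hypothetical, and this is not presented as the authors' exact procedure.

```python
import random


def k_hidden_ndb_sketch(s, num_entries, probs, rng=None):
    """Generate NDB entries around the hidden binary string s (assumed scheme).

    probs[i - 1] is the assumed probability that an entry is of type i,
    i.e. that exactly i of its K = len(probs) specified bits differ from s.
    """
    rng = rng or random.Random()
    n, k = len(s), len(probs)
    ndb = []
    for _ in range(num_entries):
        # Draw the entry type, then choose which K positions are specified.
        t = rng.choices(range(1, k + 1), weights=probs)[0]
        positions = rng.sample(range(n), k)
        flipped = set(rng.sample(positions, t))
        entry = ["*"] * n
        for pos in positions:
            bit = s[pos]
            # Flipped positions take the complement of the hidden bit.
            entry[pos] = ("1" if bit == "0" else "0") if pos in flipped else bit
        ndb.append("".join(entry))
    return ndb


# Example with K = 3: the type probabilities 0.7, 0.2, 0.1 are made-up values.
hidden = "1011001110"
entries = k_hidden_ndb_sketch(hidden, 5, [0.7, 0.2, 0.1], random.Random(0))
```

Because every entry differs from the hidden string in at least one specified position, no entry covers the hidden string, so it remains a solution of the resulting NDB under these assumptions.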

However, no experiments were carried out in the previous work (Zhao et al., 2015b) to verify the hardness of reversing the K-hidden-NDB. This paper extends that work (Zhao et al., 2015b) and presents experimental results. Specifically, three SAT solvers are employed to solve the SAT instances converted from K-hidden-NDBs. Two of them are traditional SAT solvers (i.e., WalkSAT (Selman et al., 1995) and zChaff (Mahajan et al., 2004)) based on the local search strategy and the UC heuristics, respectively, and they are widely used in verifying the hardness of SAT instances and reversing NDBs (Liu et al., 2014, Jia et al., 2005, Achlioptas and Peres, 2004, Achlioptas, 2001a). The third is a state-of-the-art SAT solver called Dimetheus (Gableske, 2014a, Gableske, 2014b), which won the gold prize of the Sequential Random SAT Track in SAT Competition 2014. Experimental results show that the K-hidden-NDB could be harder to reverse against these solvers than the q-hidden-NDB and p-hidden-NDB, and that the K-hidden-NDB could achieve more diverse hardness levels.
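The conversion from an NDB to a SAT instance can be sketched as follows, assuming the binary representation with '*' as a wildcard (an assumption, not stated in this excerpt). A string escapes an entry only if it differs from the entry at some specified position, so each entry maps to one clause: a specified '1' at position i contributes the literal ¬x_i and a specified '0' contributes x_i. The output uses the DIMACS CNF text format that SAT solvers commonly read. The function name ndb_to_dimacs is hypothetical.

```python
def ndb_to_dimacs(ndb: list[str]) -> str:
    """Convert NDB entries over '0', '1', '*' into a DIMACS CNF string."""
    n = len(ndb[0])
    clauses = []
    for entry in ndb:
        literals = []
        for i, ch in enumerate(entry, start=1):
            if ch == "1":
                literals.append(-i)  # escaping this entry here needs x_i = 0
            elif ch == "0":
                literals.append(i)   # escaping this entry here needs x_i = 1
        clauses.append(literals)
    lines = [f"p cnf {n} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


# Toy NDB (assumed data) whose only uncovered string is "101".
print(ndb_to_dimacs(["0**", "11*", "100"]), end="")
```

For the toy input, the clauses are (x1), (¬x1 ∨ ¬x2) and (¬x1 ∨ x2 ∨ x3), whose only satisfying assignment is x1 = 1, x2 = 0, x3 = 1, i.e., the hidden string "101".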

Existing algorithms for generating NDBs

Several algorithms for generating NDBs have been proposed. The first group of algorithms was designed by Esponda et al. (2004a, 2004b), who proposed the prefix algorithm and the Randomize_NDB (RNDB) algorithm. The prefix algorithm is simple and efficient, and it can generate a complete NDB (i.e., an NDB that covers all the strings in the complementary set of the positive database). However, the prefix algorithm is a deterministic algorithm,

Negative databases

The negative database (NDB) was first proposed by Esponda et al. (2004b). Instead of storing the original data, a negative database stores a compressed form of the complementary set of the original database for security purposes. It has been proved that reversing the NDB is an NP-hard problem (Esponda et al., 2004b). Currently, most existing work on the NDB is based on the binary representation. The details of the binary NDB are given as follows.

Assume the positive database is DB={x1

Experiments

In this section, three SAT solvers are chosen for verifying the hardness of reversing the K-hidden-NDB. The first one is WalkSAT (Selman et al., 1995), and it is a traditional SAT solver based on the local search strategy. The second one is zChaff (Mahajan et al., 2004), and it is a traditional SAT solver based on the DPLL strategy and the Unit Clause heuristics. The two solvers are widely used in verifying the hardness of SAT instances and reversing NDBs. The last SAT solver is Dimetheus (

Conclusions and future work

This paper is an extension of our previous work (Zhao et al., 2015b). The NDB is a promising privacy-preserving technique, and the K-hidden algorithm was proposed in our previous work (Zhao et al., 2015b) for generating NDBs. The K-hidden algorithm is more fine-grained than most existing algorithms for generating NDBs. It was demonstrated in (Zhao et al., 2015b) that the K-hidden-NDB could be harder to reverse against the local search strategy than the p-hidden-NDB and the q-hidden-NDB in

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China (No. 61175045).

References (32)

  • Achlioptas, D., 2001. Lower bounds for random 3-SAT via differential equations. Theor. Comput. Sci.
  • Liu, R., et al., 2015. Hiding multiple solutions in a hard 3-SAT formula. Data Knowl. Eng.
  • Achlioptas, D., Beame, P., Molloy, M., 2001b. A sharp threshold in proof complexity. In: Proceedings of the...
  • Achlioptas, D., et al., 2004. The threshold for random k-SAT is 2^k log 2 - O(k). J. Am. Math. Soc.
  • Achlioptas, D., et al., 2005. Hiding satisfying assignments: two are better than one. J. Artif. Intell. Res.
  • Asahiro, Y., et al., 1996. Random generation of test instances with controlled attributes. DIMACS Series Discrete Math. Theor. Comput. Sci.
  • Balint, A., Schoning, U., 2012. Choosing probability distributions for stochastic local search and the role of make...
  • Bringer, J., Chabanne, H., 2010. Negative databases for biometric data. In: Proceedings of the 12th ACM Workshop on...
  • Dasgupta, D., Azeem, R., 2008. An investigation of negative authentication systems. In: Proceedings of the Third...
  • Du, X., et al., 2014. Negative publication of data. Int. J. Immune Comput.
  • Esponda, F., Ackley, E.S., Forrest, S., Helman, P., 2004a. Online negative database. In: Proceedings of the Third...
  • Esponda, F., Forrest, S., Helman, P., 2004b. Enhancing privacy through negative representations of data, University of...
  • Esponda, F., 2005. Negative representations of information (Ph.D. thesis). Department of Computer Science.
  • Esponda, F., Trias, E.D., Ackley, E.S., Forrest, S., 2007a. A relational algebra for negative databases, Department of...
  • Esponda, F., et al., 2007. Protecting data privacy through hard-to-reverse negative databases. Int. J. Inf. Secur.
  • Esponda, F., 2008. Hiding a needle in a haystack using negative databases. In: Proceedings of the 10th International...