Lookup arguments allow one to prove that the elements of a committed vector come from a (bigger) committed table. They enable novel approaches to reduce the prover complexity of general-purpose zkSNARKs, implementing “non-arithmetic operations” such as range checks, XOR and AND more efficiently. We extend the notion of lookup arguments along two directions and improve their efficiency: (1) we extend vector lookups to matrix lookups (where we can prove that a committed matrix is a submatrix of a committed table); (2) we consider the notion of zero-knowledge lookup argument that preserves the privacy of both the sub-vector/sub-matrix and the table; (3) we present new zero-knowledge lookup arguments, dubbed cq+, zkcq+ and cq++, that are more efficient than the state of the art, namely the recent work by Eagen, Fiore and Gabizon named cq. Finally, we give a novel application of zero-knowledge matrix lookup arguments to the domain of zero-knowledge decision trees, where the model provider releases a commitment to a decision tree and can prove zero-knowledge statistics over the committed data structure. Our scheme based on lookup arguments has succinct verification and prover time asymptotically better than the state of the art, and it is secure in a strong security model where the commitment to the decision tree can be malicious.
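To make the two relations concrete, the following Python sketch checks them in the clear, with no commitments and no zero knowledge. It only illustrates the statements a vector lookup and a matrix lookup argument attest to. The function names are hypothetical, and reading “submatrix” as “every row of the matrix is a row of the table” is an assumption made here for illustration.

```python
def vector_lookup(f, t):
    """Vector lookup relation f ≺ t: every entry of f appears somewhere in t.

    Repetitions in f are allowed; the order of entries is irrelevant.
    """
    table = set(t)
    return all(x in table for x in f)


def matrix_lookup(M, T):
    """Matrix lookup relation, assuming a row-wise reading of 'submatrix':
    every row of M (a list of rows) must equal some row of the table T.
    """
    rows = {tuple(r) for r in T}
    return all(tuple(r) in rows for r in M)
```

For example, `vector_lookup([3, 3, 5], [1, 3, 5, 7])` holds, while `matrix_lookup([[0, 4, 8]], [[0, 1, 2], [3, 4, 5], [6, 7, 8]])` does not, because `[0, 4, 8]` mixes entries from different table rows. This is exactly the kind of cheating a matrix lookup must rule out, and a column-by-column vector lookup alone would not catch it.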
This is because \(\textsf {cq} \) assumes an SRS of the same size as the table \(\boldsymbol{\textbf{t}}\), which allows it to avoid a degree check. This condition, though, is often not met (e.g., in a SNARK for constraint systems larger than such a table).
Specifically, giving up only the privacy of the structure of the decision tree while keeping the values of the thresholds and labels private.
Recently, Setty, Thaler and Wahby [35] introduced a new lookup argument for a restricted subclass of tables. Their work is extremely efficient and, for such a restricted class of tables, more efficient than \(\textsf {cq} \). On the other hand, \(\textsf {cq} \) can handle arbitrary tables. For this reason, we refer to \(\textsf {cq} \) as the state of the art for arbitrary tables.
We believe that this poses no problem for either correctness or soundness; indeed, one could argue that it is a feature rather than a bug.
The dependency of [40] on the hash function is a bottleneck that is hard to remove. Even a hash function optimized for SNARK constraints, such as SWIFFT (the one we used to run [40] experimentally), yields high constants in practice, regardless of the proof system used as a backend.
As argued in [8], we can define a vacuous CP-SNARK for opening in the AGM where the prover does nothing and the verifier checks that the commitment is a valid group element. However, Lipmaa et al. [28] recently defined AGMOS, a more realistic variant of the AGM where the algebraic adversary can obliviously sample group elements. They pointed out that KZG is only extractable after the prover has successfully opened the commitment at some point. In this case, such a vacuous CP-SNARK is not sufficient. We leave it to further work to prove the security of our protocols in AGMOS.
Alternatively, one could define a single algorithm \(\textsf{Der}\) that handles both public and private data. In this case, one would need to redefine the Universal SNARK framework to handle zero knowledge correctly. Our definition, instead, is purely functional: we only require that \(\textsf{Preproc}\) and \(\textsf{Prove}\) form a two-step prover algorithm for a Universal SNARK.
Alternatively, we can consider the same subgroup used for the matrix commitment and thus \(|{\mathbb {H}}| = N_{\textsf{tot}}\cdot d\).
The idea is to consider the table \(\boldsymbol{\textbf{b}} = (j)_{j\in [B]}\) and prove, through a lookup argument, that \(\boldsymbol{\mathbf {\bar{x}}} \prec \boldsymbol{\textbf{b}}\), where \(\boldsymbol{\mathbf {\bar{x}}}\) is the vectorization of \(\boldsymbol{\textbf{X}}\).
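The range check can be illustrated with a small sketch in the clear. Several assumptions are made here that the text does not fix: \([B] = \{0,\dots,B-1\}\), row-major vectorization (the order does not matter for membership), and the 61-bit Mersenne prime field standing in for the SNARK's scalar field. The check uses the logarithmic-derivative identity \(\sum_i \frac{1}{\beta + \bar{x}_i} = \sum_j \frac{m_j}{\beta + b_j}\) with multiplicities \(m_j\), which is the identity \(\textsf{cq}\)-style lookups build on. This is not the paper's protocol: there are no commitments, no pairings and no zero knowledge, and all names are hypothetical.

```python
import random
from collections import Counter

P = 2**61 - 1  # Mersenne prime; stand-in for the SNARK scalar field (assumption)


def vectorize(X):
    """Row-major flattening of a matrix given as a list of rows."""
    return [x for row in X for x in row]


def multiplicities(f, t):
    """m[j] = number of occurrences of t[j] in f (entries of t assumed distinct)."""
    counts = Counter(f)
    return [counts.get(tj, 0) for tj in t]


def logderiv_check(f, t, m, beta):
    """Check sum_i 1/(beta + f_i) == sum_j m_j/(beta + t_j) over F_P."""
    lhs = sum(pow(beta + fi, -1, P) for fi in f) % P
    rhs = sum(mj * pow(beta + tj, -1, P) for mj, tj in zip(m, t)) % P
    return lhs == rhs


def range_check(X, B, beta=None):
    """Accept iff every entry of X lies in b = (0, ..., B-1), via the identity above."""
    b = list(range(B))
    xbar = vectorize(X)
    m = multiplicities(xbar, b)
    if beta is None:
        beta = random.randrange(P)  # verifier challenge in an interactive setting
    return logderiv_check(xbar, b, m, beta)
```

If every \(\bar{x}_i\) lies in the table, both sides are the same sum regrouped, so the check passes for any \(\beta\) at which all inverses exist. An out-of-range entry contributes a term \(1/(\beta + \bar{x}_i)\) that no multiplicity on the right-hand side can cancel, so in this sketch the check fails.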
We approximate the size of field elements with that of \(\mathbb {G}_1\) elements.
In typical applications of decision trees, the labels are integer values belonging to a small domain, for example, booleans or bytes.
Here expressed as a sum rather than a fraction; since the size of the sample is public, the two are equivalent.
These estimates refer to running times on an AWS EC2 c5.9xlarge instance. This architecture is comparable to the one used in [40].
This work has received funding from the MESRI-BMBF French-German joint project named PROPOLIS (ANR-20-CYAL-0004-01), the Dutch Research Council (NWO) under Project Spark! Living Lab (439.18.453B), the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project PICOCRYPT (grant agreement No. 101001283), and from the Spanish Government MCIN/AEI/ 10.13039/501100011033/ under projects PRODIGY (TED2021-132464B-I00) and ESPADA (PID2022-142290OB-I00). The last two projects are co-funded by European Union FEDER and NextGenerationEU/PRTR funds.
We thank Melek Onën for her contributions during the early stages of this project.
