Abstract
Several prominent privacy regulations (e.g., CCPA and GDPR) require service providers to let consumers request access to, correct, or delete their personal data. Compliance necessitates verification of consumer identity. This is not a problem for consumers who already have an account with a service provider, since they can authenticate themselves via a successful account log-in. However, there are no such methods for accountless consumers, even though service providers routinely collect data about casual consumers, i.e., those without accounts. Currently, in order to access their collected data, accountless consumers are asked to provide Personally Identifiable Information (PII) to service providers, which is privacy-invasive.
To address this problem, we propose \(\mathcal {P}\textsf{IVA}\): Privacy-Preserving Identity Verification for Accountless Users, a technique based on Private List Intersection (PLI) and its variants. First, we introduce PLI, a close relative of private set intersection (PSI), a well-known cryptographic primitive that allows two or more mutually suspicious parties to compute the intersection of their private input sets. PLI takes advantage of the (ordered and fixed) list structure of each party’s private set. As a result, PLI is more efficient than PSI. We also explore PLI variants: PLI-cardinality (PLI-CA), threshold-PLI (t-PLI), and threshold-PLI-cardinality (t-PLI-CA), all of which yield less information than PLI. These variants are progressively better suited for addressing the accountless consumer authentication problem.
We prototype \(\mathcal {P}\textsf{IVA}\) and compare its performance against techniques based on regular PSI and garbled circuits (GCs). Results show that the proposed PLI and PLI-CA constructions are more efficient than GC-based techniques in terms of both computation and communication overheads. While GC-based t-PLI and t-PLI-CA execute faster, the proposed constructions greatly outperform them in terms of bandwidth, e.g., our t-PLI protocol consumes \(16\times \) less bandwidth. We also show that the proposed protocols can be made secure against malicious adversaries with only moderate increases in overhead. These variants outperform their GC-based counterparts by at least one order of magnitude.
Notes
1. While GDPR and LGPD apply to any entity processing residents’ data in their respective regions, CCPA only applies to for-profit businesses that have a gross revenue above $25 million, handle the data of more than 100,000 California residents, or generate 50% or more of their revenue by selling California residents’ PII (in Section 1798.140(d)) [8].
2. All our ZKPoK proofs are variants of Schnorr’s proof of discrete logarithm, made non-interactive given an RO hash. They add \(2\lambda \) bits to bandwidth and 1 (multi-)exponentiation to the computation of both parties per statement; see [9].
References
Repository for piva (2023). https://github.com/zane-a-karl/PLI
Adhatarao, S., Lauradoux, C., Santos, C.: Why IP-based subject access requests are denied? arXiv preprint arXiv:2103.01019 (2021)
Berlekamp, E.R., Welch, L.R.: Error correction for algebraic block codes (1986)
Boniface, C., Fouad, I., Bielova, N., Lauradoux, C., Santos, C.: Security analysis of subject access request procedures: how to authenticate data subjects safely when they request for their data. In: Privacy Technologies and Policy: 7th Annual Privacy Forum, APF 2019, pp. 182–209 (2019)
Brazil: Lei \(\text{n}^{\underline{o}}\) 13.709, de 14 de agosto de 2018 (2018). http://www.planalto.gov.br/ccivil_03/_Ato2015-2018/2018/Lei/L13709.htm
Bufalieri, L., La Morgia, M., Mei, A., Stefa, J.: GDPR: when the right to access personal data becomes a threat. In: ICWS (2020)
California Attorney General: California consumer privacy act regulations (2020). https://oag.ca.gov/sites/all/files/agweb/pdfs/privacy/oal-sub-final-text-of-regs.pdf?
California Legislature: Title 1.81.5. California consumer privacy act of 2018 (2018). https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5
Camenisch, J., Stadler, M.: Proof systems for general statements about discrete logarithms. Technical Report/ETH Zurich, Department of Computer Science (1997)
Cramer, R., Gennaro, R., Schoenmakers, B.: A secure and optimally efficient multi-authority election scheme. In: Advances in Cryptology — EUROCRYPT 1997 (1997)
De Cristofaro, E., Gasti, P., Tsudik, G.: Fast and private computation of cardinality of set intersection and union. In: CANS (2012)
Di Martino, M., Robyns, P., Weyts, W., Quax, P., Lamotte, W., Andries, K.: Personal information leakage by abusing the GDPR ‘right of access’. In: Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019), pp. 371–385 (2019)
Duong, T., Phan, D.H., Trieu, N.: Catalic: delegated PSI cardinality with applications to contact tracing. In: Eurocrypt (2020)
European Data Protection Board: Guidelines 01/2022 on data subject rights - right of access, version 2.0 (2023)
European Parliament and Council: Regulation (EU) 2016/679, general data protection regulation (2016). https://eur-lex.europa.eu/eli/reg/2016/679/
Fisher, R.A., Yates, F.: Statistical Tables for Biological, Agricultural, and Medical Research. Hafner Publishing Company (1953)
Furukawa, J.: Efficient and verifiable shuffling and shuffle-decryption. IEICE Trans. (2005). https://doi.org/10.1093/ietfec/E88-A.1.172
Gamal, T.E.: A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory (1985)
Ghosh, S., Simkin, M.: The communication complexity of threshold private set intersection. In: CRYPTO (2019)
Groth, J.: A verifiable secret shuffle of homomorphic encryptions. Cryptology ePrint Archive, Paper 2005/246 (2005)
Heinrich, A., Hollick, M., Schneider, T., Stute, M., Weinert, C.: Privatedrop: practical privacy-preserving authentication for apple airdrop. In: USENIX Security (2021)
Jordan, S., Nakatsuka, Y., Ozturk, E., Paverd, A., Tsudik, G.: VICEROY: GDPR-/CCPA-compliant enforcement of verifiable accountless consumer requests. In: NDSS (2023)
Keller, M.: MP-SPDZ: a versatile framework for multi-party computation. In: ACM CCS (2020)
Martino, M.D., Meers, I., Quax, P., Andries, K., Lamotte, W.: Revisiting identification issues in GDPR “right of access” policies: a technical and longitudinal analysis. Proc. Priv. Enhancing Technol. (2022)
Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D., et al.: Location privacy via private proximity testing. In: NDSS, vol. 11 (2011)
Pagnin, E., Gunnarsson, G., Talebi, P., Orlandi, C., Sabelfeld, A.: Toppool: time-aware optimized privacy-preserving ridesharing. Cryptology ePrint Archive (2021)
Pinkas, B., Rosulek, M., Trieu, N., Yanai, A.: PSI from PaXoS: fast, malicious private set intersection. In: Eurocrypt (2020)
Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. (1960)
Rindal, P., Schoppmann, P.: VOLE-PSI: fast OPRF and circuit-PSI from vector-OLE. In: Eurocrypt (2021)
Rosulek, M., Trieu, N.: Compact and malicious private set intersection for small sets. In: ACM CCS (2021)
Samarin, N., et al.: Lessons in VCR repair: compliance of android app developers with the California consumer privacy act (CCPA). Proc. Priv. Enhancing Technol. (2023)
Shamir, A.: How to share a secret. Commun. ACM (1979)
Take, K., Gallagher, K., Forte, A., McCoy, D., Greenstadt, R.: “it feels like whack-a-mole”: user experiences of data removal from people search websites. Proc. Priv. Enhancing Technol. (2022)
Trieu, N., Shehata, K., Saxena, P., Shokri, R., Song, D.: Epione: lightweight contact tracing with strong privacy. arXiv preprint arXiv:2004.13293 (2020)
Urban, T., Tatang, D., Degeling, M., Holz, T., Pohlmann, N.: The unwanted sharing economy: an analysis of cookie syncing and user transparency under GDPR. arXiv preprint arXiv:1811.08660 (2018)
Urban, T., Tatang, D., Degeling, M., Holz, T., Pohlmann, N.: A study on subject data access in online advertising after the GDPR. In: ESORICS (2019)
Yao, A.C.: How to generate and exchange secrets (extended abstract). In: 27th Annual Symposium on Foundations of Computer Science (1986)
Zhao, Y., Chow, S.S.M.: Can you find the one for me? In: WPES (2018)
Zhao, Y., Chow, S.S.: Are you the one to share? secret transfer with access structure. PETS (2017)
Acknowledgements
We thank ESORICS’24 reviewers for constructive feedback. This work was supported in part by funding from the NSF Award SATC-1956393.
A Security Proofs for \(\mathcal {P}\textsf{IVA}\) Protocols
Proof of Theorem 1 (PLI security):
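Correctness: Assume an execution of the protocol \(\varPi \) with honest \(\mathcal {C}\) and honest \(\mathcal {S}\). \(\mathcal {S}\) computes \(c_i:=\frac{a_{i,2}}{a_{i,1}^{sk}}\), which is equivalent to \((g^{y_i-x_i})^{r_i}\). Assuming \(h=g^{sk}\) and the same message forms as in the proof of Theorem 3, i.e., \(a_{i,1}=(g^{w_i}\cdot g^{u_i})^{r_i}\) and \(a_{i,2}=(h^{w_i}g^{y_i}\cdot h^{u_i}g^{-x_i})^{r_i}\), this follows from:
\[c_i = \frac{a_{i,2}}{a_{i,1}^{sk}} = \frac{(h^{w_i}\cdot h^{u_i})^{r_i}\,(g^{y_i-x_i})^{r_i}}{((g^{w_i}\cdot g^{u_i})^{sk})^{r_i}} = \frac{(h^{w_i}\cdot h^{u_i})^{r_i}\,(g^{y_i-x_i})^{r_i}}{(h^{w_i}\cdot h^{u_i})^{r_i}} = (g^{y_i-x_i})^{r_i}.\]
This value is 1 if \(y_i\) is equal to \(x_i\); otherwise, it looks random in G, as the random \(r_i\) is not known to \(\mathcal {S}\). Therefore, \(\mathcal {S}\) outputs \(X\cap Y\), while \(\mathcal {C}\) outputs \(\perp \).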
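The correctness argument can be checked numerically. The following is a minimal Python sketch, not the paper's implementation. It assumes exponential ElGamal with messages of the form used in the derivation for Theorem 3, namely \(b_i=(g^{w_i}, h^{w_i}g^{y_i})\), \(a_{i,1}=(b_{i,1}\cdot g^{u_i})^{r_i}\) and \(a_{i,2}=(b_{i,2}\cdot h^{u_i}g^{-x_i})^{r_i}\). The group parameters (p = 2039, q = 1019, g = 4) are a toy choice that is far too small to be secure, inputs are assumed to be already encoded as integers in \(Z_q\), and all function names are hypothetical.

```python
import secrets

# Toy group (assumption, NOT secure): p = 2q + 1 with q prime, g of order q.
P, Q, G = 2039, 1019, 4

def rand_exp():
    """Uniform non-zero exponent in [1, q-1]."""
    return 1 + secrets.randbelow(Q - 1)

def keygen():
    sk = rand_exp()
    return sk, pow(G, sk, P)              # (sk, h = g^sk)

def server_encrypt(h, Y):
    """b_i = (g^{w_i}, h^{w_i} * g^{y_i}) for each list position i."""
    out = []
    for y in Y:
        w = rand_exp()
        out.append((pow(G, w, P), pow(h, w, P) * pow(G, y % Q, P) % P))
    return out

def client_respond(h, B, X):
    """a_i = ((b_{i,1} g^{u_i})^{r_i}, (b_{i,2} h^{u_i} g^{-x_i})^{r_i})."""
    out = []
    for (b1, b2), x in zip(B, X):
        u, r = rand_exp(), rand_exp()
        a1 = pow(b1 * pow(G, u, P) % P, r, P)
        a2 = pow(b2 * pow(h, u, P) * pow(G, (-x) % Q, P) % P, r, P)
        out.append((a1, a2))
    return out

def server_output(sk, A, Y):
    """c_i = a_{i,2} / a_{i,1}^{sk}; position i matches iff c_i == 1."""
    matches = []
    for i, ((a1, a2), y) in enumerate(zip(A, Y)):
        c = a2 * pow(a1, (-sk) % Q, P) % P   # a1 has order dividing q
        if c == 1:
            matches.append((i, y))
    return matches

sk, h = keygen()
Y = [10, 20, 30, 40]                      # server list
X = [10, 21, 30, 7]                       # client list
result = server_output(sk, client_respond(h, server_encrypt(h, Y), X), Y)
# -> [(0, 10), (2, 30)]
```

Because q is prime and \(r_i\) is non-zero, \((y_i-x_i)r_i \bmod q\) is zero exactly when \(y_i=x_i\) in \(Z_q\), so in this sketch the comparison is exact rather than probabilistic.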
Server Privacy: For corrupted \(\mathcal {C}\), simulator \(\textsf{SIM}_C\) can be constructed as follows: \(\textsf{SIM}_C\) chooses n random values \((z_1,...z_n)\) in \(Z_q\) and encrypts them under pk, i.e. it sets \(b_i = \textsf{Enc}_{pk}(z_i)\) for all i, and it sends pk and \((b_1,...,b_n)\) to \(\mathcal {C}\). (Note that \(\textsf{SIM}_C\) could use \(\mathcal {C}\) ’s input X in the simulation, but the above algorithm does not need this input.) Because of the IND-CPA security of ElGamal encryption under the hardness assumption of the decisional Diffie-Hellman (DDH) problem, the ciphertexts produced by \(\textsf{SIM}_C\) are indistinguishable from the ones produced by \(\mathcal {S}\) in the real protocol execution.
Client Privacy: For corrupted \(\mathcal {S}\), simulator \(\textsf{SIM}_S\) can be constructed as follows: Given Y and \(X\cap Y\),
- \(\textsf{SIM}_S\) receives the public key pk and \((b_1,...,b_n)\) from \(\mathcal {S}\), and it sets \(z_i = 0\) for all \(i \in X \cap Y\) and picks \(z_i \leftarrow _{\$}Z_q\) for all \(i\not \in X\cap Y\).
- \(\textsf{SIM}_S\) computes \(a_i = \textsf{Enc}_{pk}(z_i)\) for all i, and sends \((a_1,...,a_n)\) to \({\mathcal {S}} \).
It follows that \({\mathcal {S}} \)’s view in the interaction with \(\textsf{SIM}_S\) matches the interaction with the real-world \(\mathcal {C}\): In both cases each \(a_i\) is either a random encryption of 1, if \(i\in X\cap Y\), or an encryption of a random value in \(Z_q\), if \(i\not \in X\cap Y\).
Proof of Theorem 2 (PLI-CA security):
Correctness: Similar to Theorem 1, \(c_i\) is 1 if \(y_i=x_i\), and is a random element otherwise. Thus, \(|\{c_i: c_i=1\}|=|X\cap Y|\), which the server outputs.
Server Privacy: Since the view of the (corrupted) \(\mathcal {C}\) is the same as the one in PLI, server privacy is also met in PLI-CA.
Client Privacy: For a corrupted \(\mathcal {S}\), a simulator \(\textsf{SIM}_S\) can be constructed similarly. In addition to the steps of \(\textsf{SIM}_S\) in Theorem 1, \(\textsf{SIM}_S\) sends \(\pi (A_1,...,A_n)\) to \({\mathcal {S}} \), for a randomly chosen \(\pi \) in \(\mathcal {P}_n\). For some \(\pi , \pi ' \in \mathcal {P}_n\), \({\mathcal {S}} \)’s view in the interaction with \(\textsf{SIM}_S\) is indistinguishable from the output of the protocol execution with the real \(\mathcal {C}\), as \(\pi \{ \{1\}_{i \in X \cap Y}, \{z_i\}_{i \notin X \cap Y}\} \overset{c}{\equiv }\ \pi '\{ \{1\}_{i \in X \cap Y}, \{(y_i - x_i)r_i\}_{i \notin X \cap Y}\}.\)
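The effect of the permutation can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not the paper's protocol code: it skips encryption and works directly on the values \(c_i=g^{(y_i-x_i)r_i}\) that the server would obtain after decryption, and it shuffles those values, whereas in the protocol the client permutes ciphertexts before sending them. The toy group (p = 2039, q = 1019, g = 4) is insecure and all function names are hypothetical.

```python
import secrets

# Toy group (assumption, NOT secure): g = 4 has prime order q = 1019 mod p = 2039.
P, Q, G = 2039, 1019, 4

def decrypted_values(X, Y):
    """Values the server would see after decryption: c_i = g^{(y_i - x_i) r_i}."""
    cs = []
    for x, y in zip(X, Y):
        r = 1 + secrets.randbelow(Q - 1)          # non-zero blinding exponent
        cs.append(pow(G, ((y - x) * r) % Q, P))
    return cs

def shuffle(values):
    """Fisher-Yates shuffle with a cryptographic RNG, standing in for pi."""
    vals = list(values)
    for i in range(len(vals) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        vals[i], vals[j] = vals[j], vals[i]
    return vals

def cardinality(cs):
    """|{c_i : c_i = 1}|, which equals the number of matching positions."""
    return sum(1 for c in cs if c == 1)

X = [10, 21, 30, 7]
Y = [10, 20, 30, 40]
count = cardinality(shuffle(decrypted_values(X, Y)))   # -> 2
```

After the shuffle, the positions of the 1-entries no longer correspond to list indices, so only the count is meaningful to the server.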
Proof of Theorem 3 (t-PLI security):
Correctness: Assume an execution of the protocol \(\varPi \) with honest \(\mathcal {C}\) and honest \(\mathcal {S}\). For all \(i \in [n]\), \(\mathcal {S}\) computes \(s_i'\) as follows: \((s_i' || hs_i) :=H_i(a_{i,1}^{sk})\oplus e_i = H_i(a_{i,1}^{sk})\oplus H_i(a_{i,2}) \oplus s_i\). Here,
\[H_i(a_{i,1}^{sk}) = H_i(((b_{i,1} \cdot g^{u_i})^{r_i})^{sk}) = H_i(((g^{w_i} \cdot g^{u_i})^{sk})^{r_i}) = H_i((h^{w_i} \cdot h^{u_i})^{r_i})\]
and
\[H_i(a_{i,2}) = H_i((b_{i,2} \cdot h^{u_i}g^{-x_i})^{r_i}) = H_i((h^{w_i}g^{y_i} \cdot h^{u_i}g^{-x_i})^{r_i}) = H_i((h^{w_i}\cdot h^{u_i})^{r_i} (g^{y_i-x_i})^{r_i}).\]
Therefore, \(H_i(a_{i,2}) =H_i(a_{i,1}^{sk})\) if \(y_i=x_i\). Otherwise, \(H_i(a_{i,2})\) is a hash of a random element in G. Consequently, \(s_i'= s_i\) if \(y_i=x_i\), and \(s_i'\) is random otherwise. Depending on the cardinality of the intersection \(I := \{i \mid s_i'=s_i\}_{i\in [n]}\), three possible cases exist:
1. \(|I| \ge k := \lceil \frac{n+t}{2}\rceil \): \(\mathcal {S}\) can apply the Berlekamp-Welch (\(\textsf{BW}\)) algorithm to recover (p, err). \(\mathcal {S}\) outputs \(\{y_i: err(i)\ne 0\}_{i\in [n]}\), which is the set of input elements whose corresponding shares are correct, i.e., the elements at the intersecting indices.
2. \(t \le |I|< k\): \(\mathcal {S}\) can examine every subset of size t and check which subset reconstructs \(s'\) such that \(H_i(s')=hs_i\).
3. \(|I| < t\): \(\mathcal {S}\) can neither reconstruct \(s'\) nor learn anything about the \(x_i\)’s.
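The subset search of case 2 and the failure of case 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's construction: it uses the textbook (t, n) Shamir convention with a polynomial of degree t − 1 over a prime field (the modulus \(2^{61}-1\) is an arbitrary choice), SHA-256 stands in for the hash that produces \(hs_i\), and the Berlekamp-Welch decoding of case 1 is not shown. All function names are hypothetical.

```python
import hashlib
import secrets
from itertools import combinations

F = 2**61 - 1  # prime modulus of the share field (assumption)

def share(secret, t, n):
    """Textbook (t, n) Shamir sharing: share i is p(i) for a random
    polynomial p of degree t-1 with p(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(F) for _ in range(t - 1)]
    return [sum(c * pow(i, k, F) for k, c in enumerate(coeffs)) % F
            for i in range(1, n + 1)]

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the given (xi, yi) points."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % F
                den = den * (xi - xj) % F
        total = (total + yi * num * pow(den, F - 2, F)) % F
    return total

def tag(s):
    """Stand-in for the hash check value hs (SHA-256 is an assumption)."""
    return hashlib.sha256(str(s).encode()).hexdigest()

def recover(received, t, hs):
    """Try every size-t subset; accept the first whose interpolated secret
    hashes to hs, then list the indices whose shares lie on that polynomial."""
    points = list(enumerate(received, start=1))
    for subset in combinations(points, t):
        candidate = lagrange_eval(subset, 0)
        if tag(candidate) == hs:
            good = [i for i, s in points if lagrange_eval(subset, i) == s]
            return candidate, good
    return None  # fewer than t correct shares: nothing is reconstructed
```

For example, with t = 3 and n = 6, corrupting the shares at indices 2 and 5 leaves four correct shares, so `recover` returns the secret together with the indices 1, 3, 4 and 6. The search inspects up to \(\binom{n}{t}\) subsets, which is why a decoding algorithm is preferable once enough correct shares are available.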
Server Privacy: Since the view of the (corrupted) \(\mathcal {C}\) is the same as the one in PLI (and PLI-CA), server privacy is also met in t-PLI.
Client Privacy: Considering corrupted \(\mathcal {S}\), a simulator \(\textsf{SIM}_S\) can be constructed as below. Given Y and \(|X\cap Y|\),
- \(\textsf{SIM}_S\) receives the public key pk and \((b_1,...,b_n)\) from \({\mathcal {S}} \). It sets \(z_i = 0\) if \(y_i \in X \cap Y\), and \(z_i \leftarrow _{\$} G\) otherwise.
- \(\textsf{SIM}_S\) computes \(A_i = \textsf{Enc}_{pk}(z_i)\) for all i.
- \(\textsf{SIM}_S\) generates a random secret \(S\) and computes the shares \((S_1,...,S_n) \leftarrow \textsf{Share}_{(t,n)}(S)\).
- \(\textsf{SIM}_S\) computes \(E_i=H_i(A_{i,2})\oplus H'_i(S_i)\) for all i.
- \(\textsf{SIM}_S\) sends \((A_{1,1},...,A_{n,1})\) and \((E_1,...,E_n)\) to \({\mathcal {S}} \).
We compare \({\mathcal {S}} \)’s view in the interaction with \(\textsf{SIM}_S\) with its view in the real execution of \(\varPi \) with \(\mathcal {C}\). First, the tuples \((a_{1,1},...,a_{n,1})\) and \((A_{1,1},...,A_{n,1})\) are indistinguishable because of the IND-CPA security of ElGamal encryption, under the hardness of the DDH problem. Second, the tuples \((e_1,...,e_n)\) and \((E_1,...,E_n)\) are also indistinguishable, since \((a_{1,2},...,a_{n,2})\) and \((A_{1,2},...,A_{n,2})\) are indistinguishable because of the IND-CPA security of ElGamal encryption, and because of the security of one-time pad encryption with the randomly generated shares guaranteed by the Shamir secret sharing scheme.
Proof of Theorem 4 (t-PLI-CA security):
Correctness: Similar to Theorem 3, \(H_i(a_{i,2}) = H_i(a^{sk}_{i,1})\) if \(y_i=x_i\), or is otherwise random. Thus, if the number of intersecting elements \(|I|\ge t\), \(\mathcal {S}\) can apply the Berlekamp-Welch algorithm or examine every subset to obtain the number of errors and obtain \(|\{s_i' : err(i) \ne 0\}_i|=|X\cap Y|\), which the server outputs. If \(|I|<t\), \(\mathcal {S}\) outputs \(\perp \).
Server Privacy: Since the view of the (corrupted) \(\mathcal {C}\) is the same as the one in PLI (and PLI-CA and t-PLI), server privacy is also met in t-PLI-CA.
Client Privacy: Due to the similarity of the protocols, the construction of the simulator \(\textsf{SIM}_S\) for a corrupted \(\mathcal {S}\) is similar to the one in Theorem 3. Relative to \(\textsf{SIM}_S\) in Theorem 3, one step is modified: \(\textsf{SIM}_S\) computes \(A_i=\pi (\textsf{Enc}_{pk}(z_i))\) for all i.
As in Theorem 3, \({\mathcal {S}} \)’s view (even without this modification) in the interaction with \(\textsf{SIM}_S\) and the output of the protocol execution with the real \(\mathcal {C}\) are indistinguishable, and adding pseudorandom permutations randomly chosen in \(\mathcal {P}_n\) does not change this.
Proof Outlines for Security Against Malicious Participants:
For any malicious server \(\mathcal {S^*}\), simulator \(\textsf{SIM}_S\) can (1) emulate \(H^q\), capturing \(\mathcal {S^*}\)’s queries to \(H^q\); (2) on \(\mathcal {S^*}\)’s message, extract \(sk=\textsf{DL}_g(pk)\) from the ZKPoK and set \(M_i=\textsf{PreDec}_{sk}(b_i)\) for each i; and (3) form \(\mathcal {S^*}\)’s effective input Y into either protocol by setting Y[i], for each i, to y s.t. \(\mathcal {S^*}\) queried (i, y) to H and \(M_i=g^{\bar{y}_i}\) for \(\bar{y}_i=H^q_i(y)\), and setting \(Y[i]=\bot \) if there is no such \(H^q\) query. After sending Y to the PLI functionality and getting output \(\textsf{out}_S\) back, \(\textsf{SIM}_S\) can then simulate the honest client’s message as in the proofs of Theorems 1–4 above.
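According to the paper's notes, the ZKPoKs are variants of Schnorr's proof of discrete logarithm made non-interactive with an RO hash. The following Python sketch shows a generic proof of that kind for the statement \(pk=g^{sk}\), together with the standard special-soundness calculation that recovers sk from two accepting transcripts sharing a commitment, which is the kind of extraction the simulator relies on. It is an illustration only: the toy group (p = 2039, q = 1019, g = 4) is insecure, SHA-256 stands in for the random oracle, the proof encoding is not the paper's, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption, NOT secure): g = 4 has prime order q = 1019 mod p = 2039.
P, Q, G = 2039, 1019, 4

def challenge(*elems):
    """Fiat-Shamir challenge: hash of the transcript, reduced mod q."""
    data = "|".join(str(e) for e in elems).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(sk):
    """Non-interactive Schnorr proof of knowledge of sk with pk = g^sk."""
    pk = pow(G, sk, P)
    k = secrets.randbelow(Q)              # ephemeral nonce
    commit = pow(G, k, P)
    c = challenge(G, pk, commit)
    resp = (k + c * sk) % Q
    return pk, (commit, resp)

def verify(pk, proof):
    """Accept iff g^resp == commit * pk^c for the recomputed challenge c."""
    commit, resp = proof
    c = challenge(G, pk, commit)
    return pow(G, resp, P) == commit * pow(pk, c, P) % P

def extract(c1, s1, c2, s2):
    """Special soundness: two accepting responses for the same commitment
    and distinct challenges give sk = (s1 - s2) / (c1 - c2) mod q."""
    return (s1 - s2) * pow((c1 - c2) % Q, Q - 2, Q) % Q

pk, proof = prove(77)
ok = verify(pk, proof)                    # -> True
```

In the random-oracle model the simulator obtains the second transcript by answering the same hash query with a different challenge; the sketch only shows the resulting algebra.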
For a malicious client \(\mathcal {C^*}\), \(\textsf{SIM}_C\) extracts \(\delta _i,r_i\) for \(i\in [n]\) from \(\mathcal {C^*}\)’s message, “pre-decrypts” each \(a_{i,2}\) as \(M_i=b_{i,2}(a_{i,2}^{-1}h^{\delta _i})^{1/r_i}\), and forms \(\mathcal {C^*}\)’s effective input X into PLI by setting X[i], for each i, to x s.t. \(\mathcal {C^*}\) received \(\bar{x}_i=H^q_i(x)\) from RO \(H^q\) and \(M_i=g^{\bar{x}_i}\), and setting \(X[i]=\bot \) if \(\mathcal {C^*}\) made no such \(H^q\) query. An honest \(\mathcal {S}\) will output \(I=X\cap Y\). In the simulation, \(\textsf{SIM}_C\) verifies the same constraint as in the protocol for \(\bar{x}_i=H(i,x)\), hence \(i\in I\) if \(y_i=x_i\). \(\textsf{SIM}_C\) misses any match where the constraint is satisfied for \(\bar{y}_i=H^q_i(y)\) but \(\mathcal {C^*}\) does not query \(H^q_i\) on y. Since this can happen only with probability n/q, the simulated and real views are computationally indistinguishable.
For the modified t-PLI, \(\textsf{SIM}_C\) will extract the \((\delta _i,r_i)\)’s as above, but it modifies the pre-decryption and derivation of \(\mathcal {C^*}\)’s effective input X: \(\textsf{SIM}_C\) searches through \(\mathcal {C^*}\)’s queries to \(H^q_i,H'_i,\) and \(H_i\), to determine if there exists s s.t., for some subset of at least t indexes i, \(\bar{x}_i=H^q_i(x_i)\), \(hs_i=H'_i(s)\), and \(ha_i=H_i(a_{i,2})\) satisfy that (1) \((ha_i\oplus e_i)[R]=hs_i\), where z[L] and z[R] stand for, respectively, the \(\mathbb {F}\) and \(\{0,1\}^{2\lambda }\) components of z; (2) \(g^{\bar{x}_i}=b_{i,2}(a_{i,2}^{-1}h^{\delta _i})^{1/r_i}\); and (3) the values \(s_i=(ha_i\oplus e_i)[L]\) lie on a t-degree polynomial which interpolates to s. If \(\textsf{SIM}_C\) identifies such a subset, it forms X from all \((i,x_i)\) pairs found above and sets all other values in X to \(\bot \). If no such subset of at least t indexes is found, then \(\textsf{SIM}_C\) sets X to \((\bot )^n\). \(\textsf{SIM}_C\) efficiently and correctly simulates the real protocol because, if \(H',H\) are ROs, then the constraint \((ha_i\oplus e_i)[R]=hs_i\) can be satisfied by at most a single \((ha_i,hs_i)\) pair, except with negligible probability. On the other hand, if fewer than \(t+1\) matches exist in the real protocol, the \(hs_i\) components recovered by \(\mathcal {S}\) are indistinguishable from random, and thus the real protocol execution on X s.t. \(|X\cap Y|<t\) is indistinguishable from the simulation where \(\textsf{SIM}_C\) sets X to an all-empty sequence \((\bot )^n\).
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hwang, S., Jarecki, S., Karl, Z., van Kempen, E., Tsudik, G. (2024). \(\mathcal {P}\textsf{IVA}\): Privacy-Preserving Identity Verification Methods for Accountless Users via Private List Intersection and Variants. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14984. Springer, Cham. https://doi.org/10.1007/978-3-031-70896-1_18
DOI: https://doi.org/10.1007/978-3-031-70896-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70895-4
Online ISBN: 978-3-031-70896-1
eBook Packages: Computer Science (R0)