Abstract
Privacy-preserving data mining algorithms are proposed to protect the participating parties' data privacy in data mining processes. So far, most of these algorithms work only in the semi-honest model, which assumes that all parties follow the algorithm honestly. In this paper, we propose two privacy-preserving perceptron learning algorithms in the malicious model, for horizontally and vertically partitioned data sets, respectively. To the best of our knowledge, our algorithms are the first perceptron learning algorithms that can protect data privacy in the malicious model.


Notes
The word adversary is used mainly in cryptography. In our scenario, we use it to refer to a curious participant in the data mining process.
Here we neglect the cut-and-choose step, because the cut-and-choose steps are relatively lightweight in the secure comparison algorithm. Also, as in [25], we assume that at least one out of every four generated candidates r is less than N.
So far, the known optimal size complexity for a Boolean circuit to compute multiplication is \(\Omega(l\log l)\) [13].
These do not include the messages for synchronization or the challenge messages, since the values of these messages can be arbitrary numbers and changing their values is meaningless.
References
Agrawal R, Srikant R (2000) Privacy-preserving data mining. ACM SIGMOD Record 29
Canetti R (2000) Security and composition of multiparty cryptographic protocols. J Cryptol 13(1):143–202
Chen K, Liu L (2005) Privacy preserving data classification with rotation perturbation. In: Proceedings of the fifth IEEE international conference on data mining, IEEE Computer Society, pp 589–592
Chen T, Zhong S (2009) Privacy-preserving backpropagation neural network learning. IEEE Trans Neural Netw 20(10):1554–1564
Cramer R, Damgård I, Nielsen J (2001) Multiparty computation from threshold homomorphic encryption. In: EUROCRYPT: advances in cryptology: proceedings of EUROCRYPT
Cristofaro ED, Kim J, Tsudik G (2010) Linear-complexity private set intersection protocols secure in malicious model. In: ASIACRYPT, Lecture Notes in Computer Science, vol 6477. Springer, pp 213–231
Dai W (2010) Crypto++ library. http://www.cryptopp.com
Damgård I, Jurik M (2000) Efficient protocols based on probabilistic encryption using composite degree residue classes. BRICS, Department of Computer Science, University of Aarhus, Aarhus
Damgård I, Geisler M, Kroigard M (2008) Homomorphic encryption and secure comparison. Int J Appl Cryptogr 1:22–31
Duda RO, Hart PE, Stork DG (2001) Pattern classification, vol 2. Wiley, New York
Fouque PA, Poupard G, Stern J (2001) Sharing decryption in the context of voting or lotteries. In: Financial cryptography, Springer, pp 90–104
Frank A, Asuncion A (2010) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine. http://www.ics.uci.edu/mlearn/MLRepository.html
Fürer M (2007) Faster integer multiplication. In: Proceedings of the thirty-ninth annual ACM symposium on theory of computing, ACM, pp 57–66
Gilburd B, Schuster A, Wolff R (2004) Privacy-preserving data mining on data grids in the presence of malicious participants. In: HPDC. IEEE computer society, pp 225–234
Goldreich O (2010) A short tutorial of zero-knowledge. http://www.wisdom.weizmann.ac.il/oded/PS/zk-tut10.ps
Goldreich O, Micali S, Wigderson A (1987) How to play any mental game. In: Proceedings of the nineteenth annual ACM symposium on theory of computing, ACM, pp 218–229
Goldwasser S, Micali S, Rackoff C (1985) The knowledge complexity of interactive proof-systems. In: Proceedings of the seventeenth annual ACM symposium on theory of computing, ACM, pp 291–304
Heer GR (1993) A bootstrap procedure to preserve statistical confidentiality in contingency tables. In: Proceedings of the international seminar on statistical confidentiality, pp 261–271
Kantarcioglu M, Kardes O (2009) Privacy-preserving data mining in the malicious model. Int J Inf Comput Secur 2:353–375
Laur S, Lipmaa H, Mielikänen T (2006) Cryptographically private support vector machines. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 618–624
Lin X, Clifton C, Zhu M (2005) Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8(1):68–81
Lindell Y, Pinkas B (2007) An efficient protocol for secure two-party computation in the presence of malicious adversaries. In: Advances in cryptology – EUROCRYPT 2007, Springer, pp 52–78
Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: Bellare M (ed) CRYPTO, volume 1880 of Lecture Notes in Computer Science. Springer, New York, pp 36–54
Minsky M, Papert S (1969) Perceptrons: an introduction to computational geometry. MIT Press, Cambridge
Nishide T, Ohta K (2007) Multiparty computation for interval, equality, and comparison without bit-decomposition protocol. Public key cryptography PKC 2007, pp 343–360
Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Advances in cryptology – EUROCRYPT '99, Springer, pp 223–238
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408
Shah D, Zhong S (2007) Two methods for privacy preserving data mining with malicious participants. Inf Sci 177(23):5468–5483
Vaidya J, Clifton C (2003) Privacy-preserving k-means clustering over vertically partitioned data. In: Getoor L, Senator TE, Domingos P, Faloutsos C (eds) KDD. ACM, New York, pp 206–215
Vaidya J, Kantarcioglu M, Clifton C (2008) Privacy-preserving naive bayes classification. VLDB J 17(4):879–898
Appendix: Security proofs
In this section, we give formal security proofs for our algorithms in the malicious model. In particular, we adopt the security definitions in [19], which take the security definitions in [5] and apply them to the two-party case. The security of a protocol is formally defined by comparing its execution in the real-life model, in which an active static adversary is present, with its execution in the ideal model, in which an incorruptible trusted party is available.
Definition 1
The Real-Life Model (two-party case) [5] Let π be a two-party protocol and \({k\in\mathbb{N}}\) be the security parameter. In the real-life model, each party \(P^i\,(i\in\{0,1\})\) has a secret input \(x^i_s\) and a public input \(x^i_p\). After executing the protocol, each party obtains a private output \(y^i_s\) and returns a public output \(y^i_p\). Let A be an adversary that can corrupt one of the two parties and \(C\in\{0,1\}\) be the index of the corrupted party. A sees the public inputs and outputs of all parties.
Let \(\overrightarrow{x}=(x^0_s,x^0_p,x^1_s,x^1_p)\) be the two parties' inputs, \(\overrightarrow{r} = (r^0,r^1,r^A)\) be the random inputs of \(P^0\), \(P^1\) and A, and \(a\in{\{0,1\}}^{\ast}\) be A's auxiliary input. After the execution of π in the real-life model with input \(\overrightarrow{x}\) and under the attack of A, denote by \({ADVR}_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r})\) and \({EXEC}^i_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r})\) the output of the adversary and the output of party \(P^i\), respectively. Let
$$EXEC_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r}) = \bigl(ADVR_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r}),\ EXEC^0_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r}),\ EXEC^1_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r})\bigr)$$
and denote by \({EXEC}_{\pi,A}(k,\overrightarrow{x},C,a)\) the random variable \({EXEC}_{\pi,A}(k,\overrightarrow{x},C,a,\overrightarrow{r})\) when \(\overrightarrow{r}\) is uniformly chosen.
Finally, we define a distribution ensemble \(EXEC_{\pi,A}\) with security parameter k and index \((\overrightarrow{x},C,a)\) by
$$EXEC_{\pi,A} = {\bigl\{EXEC_{\pi,A}(k,\overrightarrow{x},C,a)\bigr\}}_{k\in\mathbb{N},\ \overrightarrow{x}\in{({\{0,1\}}^{\ast})}^4,\ C\in\{0,1\},\ a\in{\{0,1\}}^{\ast}}.$$
Definition 2
The Ideal Model (two-party case) [5] Let \({f: \mathbb{N}\times{({\{0,1\}}^\ast)}^{4}\times{\{0,1\}}^\ast\to{({\{0,1\}}^\ast)}^{4}}\) be a polynomial-time-bounded probabilistic two-party function. Define the inputs and outputs of f as follows:
$$\bigl(y^0_s,\ y^0_p,\ y^1_s,\ y^1_p\bigr) = f\bigl(k,\ x^0_s,\ x^0_p,\ x^1_s,\ x^1_p,\ r\bigr).$$
In the ideal model, each party \(P^i\,(i\in\{0,1\})\) sends her input \((x^i_s, x^i_p)\) to an incorruptible trusted party. The trusted party draws r uniformly at random and returns \((y^i_s, y^i_p)\) to party \(P^i\). The entire procedure takes place in the presence of an active static adversary S. At the beginning of the procedure, S sees both parties' public inputs as well as the corrupted party \(P^C\)'s private input, and substitutes \((x^C_s, x^C_p)\) with \(({x^C_s}^\prime,{x^C_p}^\prime)\) of his choice. Thus, f is evaluated by the trusted party using the modified inputs. Similarly, we define \(IDEAL_{f,S}(k,\overrightarrow{x},C,a,\overrightarrow{r})\) as the joint output of S and the two parties in the ideal model, and the corresponding distribution ensemble \(IDEAL_{f,S}\) analogously to the real-life model.
Definition 3
Security in the Real-Life Model (two-party case) [5] Let f be a two-party function and let π be a two-party protocol. π securely evaluates f in the malicious model if for any polynomial-time bounded active static adversary A, there exists a polynomial-time bounded ideal-model adversary S such that
$$IDEAL_{f,S} \stackrel{c}{\approx} EXEC_{\pi,A},$$
where \(\stackrel{c}{\approx}\) denotes computational indistinguishability between two distribution ensembles.
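For reference, the standard notion invoked here (stated in the usual form, not quoted from the paper): two distribution ensembles \(X=\{X(k,a)\}\) and \(Y=\{Y(k,a)\}\) are computationally indistinguishable if no probabilistic polynomial-time distinguisher D can tell them apart with non-negligible advantage, i.e.,
$$X \stackrel{c}{\approx} Y \iff \forall\ \text{PPT } D\ \ \exists\ \text{negligible } \mu:\ \bigl|\Pr[D(1^k, a, X(k,a))=1]-\Pr[D(1^k, a, Y(k,a))=1]\bigr|\le\mu(k).$$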
We need to combine several secure function evaluation protocols to construct our perceptron learning protocol. In order to prove the security of the composed protocol, we first prove that it is secure when the secure sub-protocols are called as oracles (the corresponding model is called the hybrid model [5]). Then, by the composition theorem in [2], we can conclude that the composed protocol obtained by replacing the oracle calls with the corresponding secure protocols is still secure.
Definition 4
The Hybrid Model (two-party case) [5] In the \((g_1,\ldots, g_m)\)-hybrid model, the execution of a protocol π proceeds as in the real-life model, except that the parties have oracle access to a trusted party for evaluating m two-party functions \((g_1,\ldots, g_m)\). These functions are evaluated as in the ideal model. Similarly, we define the protocol's output with a distribution ensemble \(EXEC^{g_1,\ldots,g_m}_{\pi,A}\), defined analogously to the real-life model.
Definition 5
Security in the Hybrid Model (two-party case) [5] Similar to the security definition in the real-life model, security in the hybrid model is defined by requiring that for any polynomial-time bounded adversary A in the hybrid model, there exists a polynomial-time bounded ideal-model adversary S such that
$$IDEAL_{f,S} \stackrel{c}{\approx} EXEC^{g_1,\ldots,g_m}_{\pi,A}.$$
We now prove our algorithms' security in the hybrid model, replacing the secure fundamental algorithms PKP, PMC, TTD, SVE, SVB, SXR, SSB, SVZ, and SCA used in our algorithms with the corresponding oracle calls to the trusted party.
Theorem 1
Given that the fundamental algorithms above are secure in the malicious model, Algorithm 2 is secure in the hybrid model.
Proof
Without loss of generality, we assume \(P^0\) is corrupted by the adversary A in the hybrid model. For simplicity of presentation, we will use A as a subroutine, as well as the simulators for the secure sub-algorithms, in constructing the corresponding adversary in the ideal model.
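The steps below repeatedly apply the same pattern: \(S_A\) runs A, hands each zero-knowledge proof it receives to the corresponding sub-protocol simulator, aborts if a proof is rejected, and plays the honest party with random values and correctly formed proofs. A schematic sketch of this control flow (names and interfaces are ours, purely illustrative; no real cryptography is performed):

```python
# Schematic of S_A's recurring pattern in the proofs (illustrative names only).
class Abort(Exception):
    """Raised when a zero-knowledge proof is rejected and S_A terminates."""

def check_with_simulator(adv_state, proof, sub_simulator):
    # Feed A's current state and the received proof to the sub-protocol
    # simulator; abort if the proof fails, otherwise adopt the returned state.
    ok, new_state = sub_simulator(adv_state, proof)
    if not ok:
        raise Abort("proof rejected; S_A terminates the protocol")
    return new_state

def play_honest_party(sample_random_value, make_proof):
    # Simulate P^1: pick a random value and attach a correctly formed proof.
    value = sample_random_value()
    return value, make_proof(value)
```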
For any adversary A operating in the hybrid model, given the description of A, the private inputs \({\bf X}^{\bf \prime}_{\bf 1},\ldots, {\bf X}^{\bf \prime}_{\bf n0}\) and the final output \(\overline{{\bf W}}\), we construct an adversary \(S_A\) in the ideal model as follows:
In step 0.1, \(S_A\) runs A to get \({\bf W}^{\bf 0}\), \(\overline{{\bf W}^{\bf 0}}\) and \(PKP(\overline{{\bf W}^{\bf 0}})\). \(S_A\) runs \(S_{PKP}\), the simulator of PKP, giving the state of A and \(\overline{{\bf W}^{\bf 0}}\) as the inputs to \(S_{PKP}\). If \(S_{PKP}\) outputs that the proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by \(S_{PKP}\). \(S_A\) simulates the honest party \(P^1\) by generating a random \({\bf W}^{\bf 1}\) and constructing a correct zero-knowledge proof. \(S_A\) feeds A with the proof. Then, \(S_A\) runs A to compute \(\overline{{\bf W}} = \overline{{\bf W}^{\bf 0}}+_h\overline{{\bf W}^{\bf 1}}\).
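The operation \(+_h\) above is the homomorphic addition over ciphertexts that combines the two parties' encrypted weight shares without decrypting them. A minimal sketch, assuming a Paillier-style additively homomorphic cryptosystem (the paper cites Paillier encryption and threshold decryption; that machinery and the zero-knowledge proofs are omitted here):

```python
# Toy Paillier cryptosystem (illustration only; real use needs >=1024-bit primes)
# showing that multiplying ciphertexts ("+_h") adds the underlying plaintexts.
from math import gcd
import random

p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1
mu = pow(lam, -1, n)                           # valid since L(g^lam mod n^2) = lam mod n

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

def h_add(c1, c2):
    # "+_h": the product of two ciphertexts decrypts to the sum of the plaintexts.
    return (c1 * c2) % n2

w0, w1 = 17, 25                                # one component of the two random shares
enc_W = h_add(encrypt(w0), encrypt(w1))
assert decrypt(enc_W) == (w0 + w1) % n
```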
In step 0.2, \(S_A\) runs A to get \(\overline{{\bf X}^{\bf \prime}_{\bf k}}\) and \(PKP(\overline{{\bf X}^{\bf \prime}_{\bf k}})\) for every \({\bf X}^{\bf \prime}_{\bf k}\) that \(P^0\) owns, and feeds \(S_{PKP}\) with the state of A and the proof \(PKP(\overline{{\bf X}^{\bf \prime}_{\bf k}})\). If the simulator outputs that the proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the simulator. \(S_A\) simulates the honest party \(P^1\) by generating random \({\bf Y}^{\bf \prime}_{\bf k}\) for all \({\bf Y}^{\bf \prime}_{\bf k}\) that \(P^1\) owns and constructing correct zero-knowledge proofs. \(S_A\) feeds A with these proofs.
In step 0.3, \(S_A\) runs A to get \(\overline{{\varvec{\Updelta}} {\bf W}}\).
In step 1.1, \(S_A\) runs A to update \(\overline{{\bf W}}\).
In step 2.1, for any sample \({\bf X}^{\bf \prime}_{\bf k}\) that belongs to \(P^0\), \(S_A\) runs A to get \(\overline{{\bf W}\cdot{\bf X}^{\bf \prime}_{\bf k}}\) and the proofs \(PMC(\overline{x_{kj}^{\prime}}, \overline{w_j}, \overline{{x^{\prime}_{kj}} \cdot w_j})\) for every \(j\in[0,p]\). Then, \(S_A\) feeds A's state and the proofs to \(S_{PMC}\), the simulator of PMC, sequentially. If any proof fails according to the outputs of the simulator, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the last run of \(S_{PMC}\). For any sample \({\bf X}^{\bf \prime}_{\bf k}\) that belongs to \(P^1\), \(S_A\) simulates the honest party \(P^1\) by computing \(\overline{{\bf W}\cdot{\bf X}^{\bf \prime}_{\bf k}}\), generating \(PMC(\overline{x_{kj}^{\prime}}, \overline{w_j}, \overline{{x^{\prime}_{kj}} \cdot w_j})\) correctly for every \(j\in[0,p]\), and feeding A with these proofs.
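Step 2.1 produces the encryption \(\overline{{\bf W}\cdot{\bf X}^{\bf \prime}_{\bf k}}\) from componentwise encrypted products, with PMC certifying each product. How such an encrypted inner product can be assembled is sketched below, reusing the toy Paillier setup from the previous sketch and assuming the data holder knows the sample in plaintext while holding the weights only as ciphertexts (a sketch of the underlying homomorphic operations, not the paper's exact protocol):

```python
# Raising E(w_j) to a known plaintext x_j yields E(x_j * w_j); multiplying
# those ciphertexts together yields E(sum_j x_j * w_j).
from math import gcd
import random

p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
g, mu = n + 1, pow(lam, -1, n)

def enc(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

w = [3, 1, 4, 1, 5]        # weight vector, available to the data holder only encrypted
x = [2, 0, 1, 6, 2]        # plaintext sample components held by the data owner
enc_w = [enc(wj) for wj in w]

enc_terms = [pow(cwj, xj, n2) for cwj, xj in zip(enc_w, x)]   # E(x_j * w_j)
enc_dot = 1
for c in enc_terms:
    enc_dot = (enc_dot * c) % n2                              # E(sum_j x_j * w_j)

assert dec(enc_dot) == sum(xj * wj for xj, wj in zip(x, w)) % n
```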
In step 2.2, \(S_A\) feeds \(S_{SCA}\), the simulator of SCA, with the state of A and \(\overline{{\bf W}\cdot{\bf X}^{\bf \prime}_{\bf k}},\,\overline{0}\) as inputs. If the simulator outputs that the computation is incorrect, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the simulator.
In step 2.3, if sample \({\bf X}_{\bf k}\) is owned by \(P^0\), \(S_A\) runs A to get \(\overline{{\bf X}^{\bf \prime}_{\bf k}\cdot(1-r_k)}\) and the proofs \(PMC(\overline{{x^{\prime}_{kj}}}, \overline{1-r_k}, \overline{{x^{\prime}_{kj}} \cdot (1-r_k)})\) for every \(j\in[0,p]\). \(S_A\) feeds the state of A and \(PMC(\overline{{x^{\prime}_{kj}}}, \overline{1-r_k}, \overline{{x^{\prime}_{kj}} \cdot (1-r_k)})\) to \(S_{PMC}\) sequentially for \(j\in[0,p]\); if any proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the last run of the simulator. Then, \(S_A\) runs A to update \(\overline{{\varvec{\Updelta}} {\bf W}}\). If sample \({\bf X}_{\bf k}\) is owned by \(P^1\), \(S_A\) simulates the honest party \(P^1\) by constructing the required zero-knowledge proofs and feeds A with these proofs. Then, \(S_A\) runs A to update \(\overline{{\varvec{\Updelta}} {\bf W}}\).
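For orientation, the quantity accumulated here corresponds to the standard batch perceptron update that the protocol emulates under encryption: assuming \({\bf X}^{\bf \prime}_{\bf k}\) denotes the sample in the usual sign-normalized form and \(r_k=1\) exactly when \({\bf W}\cdot{\bf X}^{\bf \prime}_{\bf k}>0\), the term \({\bf X}^{\bf \prime}_{\bf k}\cdot(1-r_k)\) adds the sample to \({\varvec{\Updelta}}{\bf W}\) only when it is misclassified. A plaintext sketch of that rule (our reading of the step; variable names are ours):

```python
# One plaintext batch-perceptron epoch: with sign-normalized samples X'_k,
# a sample is misclassified iff W . X'_k <= 0, and Delta W accumulates those X'_k.
def perceptron_epoch(W, normalized_samples):
    dW = [0] * len(W)
    for Xk in normalized_samples:
        if sum(w * x for w, x in zip(W, Xk)) <= 0:     # i.e. r_k = 0 in the protocol
            dW = [d + x for d, x in zip(dW, Xk)]       # X'_k * (1 - r_k)
    return [w + d for w, d in zip(W, dW)]              # the weight update of step 1.1

# Toy separable data (AND-like), labels folded into the samples: class -1 negated.
samples = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
labels = [-1, -1, -1, 1]
normalized = [[y * v for v in s] for s, y in zip(samples, labels)]
W = [0, 0, 0]
for _ in range(20):
    W = perceptron_epoch(W, normalized)
# W is now a separating weight vector ([-3, 2, 2] for this data).
```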
Note that, in each step, \(S_A\) either runs A directly or uses simulators to simulate the real world by passing the ciphertexts or zero-knowledge proofs as inputs. In the first case, it is straightforward to see that A's states are identical in both worlds. In the second case, the ciphertexts and zero-knowledge proofs are generated using a semantically secure cipher, so the views of A in both worlds are computationally indistinguishable. Therefore, in both cases, the states of A in the two worlds are identical at every step, and the outputs in both worlds should be computationally indistinguishable. □
Theorem 2
Given that the fundamental sub-algorithms are secure in the malicious model, Algorithm 3 is secure in the hybrid model.
Proof
Without loss of generality, we assume \(P^0\) is corrupted by the adversary A in the hybrid model. For simplicity of presentation, we will use A as a subroutine, as well as the simulators for the secure sub-algorithms, in constructing the corresponding adversary in the ideal model.
For any adversary A operating in the hybrid model, given the description of A, the private inputs \({\bf X}^{\bf \prime}_{\bf 1},\ldots, {\bf X}^{\bf \prime}_{\bf n}\) and the final output \(\overline{{\bf W}}\), we construct an adversary \(S_A\) in the ideal model as follows:
In step 0.1, \(S_A\) runs A to get \({\bf W}^{\bf 0}\), \(\overline{{\bf W}^{\bf 0}}\) and \(PKP(\overline{{\bf W}^{\bf 0}})\). \(S_A\) runs \(S_{PKP}\), the simulator of PKP, giving the state of A and \(\overline{{\bf W}^{\bf 0}}\) as the inputs to \(S_{PKP}\). If \(S_{PKP}\) outputs that the proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by \(S_{PKP}\). \(S_A\) simulates the honest party \(P^1\) by generating a random \({\bf W}^{\bf 1}\) and constructing a correct zero-knowledge proof. \(S_A\) feeds A with the proof. Then, \(S_A\) runs A to compute \(\overline{{\bf W}} = \overline{{\bf W}^{\bf 0}}+_h\overline{{\bf W}^{\bf 1}}\).
In step 0.2, \(S_A\) runs A to get \(\overline{{\bf X}^{\bf \prime}_{\bf k}}\) and \(PKP(\overline{{\bf X}^{\bf \prime}_{\bf k}})\) for every \({\bf X}^{\bf \prime}_{\bf k}\) that \(P^0\) owns, and feeds \(S_{PKP}\) with the state of A and the proof \(PKP(\overline{{\bf X}^{\bf \prime}_{\bf k}})\). If the simulator outputs that the proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the simulator. \(S_A\) simulates the honest party \(P^1\) by generating random \({\bf Y}^{\bf \prime}_{\bf k}\) for all \({\bf Y}^{\bf \prime}_{\bf k}\) that \(P^1\) owns and constructing correct zero-knowledge proofs. \(S_A\) feeds A with these proofs.
In step 0.3, \(S_A\) runs A to get \(\overline{{\varvec{\Updelta}} {\bf W}}\).
In step 1.1, \(S_A\) runs A to update \(\overline{{\bf W}}\).
In step 2.1, \(S_A\) runs A to get \(\overline{{\bf W}^{\bf x}\cdot{\bf X}^{\bf \prime}_{\bf k}}\) and the proofs \(PMC(\overline{x_{kj}^{\prime}}, \overline{w^x_j}, \overline{{x^{\prime}_{kj}} \cdot w_j^x})\) for every \(j\in[0,p]\). Then, \(S_A\) feeds A's state and the proofs to \(S_{PMC}\), the simulator of PMC, sequentially. If any proof fails according to the outputs of the simulator, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the last run of \(S_{PMC}\). For any sample \({\bf Y}^{\bf \prime}_{\bf k}\) that belongs to \(P^1\), \(S_A\) simulates the honest party \(P^1\) by computing \(\overline{{\bf W}^{\bf y}\cdot{{\bf Y}^{\bf \prime}_{\bf k}}}\), generating \(PMC(\overline{y_{kj}^{\prime}}, \overline{w^y_j}, \overline{{y^{\prime}_{kj}} \cdot w_j^y})\) correctly for every \(j\in[0,q]\), and feeding A with these proofs.
In step 2.2, \(S_A\) runs A to get \(\overline{{\bf Z}_{\bf k}^{\bf \prime}\cdot{{\bf W}}}\).
In step 2.3, given the state of A, \(S_A\) feeds \(S_{SCA}\), the simulator of SCA, with \(\overline{{\bf Z}^{\bf \prime}_{\bf k}\cdot{\bf W}},\,\overline{0}\) as inputs. If the simulator outputs that the computation is incorrect, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the simulator.
In step 2.4, \(S_A\) runs A to get \(\overline{{\bf X}^{\bf \prime}_{\bf k}\cdot(1-r_k)}\) and the proofs \(PMC(\overline{{x^{\prime}_{kj}}}, \overline{1-r_k}, \overline{{x^{\prime}_{kj}} \cdot (1-r_k)})\) for every \(j\in[0,p]\). \(S_A\) feeds \(PMC(\overline{{x^{\prime}_{kj}}}, \overline{1-r_k}, \overline{{x^{\prime}_{kj}} \cdot (1-r_k)})\) to \(S_{PMC}\) sequentially for \(j\in[0,p]\); if any proof fails, \(S_A\) terminates the protocol; otherwise, it sets the state of A to the state returned by the last run of the simulator. Then, \(S_A\) simulates the honest party \(P^1\) by computing \(\overline{{\bf Y}^{\bf \prime}_{\bf k}\cdot(1-r_k)}\) and generating correct proofs \(PMC(\overline{{y^{\prime}_{kj}}}, \overline{1-r_k}, \overline{{y^{\prime}_{kj}} \cdot (1-r_k)})\) for every \(j\in[0,q]\). \(S_A\) feeds A with these proofs.
In step 2.5, \(S_A\) runs A to update \(\overline{{\varvec{\Updelta}} {\bf W}}\).
Similarly, in each step, \(S_A\) either runs A directly or uses simulators to simulate the real world by passing the ciphertexts or zero-knowledge proofs as inputs. In the first case, it is straightforward to see that A's states are identical in both worlds. In the second case, the ciphertexts and zero-knowledge proofs are generated using a semantically secure cipher, so the views of A in both worlds are computationally indistinguishable. Therefore, in both cases, the states of A in the two worlds are identical at every step, and the outputs in both worlds should be computationally indistinguishable. □