High-performance secure multi-party computation for data mining applications

  • Regular Contribution
International Journal of Information Security

Abstract

Secure multi-party computation (MPC) is a technique well suited for privacy-preserving data mining. Even with the recent progress in two-party computation techniques such as fully homomorphic encryption, general MPC remains relevant as it has shown promising performance metrics in real-world benchmarks. Sharemind is a secure multi-party computation framework designed with real-life efficiency in mind. It has been applied in several practical scenarios, and from these experiments, new requirements have been identified. Firstly, large datasets require more efficient protocols for standard operations such as multiplication and comparison. Secondly, the confidential processing of financial data requires the use of more complex primitives, including a secure division operation. This paper describes new protocols in the Sharemind model for secure multiplication, share conversion, equality, bit shift, bit extraction, and division. All the protocols are implemented and benchmarked, showing that the current approach provides remarkable speed improvements over the previous work. This is verified using real-world benchmarks for both operations and algorithms.

Notes

  1. In [13] the authors actually transform the problem so that it is enough to use just \(2n\) bits. However, the transformation assumes that bit shifts are cheap, making it impractical in the current MPC setting.

  2. We do not need to use Algorithm 12 because we do not introduce new digits on the left like we would in the case of a normal bit shift.

  3. Note that a bit shift can be used for efficient comparison since, for \(n=32\)-bit values, the highest bit of \(x\) is just \( x \gg 31\).

References

  1. Ben-David, A., Nisan, N., Pinkas, B.: FairplayMP: a system for secure multi-party computation. In: CCS ’08: Proceedings of the 15th ACM conference on Computer and Communications Security, pp. 257–266. ACM, New York, NY, USA (2008). http://doi.acm.org/10.1145/1455770.1455804

  2. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: A framework for fast privacy-preserving computations. In: ESORICS 2008: Proceedings of the 13th European Symposium on Research in Computer Security, Málaga, Spain, Oct 6–8, 2008, LNCS, vol. 5283, pp. 192–206. Springer (2008)

  3. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacy-preserving computations. Cryptology ePrint Archive, Report 2008/289 (2008). http://eprint.iacr.org/

  4. Bogdanov, D., Talviste, R., Willemson, J.: Deploying secure multi-party computation for financial data analysis (short paper). In: Keromytis, A. (ed.) Proceedings of the 16th International Conference on Financial Cryptography and Data Security. FC’12. Lecture Notes in Computer Science, vol. 7397, pp. 57–64. Springer Berlin/Heidelberg (2012)

  5. Bogetoft, P., Christensen, D.L., Damgård, I., Geisler, M., Jakobsen, T.P., Krøigaard, M., Nielsen, J.D., Nielsen, J.B., Nielsen, K., Pagter, J., Schwartzbach, M.I., Toft, T.: Secure multiparty computation goes live. In: FC ’09: Proceedings of the 13th International Conference on Financial Cryptography, pp. 325–343 (2009)

  6. Burkhart, M., Strasser, M., Many, D., Dimitropoulos, X.: SEPIA: Privacy-Preserving aggregation of multi-domain network events and statistics. In: Proceedings of the USENIX Security Symposium ’10, pp. 223–239. Washington, DC, USA (2010)

  7. Canetti, R.: Universally composable security: A new paradigm for cryptographic protocols. In: FOCS ’01: 42nd Annual Symposium on Foundations of Computer Science, pp. 136–145 (2001)

  8. Damgård, I., Fitzi, M., Kiltz, E., Nielsen, J., Toft, T.: Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. In: Proceedings of The 3rd Theory of Cryptography Conference, TCC 2006, LNCS, vol. 3876. Springer (2006)

  9. Doganay, M.C., Pedersen, T.B., Saygin, Y., Savaş, E., Levi, A.: Distributed privacy preserving \(k\)-means clustering with additive secret sharing. In: Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society, PAIS ’08, pp. 3–11 (2008)

  10. Even, G., Seidel, P.M., Ferguson, W.E.: A parametric error analysis of Goldschmidt’s division algorithm. J. Comput. Syst. Sci. 70(1), 118–139 (2005)

  11. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010). http://archive.ics.uci.edu/ml

  12. Geisler, M.: Cryptographic Protocols: Theory and Implementation. Ph.D. thesis, Aarhus University (2010)

  13. Granlund, T., Montgomery, P.L.: Division by invariant integers using multiplication. In: PLDI ’94: Proceedings of the SIGPLAN ’94 Conference on Programming Language Design and Implementation, pp. 61–72 (1994)

  14. Henecka, W., Kögl, S., Sadeghi, A.R., Schneider, T., Wehrenberg, I.: TASTY: tool for automating secure two-party computations. In: CCS ’10: Proceedings of the 17th ACM conference on Computer and Communications Security, pp. 451–462. ACM (2010)

  15. Malka, L.: VMCrypt: modular software architecture for scalable secure computation. In: Chen, Y., Danezis, G., Shmatikov, V. (eds.) Proceedings of the 18th ACM Conference on Computer and Communications Security. CCS’11. pp. 715–724 (2011)

  16. Parhami, B.: Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, Oxford (2010)

  17. Rodeheffer, T.: Software integer division. Technical Report MSR-TR-2008-141, Microsoft Research (2008)

  18. SecureSCM. Technical report D9.1: Secure Computation Models and Frameworks. http://www.securescm.org (2008)

  19. Vaidya, J., Clifton, C.: Privacy-preserving \(k\)-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data mining, KDD ’03, pp. 206–215 (2003)

Acknowledgments

Authors Dan Bogdanov, Margus Niitsoo and Jan Willemson acknowledge support from the European Regional Development Fund through the Estonian Center of Excellence in Computer Science (EXCS). Authors Dan Bogdanov and Jan Willemson acknowledge support from the European Regional Development Fund through the Software Technology and Applications Competence Centre (STACC) and from the Estonian Science Foundation through grant No. 8124. Author Dan Bogdanov also acknowledges support from the European Social Fund through the Estonian Doctoral School in Information and Communication Technology (IKTDK) and the Doctoral Studies and Internationalisation Programme (DoRa). Author Tomas Toft is supported by Confidential Benchmarking, financed by The Danish Agency for Science, Technology and Innovation. He also acknowledges support from the Danish National Research Foundation and The National Science Foundation of China (under the grant 61061130540) for the Sino-Danish Center for the Theory of Interactive Computation, and from the Center for Research in the Foundations of Electronic Market (supported by the Danish Strategic Research Council); part of this work was performed within each of these two centers.

Author information

Correspondence to Dan Bogdanov.

Appendices

Appendix 1: Bit shift protocols under a public shift

The protocols in this section allow us to perform two more standard bit-level operations on shared values, namely left and right shifts (\(\ll \) and \(\gg \)) (see Note 3).

First, note that the left shift protocol is actually trivial, since left shift by \(p\) positions can be accomplished by multiplying the shared value by a public constant \(2^p\). This, in turn, can be done by locally multiplying all the shares by the same constant. Since no messages are exchanged, the protocol is trivially secure against a passive adversary.
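To illustrate why the left shift needs no communication, here is a minimal Python sketch. It assumes plain additive sharing modulo \(2^n\) among three parties with \(n=32\), all simulated in one process; the helper names `share`, `reconstruct` and `left_shift_public` are hypothetical and not part of Sharemind.

```python
import secrets

N_BITS = 32
MOD = 1 << N_BITS  # shares live in Z_{2^n}


def share(x: int, parties: int = 3) -> list[int]:
    """Additively share x modulo 2^n: the shares sum to x mod 2^n."""
    shares = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares


def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MOD


def left_shift_public(shares: list[int], p: int) -> list[int]:
    """Left shift by a public p: every party multiplies its own share by 2^p.

    No messages are needed, since 2^p * sum(shares) = sum(2^p * share_i) mod 2^n.
    """
    return [(s << p) % MOD for s in shares]


x = 0x12345678
assert reconstruct(left_shift_public(share(x), 4)) == (x << 4) % MOD
```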

Right shift, on the other hand, is more complicated because of the unknown overflow carry modulo \(2^n\). Thus, in order to build a right shift protocol, we first need a protocol to compute the overflow. This is considerably easier to do if the value in question is (temporarily) secret-shared between just two parties, because then the overflow is guaranteed to be either \(0\) or \(1\). We thus present two routines: Algorithm 10 for resharing a value to just two parties and Algorithm 11 for computing the overflow bit once the values are shared in this way.
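The following Python snippet (a sketch with a toy \(n=8\); `wraps` is a hypothetical helper) shows why resharing to two parties helps: the sum of three shares in \(\mathbb Z _{2^n}\) can wrap around \(2^n\) zero, one or two times, whereas the sum of two shares wraps at most once, so the overflow fits in a single bit.

```python
N_BITS = 8
MOD = 1 << N_BITS


def wraps(shares: list[int]) -> int:
    """How many multiples of 2^n are dropped when the shares are added mod 2^n."""
    assert all(0 <= s < MOD for s in shares)
    return sum(shares) // MOD


print(wraps([200, 200, 200]))  # three shares: 600 // 256 = 2
print(wraps([200, 100]))       # two shares: 300 // 256 = 1
```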

Algorithm 10 (figure not reproduced): resharing a value to just two parties

Theorem 9

Algorithm 10 is correct and secure against one passive attacker.

Proof

Correctness of the algorithm is straightforward, since \(r_3=u_1-r_2\):

$$\begin{aligned} u^{\prime }&= u^{\prime }_1+u^{\prime }_2+u^{\prime }_3=0+u_2+r_2+u_3+r_3\\&= u_2+r_2+u_3+u_1-r_2=u. \end{aligned}$$

For security, note that \(\mathcal P _1\) has no incoming messages, whereas the only incoming messages for \(\mathcal P _2\) and \(\mathcal P _3\) are \(r_2\) and \(u_1-r_2\), respectively. Since \(r_2\) is random, each of these messages is uniformly distributed on its own and can therefore be simulated with a random value. \(\square \)
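The following Python sketch reconstructs the data flow of Algorithm 10 from the correctness proof and the message description above. The algorithm itself is not reproduced here, so the exact steps are an assumption: \(\mathcal P _1\) picks a random \(r_2\), the values \(r_2\) and \(r_3=u_1-r_2\) go to \(\mathcal P _2\) and \(\mathcal P _3\), and the new share of \(\mathcal P _1\) becomes 0. The function names are hypothetical, and all parties are simulated in one process.

```python
import secrets

N_BITS = 32
MOD = 1 << N_BITS


def share3(u: int) -> tuple[int, int, int]:
    """Additively share u among P1, P2, P3 modulo 2^n."""
    u1, u2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return u1, u2, (u - u1 - u2) % MOD


def reshare_to_two(u1: int, u2: int, u3: int) -> tuple[int, int, int]:
    """Move P1's share onto P2 and P3 so that only two shares are non-zero."""
    r2 = secrets.randbelow(MOD)   # uniformly random mask, sent to P2
    r3 = (u1 - r2) % MOD          # u1 - r2, sent to P3 (uniform on its own)
    return 0, (u2 + r2) % MOD, (u3 + r3) % MOD


u = 123456789
new_shares = reshare_to_two(*share3(u))
assert new_shares[0] == 0 and sum(new_shares) % MOD == u
```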

Algorithm 11 (figure not reproduced): computing the overflow bit of a value shared between two parties

The correctness proof for Algorithm 11 is somewhat more complicated.

Theorem 10

Algorithm 11 is correct and secure against one passive attacker.

Proof

To prove correctness, we need to compute the overflow bit \(\lambda \). The overflow occurs exactly when \(u^{\prime }_2+u^{\prime }_3\ge 2^n\), or equivalently \(u^{\prime }_2\ge 2^n-u^{\prime }_3\). Note that modulo \(2^n\) the value \(2^n-u^{\prime }_3\) is represented just as \(-u^{\prime }_3\) (unless \(u^{\prime }_3=0\), which has to be treated separately). Thus,

$$\begin{aligned} \lambda = 1 \quad \Longleftrightarrow \quad u^{\prime }_2 \ge (-u^{\prime }_3)\ \text{ mod}\,{2^n}\, \wedge \, u^{\prime }_3\ne 0. \end{aligned}$$

To compare \(u^{\prime }_2\) with \(-u^{\prime }_3\), we first run Algorithm 4 and obtain a bitwise shared vector \(\overline{[\![s]\!]}\), which is all zeroes if \(u^{\prime }_2=-u^{\prime }_3\text{ mod}\,{2^n}\) and otherwise has a single 1 bit at the highest position where the two values differ. Thus, the dot product \(\bigoplus _{i=0}^{n-1} \overline{[\![s]\!]}^{(i)}\wedge \overline{[\![-u^{\prime }_3]\!]}^{(i)}=1\) iff \(u^{\prime }_2 < -u^{\prime }_3\ \text{ mod}\ {2^n}\), and hence \(\lambda ^0=1\) iff \(u^{\prime }_2\ge -u^{\prime }_3\ \text{ mod}\ {2^n}\), as required. The only exception occurs when \(u^{\prime }_3=0\): then no overflow can occur, but \(\lambda ^0\) is still set to \(1\). This is easy to correct locally: \(\mathcal P _3\), who holds the original \(u^{\prime }_3\), flips its own share of \(\lambda ^0\) whenever \(u^{\prime }_3=0\).

The security of the protocol is still trivial as it is just a composition of perfectly simulatable protocols. \(\square \)
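For intuition, the comparison logic behind Algorithm 11 can be checked in the clear. The sketch below works on plain integers rather than on bitwise shared values, and the hypothetical helper `highest_diff_vector` merely stands in for the output of Algorithm 4 as described in the proof. Following the reasoning of the proof, the dot-product test together with the local correction by \(\mathcal P _3\) should yield exactly the overflow bit of \(u^{\prime }_2+u^{\prime }_3\); the test checks this exhaustively for a toy \(n=8\).

```python
N_BITS = 8
MOD = 1 << N_BITS


def highest_diff_vector(a: int, b: int, n: int = N_BITS) -> list[int]:
    """All zeroes if a == b, otherwise a single 1 at the highest differing bit."""
    s = [0] * n
    diff = a ^ b
    if diff:
        s[diff.bit_length() - 1] = 1
    return s


def overflow_bit(u2: int, u3: int, n: int = N_BITS) -> int:
    """Overflow bit of u2 + u3 modulo 2^n, following the proof of Theorem 10."""
    neg_u3 = (-u3) % (1 << n)
    s = highest_diff_vector(u2, neg_u3, n)
    dot = 0
    for i in range(n):               # XOR over i of (s_i AND bit i of -u3)
        dot ^= s[i] & ((neg_u3 >> i) & 1)
    lam0 = 1 ^ dot                   # 1 iff u2 >= -u3 (mod 2^n)
    if u3 == 0:                      # P3's local fix: no overflow is possible
        lam0 ^= 1
    return lam0


print(overflow_bit(200, 100))  # 1, since 200 + 100 >= 256
```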

We are now ready to present the right shift protocol. The main idea behind the public right shift protocol is to convert the input into a sum of two values, each known to one of two parties, and then shift each of these down. This leaves us with two problems. First, discarding the low bits also discards the carry that the low bits would have propagated into the least significant retained position. Second, before the shift, the top carry bit of the addition disappeared implicitly because addition is performed modulo \(2^n\); after the values have been shifted down, this carry is no longer discarded and remains in the result. The bulk of the work of the protocol consists of determining and correcting for these two carry bits.

The protocol itself is presented as Algorithm 12.

Algorithm 12 (figure not reproduced): right shift by a public number of positions

Theorem 11

Algorithm 12 is correct and secure against one passive attacker.

Proof

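Correctness of the algorithm follows from the discussion above. Let \(\lambda _1\) be the top overflow bit, so that \(u^{\prime }_2+u^{\prime }_3=u+\lambda _1 2^n\), and let \(\lambda _2\) be the carry out of the low \(p\) bits, i.e. \(\lambda _2=1\) iff \((u^{\prime }_2 \bmod 2^p)+(u^{\prime }_3 \bmod 2^p)\ge 2^p\). Then we have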

$$\begin{aligned} v&= v_1+v_2+v_3\ \text{ mod}\ {2^n}=(u^{\prime }_2 \gg p) + (u^{\prime }_3 \gg p)\ \text{ mod}\ {2^n}\\&= (u\gg p) + \lambda _1 2^{n-p} - \lambda _2\ \text{ mod}\ {2^n}, \end{aligned}$$

hence

$$\begin{aligned} (u\gg p) = v - \lambda _1 2^{n-p} + \lambda _2\ \text{ mod}\ {2^n}. \end{aligned}$$

For security note that we are only composing perfectly simulatable subroutines. \(\square \)
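The correction formula from the proof can be checked on plain integers. In this Python sketch (toy \(n=8\); names hypothetical), \(\lambda _1\) and \(\lambda _2\) are computed in the clear, whereas in the protocol they would be obtained with secure subroutines such as Algorithm 11; only the final formula \((u\gg p) = v - \lambda _1 2^{n-p} + \lambda _2\) is taken from the text.

```python
N_BITS = 8
MOD = 1 << N_BITS


def right_shift_from_two_shares(u2: int, u3: int, p: int, n: int = N_BITS) -> int:
    """Return u >> p for u = u2 + u3 mod 2^n, using only locally shifted shares
    plus the two carry bits lambda_1 and lambda_2."""
    mod = 1 << n
    v = ((u2 >> p) + (u3 >> p)) % mod                # sum of locally shifted shares
    lam1 = int(u2 + u3 >= mod)                       # top overflow of u2 + u3
    low = (1 << p) - 1
    lam2 = int((u2 & low) + (u3 & low) >= (1 << p))  # carry out of the low p bits
    return (v - lam1 * (1 << (n - p)) + lam2) % mod  # v - lambda_1 2^(n-p) + lambda_2


print(right_shift_from_two_shares(200, 100, 3))  # u = 44, and 44 >> 3 = 5
```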

This protocol can also be used to extract the most significant bit for comparison purposes. Since it is also slightly more efficient than full bit extraction, the current implementation uses it as the basis of the comparison operator.
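A minimal Python sketch of the comparison idea, computed in the clear with hypothetical helper names. It assumes that both \(n\)-bit inputs lie below \(2^{n-1}\), so that the difference cannot wrap into the wrong half; under that assumption \(x<y\) holds exactly when the highest bit of \(x-y \bmod 2^n\) is set, and that bit is \((x-y)\gg (n-1)\).

```python
N_BITS = 32


def msb(x: int, n: int = N_BITS) -> int:
    """Highest bit of x reduced modulo 2^n, i.e. (x mod 2^n) >> (n - 1)."""
    return (x % (1 << n)) >> (n - 1)


def less_than(x: int, y: int, n: int = N_BITS) -> int:
    """1 iff x < y, assuming 0 <= x, y < 2^(n-1)."""
    return msb(x - y, n)


def greater_than(x: int, y: int, n: int = N_BITS) -> int:
    return less_than(y, x, n)


print(less_than(3, 5), greater_than(3, 5))  # 1 0
```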

Appendix 2: Error calculation of Goldschmidt division

We now analyze the effects of rounding errors by looking at the divergence from the “ideal” computation, in which no rounding takes place and for which the error terms can be estimated fairly easily. A similar analysis was performed in [10]. That analysis was more detailed, but relied on floating-point arithmetic, which makes it hard to apply directly here.

Let \(N_i, D_i, F_i, c_0\) denote the exact (unrounded) real values encountered during the run of the Newton-Goldschmidt iterations described in Sect. 9. In Sharemind, a value \(x\) is approximated by the fixed-point number \(\widetilde{x}=2^{-n^{\prime }}\cdot \widehat{x}\), which is represented by \(\widehat{x}\in \mathbb Z _{m^{\prime }}\) for some \(m^{\prime }\).
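A small Python sketch of the fixed-point representation and of the share truncation analyzed below. It assumes that the three shares are non-negative integers whose ordinary sum equals the double-precision value (i.e. any modular wrap-around has already been handled) and that each party simply shifts its own share right by \(n^{\prime }\) bits. Under these assumptions the truncated result falls short of the exact value by less than three units of \(2^{-n^{\prime }}\), which matches the bound \(\delta =3\cdot 2^{-n^{\prime }}\) used below. The names and the default \(n^{\prime }=37\) (borrowed from the parameters at the end of this appendix) are illustrative only.

```python
import math
from fractions import Fraction

N_PRIME = 37  # fractional bits n' (value taken from the parameter choice in this appendix)


def to_fixed(x: Fraction, n_prime: int = N_PRIME) -> int:
    """x_hat = floor(x * 2^n'), so that x_tilde = 2^-n' * x_hat <= x."""
    return math.floor(x * (1 << n_prime))


def from_fixed(x_hat: int, n_prime: int = N_PRIME) -> Fraction:
    """x_tilde = 2^-n' * x_hat."""
    return Fraction(x_hat, 1 << n_prime)


def truncation_error_ulps(shares: list[int], n_prime: int = N_PRIME) -> Fraction:
    """(sum of shares) / 2^n' minus the sum of the locally truncated shares,
    in units of 2^-n' of the result; lies in [0, 3) for three shares."""
    exact = Fraction(sum(shares), 1 << n_prime)
    truncated = sum(s >> n_prime for s in shares)  # each party shifts its own share
    return exact - truncated


print(truncation_error_ulps([2**40 - 1] * 3) < 3)  # True
```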

Recall that the sequences \((N_i)\) and \((D_i)\) converge from below to \(\frac{u}{v}\) and \(1\), respectively. To preserve convergence from below in the presence of errors, the rounding errors must be handled with extra care so that they are also one-sided.
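To make the “ideal” iteration concrete, the following Python sketch runs it with exact rational arithmetic and the basic factor \(F_{i+1}=2-D_i\); the variant first step mentioned at the end of this appendix is not modelled. The choice \(c_0=2^{-\ell }\), with \(\ell \) the bit length of \(v\), is an assumption for illustration that guarantees \(0.5\le vc_0<1\). The iterates keep \(N_i/D_i=u/v\), while \(D_i\) increases towards 1 from below with \(1-D_{i+1}=(1-D_i)^2\).

```python
from fractions import Fraction


def goldschmidt_ideal(u: int, v: int, k: int) -> list[tuple[Fraction, Fraction]]:
    """Exact Goldschmidt iterates (N_i, D_i) for i = 0..k, without any rounding."""
    c0 = Fraction(1, 1 << v.bit_length())  # assumed choice: 0.5 <= v*c0 < 1
    n_i, d_i = u * c0, v * c0              # N_0 = u*c0, D_0 = v*c0
    iterates = [(n_i, d_i)]
    for _ in range(k):
        f = 2 - d_i                        # F_{i+1} = 2 - D_i
        n_i, d_i = n_i * f, d_i * f
        iterates.append((n_i, d_i))
    return iterates


for n_i, d_i in goldschmidt_ideal(100, 7, 3):
    print(1 - d_i)  # 1/8, 1/64, 1/4096, 1/16777216
```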

Define the differences \(\Delta N_i\) and \(\Delta D_i\) between the real values \(N_i, D_i\) and their approximations by \(\widetilde{N_i}=N_i+\Delta N_i\) and \(\widetilde{D_i}=D_i-\Delta D_i\). Note that on line 13 of Algorithm 9, the value of \(\widehat{N_k}\) is always rounded up and the value of \(\widehat{D_k}\) is always rounded down. This guarantees that \(\Delta N_k,\Delta D_k\ge 0\) for all \(k\ge 1\). When the shares of \(\widehat{\widehat{N_k}}\) and \(\widehat{\widehat{D_k}}\) are right shifted to convert the elements back to the precision \(2^{-n^{\prime }}\) (Algorithm 9, line 13), additional truncation errors are introduced. Since there are three computing parties and we shift by \(n^{\prime }\) positions, both the upward and the downward rounding errors are bounded by \(\delta =3\cdot 2^{-n^{\prime }}\). Thus, for \(k\ge 1\), we obtain

$$\begin{aligned} \widetilde{D_{k+1}}&> \widetilde{D_{k}}\cdot \widetilde{F_{k+1}} - \delta = \widetilde{D_{k}}\cdot (2-\widetilde{D_{k}}) - \delta \\&= (D_k-\Delta D_k)\cdot (2-D_k+\Delta D_k) - \delta \\&= D_{k+1} -2\Delta D_k(1-D_k)-(\Delta D_k)^2-\delta \end{aligned}$$

and

$$\begin{aligned} \widetilde{N_{k+1}}&\le \widetilde{N_{k}}\cdot \widetilde{F_{k+1}}+\delta = \widetilde{N_{k}}\cdot (2-\widetilde{D_{k}})+\delta \\&= (N_k+\Delta N_k)\cdot (2-D_k+\Delta D_k)+\delta \\&= N_{k+1}+N_k\Delta D_k + \Delta N_k(2-D_k+\Delta D_k)+\delta . \end{aligned}$$

This implies

$$\begin{aligned} \Delta D_{k+1}&= D_{k+1}-\widetilde{D_{k+1}} < 2\Delta D_k(1-D_k) +(\Delta D_k)^2+\delta \\&\le 2\Delta D_k 2^{-2^k} +(\Delta D_k)^2+\delta \end{aligned}$$

and

$$\begin{aligned} \Delta N_{k+1}&= \widetilde{N_{k+1}} - N_{k+1} \\&\le \Delta N_k(2-D_k+\Delta D_k)+N_k\Delta D_k +\delta \\&< \Delta N_k(1 + 2^{-2^k} + \Delta D_k)+\frac{u}{v}\Delta D_k+\delta . \end{aligned}$$

Since the first rounding error is introduced only after multiplication by \(F_1\), we have \(\Delta D_1,\Delta N_1\le \delta \). Thus, we can iterate these recurrence inequalities to obtain bounds for \(\Delta D_k,\Delta N_k\) in terms of \(\frac{u}{v}\) and \(\delta \).
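The two recurrence inequalities can be iterated mechanically. The Python sketch below does this with exact fractions, tracking a bound on \(\Delta D_k\) and a bound \(\Delta N_k\le a_k+b_k\frac{u}{v}\) that is linear in \(\frac{u}{v}\), the form used in the next paragraphs. The caller supplies \(\delta \) and upper bounds \(e_j\ge 1-D_j\) (for the basic iteration \(e_j=2^{-2^j}\)); the function name and this interface are hypothetical.

```python
from fractions import Fraction


def iterate_error_bounds(delta: Fraction, e: list, k: int):
    """Return (dD_k, a_k, b_k) with Delta D_k <= dD_k and
    Delta N_k <= a_k + b_k * (u/v), starting from Delta D_1, Delta N_1 <= delta.

    e[j] must be an upper bound on 1 - D_j for j = 1..k-1 (e[0] is unused).
    """
    dD, a, b = delta, delta, Fraction(0)
    for j in range(1, k):
        factor = 1 + e[j] + dD
        # Delta N_{j+1} < Delta N_j * (1 + e_j + Delta D_j) + (u/v) * Delta D_j + delta
        a, b = a * factor + delta, b * factor + dD
        # Delta D_{j+1} < 2 * Delta D_j * e_j + (Delta D_j)^2 + delta
        dD = 2 * dD * e[j] + dD * dD + delta
    return dD, a, b
```

For the basic iteration one would pass, for example, `e = [None] + [Fraction(1, 2 ** (2 ** j)) for j in range(1, k)]`.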

To guarantee that truncating the result yields the correct quotient, we have to ensure that the end result \(R\) satisfies \(\lfloor \frac{u}{v} \rfloor \le \lfloor R \rfloor <\lfloor \frac{u}{v} \rfloor +1 \). Let \(1-D_k<2^{-p}\), in which case \(N_k > \frac{u}{v}(1-2^{-p})\) since \(\frac{N_k}{D_k} = \frac{u}{v}\). Recall that \(\widehat{c_0}\) was chosen so that \(0.5 \le v \widetilde{c_0} <1\), hence \(\widetilde{c_0} < \frac{1}{v} \le 2\widetilde{c_0}\). Consequently, \(\widetilde{N_k}\ge N_k>\frac{u}{v}(1-2^{-p})\ge \frac{u}{v} - u \widetilde{c_0} 2^{-p+1}\), where the last step uses \(\frac{1}{v} \le 2\widetilde{c_0}\). Taking \(R= \widetilde{N_k}+\Delta \) where \(\Delta = u \widetilde{c_0} 2^{-p+1}\) thus guarantees \(R\ge \frac{u}{v}\) and \(\lfloor R \rfloor \ge \lfloor \frac{u}{v} \rfloor \).

We are left to show that \(R < \lfloor \frac{u}{v} \rfloor +1\). Let \(\Delta N_k < a + b \frac{u}{v}\) be the bound obtained by iterating the above recurrence inequalities \(k\) times. Then, \(R = \widetilde{N_k}+u \widetilde{c_0} 2^{-p+1} < N_k + a + 2 u \widetilde{c_0} (2^{-p} + b)\). Since \(N_k<\frac{u}{v}\le (\lfloor \frac{u}{v}\rfloor +1)-\frac{1}{v}<(\lfloor \frac{u}{v}\rfloor +1)-\widetilde{c_0}\), it suffices to show that \(a + 2 u \widetilde{c_0} (2^{-p} + b)< \widetilde{c_0}\), or equivalently \(\frac{a}{\widetilde{c_0}} + 2 u b + u2^{-p+1} < 1\). Since \(2^{-n}\le \widetilde{c_0} < 1\) and \(0\le u < 2^n\), this can be achieved by showing \(2^n(a+2b+2^{-p+1})<1\).
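The effect of the correction term \(\Delta = u \widetilde{c_0} 2^{-p+1}\) can be checked numerically in the idealized case without rounding errors (\(a=b=0\)). The Python sketch below assumes the exact values \(D_k=1-(1-vc_0)^{2^k}\) and \(N_k=\frac{u}{v}D_k\) of the basic iteration, \(c_0=2^{-\ell }\) with \(\ell \) the bit length of \(v\), and a toy \(n=8\) with \(k=4\) and \(p=15\), so that \(1-D_k\le 2^{-16}<2^{-p}\) and \(u2^{-p+1}<1\). All names are hypothetical.

```python
from fractions import Fraction


def corrected_quotient(u: int, v: int, k: int, p: int) -> int:
    """floor(R) for R = N_k + u*c0*2^(-p+1), using exact (unrounded) N_k."""
    c0 = Fraction(1, 1 << v.bit_length())   # assumed choice: 0.5 <= v*c0 < 1
    d_k = 1 - (1 - v * c0) ** (2 ** k)      # 1 - D_k = (1 - D_0)^(2^k)
    n_k = Fraction(u, v) * d_k              # N_k = (u/v) * D_k, slightly below u/v
    r = n_k + u * c0 * Fraction(2, 1 << p)  # add Delta = u*c0*2^(-p+1)
    return r.numerator // r.denominator     # floor(R); the denominator is positive


print(corrected_quotient(3, 3, 4, 15))  # 1
```

Dropping the correction shows why it is needed: for \(u=v=3\) we have \(N_k=D_k<1\), so truncating \(N_k\) alone would give 0 instead of 1.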

For \(n=32\), the required inequality can be guaranteed by taking \(k=5\) and \({n^{\prime }}= 37\), in which case \(p>40.68\) (if the first iteration is done with \(F_1=2\sqrt{2}-2D_0\)) and \(a,b <0.2\times 2^{-32}\). These choices imply \(m=32 + (5+1)\times 37=254\).
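The value \(p>40.68\) can be reproduced with a few lines of Python. The derivation in the code comments is our reading of the text rather than something it states: with \(F_1=2\sqrt{2}-2D_0\) we get \(1-D_1=2(D_0-\frac{\sqrt{2}}{2})^2\le 3-2\sqrt{2}\) for \(D_0\in [0.5,1)\), after which each basic step squares the error, so \(1-D_k\le (3-2\sqrt{2})^{2^{k-1}}\). The function name is hypothetical.

```python
import math


def precision_bits(k: int) -> float:
    """-log2 of the bound on 1 - D_k when the first factor is F_1 = 2*sqrt(2) - 2*D_0."""
    e1 = 3 - 2 * math.sqrt(2)               # sup of 1 - D_1 = 2*(D_0 - sqrt(2)/2)^2 over [0.5, 1)
    return -(2 ** (k - 1)) * math.log2(e1)  # 1 - D_k <= e1 ** (2 ** (k - 1))


n, k, n_prime = 32, 5, 37
m = n + (k + 1) * n_prime
print(round(precision_bits(k), 2), m)  # 40.69 254
```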

Appendix 3: Benchmark diagrams

Figures 1, 2, 3, 4, 5, and 6 compare the running times of the protocols in this paper with those of the protocols in [2]. Where multiple experiments were conducted, the range between the minimal and maximal result is shown. Missing data points indicate that the protocol was too inefficient to run at that input size. The axes of the diagrams use a logarithmic scale. Since the right shift protocol is also used to implement greater-than comparisons, we compared it with the greater-than comparison protocol from [2]. This is a fair comparison, since a greater-than comparison can be implemented by computing the difference of the two values and extracting its highest bit with the right shift operation.

Fig. 1

Benchmark results for the multiplication operation

Fig. 2

Benchmark results for the share conversion operation

Fig. 3

Benchmark results for the equality comparison operation

Fig. 4

Benchmark results for the greater-than comparison operation

Fig. 5

Benchmark results for the bit extraction operation

Fig. 6

Benchmark results for the division operations

 

About this article

Cite this article

Bogdanov, D., Niitsoo, M., Toft, T. et al. High-performance secure multi-party computation for data mining applications. Int. J. Inf. Secur. 11, 403–418 (2012). https://doi.org/10.1007/s10207-012-0177-2

  • DOI: https://doi.org/10.1007/s10207-012-0177-2
